Recommended reading! I don't think the speed is the interesting part here. It is actually quite slow, if it really needs 0.5 seconds to generate 1 second of sound.
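To put that number in perspective, here is a small sketch of the usual "real-time factor" calculation. The function name and the 0.5 s / 1 s inputs are my own illustration based on the figure quoted in the headline, not something taken from Facebook's code.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Seconds of compute spent per second of audio produced.

    Hypothetical helper for illustration only.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# Figures from the article headline: 0.5 s of compute for 1 s of speech.
rtf = real_time_factor(0.5, 1.0)

# A factor below 1.0 means speech is generated faster than it is played back.
print(f"real-time factor: {rtf:.2f}")
print(f"speed relative to playback: {1 / rtf:.1f}x")
```

So the system only needs to stay below a factor of 1.0 to keep up with playback, and at 0.5 it does so with a margin of two.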
The interesting part is that it runs on ordinary CPUs: "... a highly efficient, AI text-to-speech (TTS) system that can be hosted in real time using regular processors." By contrast: "Most modern AI TTS systems require graphics cards, field-programmable gate arrays (FPGAs), or custom-designed AI chips like Google’s tensor processing units (TPUs) to run, train, or both. For instance, a recently detailed Google AI system was trained across 32 TPUs in parallel."
What else? "With the help of a tool called PyTorch JIT, Facebook engineers migrated from a training-oriented setup in PyTorch, Facebook’s machine learning framework, to a heavily inference-optimized environment. Compiled operators and tensor-level optimizations, including operator fusion and custom operators with approximations for the activation function (mathematical equations that determine the output of a model), led to additional performance gains."
They also used a number of other optimization tricks to improve performance.
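The "approximations for the activation function" idea from the quote can be illustrated with a toy example. The article does not say which activation or which approximation Facebook used, so the choice of tanh and the rational formula below are purely my assumptions, and plain Python only shows the accuracy trade-off, not the speed-up a compiled operator would give.

```python
import math


def fast_tanh(x: float) -> float:
    """Cheap rational approximation of tanh, clamped to [-1, 1].

    Illustrative only: not the approximation used in Facebook's system.
    """
    if x >= 3.0:
        return 1.0
    if x <= -3.0:
        return -1.0
    x2 = x * x
    # One division and a few multiplications instead of exponentials.
    return x * (27.0 + x2) / (27.0 + 9.0 * x2)


def max_abs_error(lo: float = -6.0, hi: float = 6.0, steps: int = 1200) -> float:
    """Largest absolute gap between fast_tanh and math.tanh on a grid."""
    worst = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        worst = max(worst, abs(fast_tanh(x) - math.tanh(x)))
    return worst


print(f"max abs error on [-6, 6]: {max_abs_error():.4f}")
```

The approximation stays within a few hundredths of the exact value, which is the kind of small accuracy loss one might accept in exchange for cheaper arithmetic at inference time.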
Facebook's voice synthesis AI generates speech in 500 milliseconds | VentureBeat