Generate high-fidelity music from text descriptions

The model casts conditional music generation as a hierarchical sequence-to-sequence modeling task. It generates music at 24 kHz that remains consistent over several minutes, and it outperforms prior systems in both audio quality and adherence to the text description.
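To make the idea of hierarchical sequence-to-sequence generation more concrete, here is a toy sketch. It assumes a two-level hierarchy: a coarse token sequence is produced from the text prompt, and each coarse token then conditions a short run of finer tokens. The functions `toy_coarse_tokens` and `toy_fine_tokens`, the vocabulary size, and the upsampling factor are all hypothetical deterministic stand-ins for learned models. The text does not describe the model's actual token types, rates, or stages. Only the 24 kHz output rate comes from the text.

```python
import hashlib
from typing import List

SAMPLE_RATE_HZ = 24_000  # output sample rate stated in the text


def toy_coarse_tokens(prompt: str, n_tokens: int, vocab: int = 1024) -> List[int]:
    """Hypothetical stage 1: text prompt -> coarse token sequence.

    A hash of (prompt, position) stands in for a learned text-conditioned model,
    so the same prompt always yields the same coarse sequence.
    """
    tokens = []
    for i in range(n_tokens):
        digest = hashlib.sha256(f"{prompt}|{i}".encode()).digest()
        tokens.append(int.from_bytes(digest[:4], "big") % vocab)
    return tokens


def toy_fine_tokens(coarse: List[int], upsample: int, vocab: int = 1024) -> List[int]:
    """Hypothetical stage 2: coarse sequence -> finer sequence.

    Each coarse token conditions `upsample` fine tokens, so the fine sequence
    is `upsample` times longer and depends only on the coarse sequence.
    """
    fine = []
    for c in coarse:
        for j in range(upsample):
            fine.append((c * 31 + j) % vocab)
    return fine


def samples_for(seconds: float) -> int:
    """Number of audio samples needed for a clip of `seconds` at 24 kHz."""
    return int(seconds * SAMPLE_RATE_HZ)


if __name__ == "__main__":
    coarse = toy_coarse_tokens("calm solo piano", n_tokens=25)
    fine = toy_fine_tokens(coarse, upsample=4)
    print(len(coarse), len(fine))  # 25 100
    print(samples_for(300))        # 7200000 samples for five minutes
```

The arithmetic at the end shows why a hierarchy helps. A five-minute clip at 24 kHz has 7,200,000 samples, so producing a short coarse sequence first and refining it afterwards gives the long-range structure a much shorter sequence to be modeled over.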