Artificial intelligence developer Stability AI has unveiled Stable Audio 2.0, the next iteration of its text-to-music generation system.
The latest version gives artists and musicians a wider range of creative tools, including the ability to produce full-length music tracks “with traditional song structure and high audio quality” using natural language prompts, the company said Wednesday (April 3).
Stable Audio 1.0, released last September, captured attention with its ability to craft short audio clips based on textual descriptions. It was named one of TIME’s Best Inventions of 2023.
The new version expands on this foundation, allowing users to generate complete songs up to three minutes long at 44.1 kHz stereo. This extended timeframe opens doors for a wider variety of musical creations, from full instrumentals to structured compositions with intros, development sections, and outros.
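A quick back-of-envelope calculation (our illustration, not from the article) shows why generating tracks of this length is demanding: a three-minute 44.1 kHz stereo track contains millions of raw samples.

```python
# Illustrative arithmetic only -- not from Stability AI's materials.
sample_rate = 44_100        # samples per second, per channel (44.1 kHz)
duration_s = 3 * 60         # three minutes, in seconds
channels = 2                # stereo
total_samples = sample_rate * duration_s * channels
print(f"{total_samples:,}")  # 15,876,000
```

Sequences of this length are far beyond what generative models can process sample by sample, which is why compressing audio into shorter representations matters.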
“Stable Audio 2.0 sets a new standard in AI-generated audio,” Stability AI said in a blog post. “The new model introduces audio-to-audio generation by allowing users to upload and transform samples using natural language prompts.”
Beyond the increased length, Stable Audio 2.0 offers new “audio-to-audio” capabilities that allow users to upload their own audio samples to set the style and sound of AI-generated outputs.
“Our most advanced audio model yet expands the creative toolkit for artists and musicians with its new functionalities. With both text-to-audio and audio-to-audio prompting, users can produce melodies, backing tracks, stems, and sound effects, thus enhancing the creative process,” Stability AI said.
The release of Stable Audio 2.0 comes amid a period of internal change at Stability AI. Ed Newton-Rex, the company’s former Vice President of Audio, recently departed due to disagreements over the use of copyrighted materials in training datasets.
“Companies worth billions of dollars are, without permission, training generative AI models on creators’ works, which are then being used to create new content that in many cases can compete with the original works. I don’t see how this can be acceptable in a society that has set up the economics of the creative arts such that creators rely on copyright,” Newton-Rex, who helped develop Stable Audio, said in a public resignation letter. He has since launched an initiative to evaluate and certify AI models based on their respect for creators’ rights.
Stability AI addressed copyright concerns about its AI development, saying “Stable Audio 2.0 was exclusively trained on a licensed dataset from the AudioSparx music library, honoring opt-out requests and ensuring fair compensation for creators.”
The 1.0 model was also trained on data from AudioSparx, a library of more than 800,000 audio files containing music, sound effects, and single-instrument stems, along with corresponding text metadata.
The update also integrates Audible Magic’s content-recognition technology, which scans audio uploads in real time to prevent copyright infringement.
Stable Audio 2.0 also introduces features such as Style Transfer, which matches generated or uploaded audio to the style of existing tracks, as well as SFX creation and audio variations.
“Stable Audio 2.0 is one of the most powerful and flexible generative AI music tools available and makes it possible for musicians, producers, and other creators to use AI as a collaborative tool for music composition, audio experimentation, and content creation — like never before,” Stability AI said in a statement.
Stability AI also shared technical details about the model’s architecture, explaining why it is effective at generating high-quality musical compositions.
“A new, highly compressed autoencoder compresses raw audio waveforms into much shorter representations. For the diffusion model, we employ a diffusion transformer (DiT), akin to that used in Stable Diffusion 3, in place of the previous U-Net, as it is more adept at manipulating data over long sequences.
“The combination of these two elements results in a model capable of recognizing and reproducing the large-scale structures that are essential for high-quality musical compositions.”
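The two-stage design described in the quote can be sketched in miniature. The toy Python snippet below (an illustrative assumption, not Stability AI’s code; the pooling encoder and compression factor are invented stand-ins) shows how an encoder shortens a raw waveform into a far smaller latent sequence, which is the kind of sequence a diffusion transformer would then operate over.

```python
import numpy as np

# Hypothetical sketch, NOT Stability AI's implementation: a stand-in
# "autoencoder" encoder that compresses a raw waveform into a much
# shorter latent sequence, as the blog post describes.
COMPRESSION = 2048  # assumed waveform-to-latent downsampling ratio


def encode(waveform: np.ndarray) -> np.ndarray:
    """Toy encoder: average-pool blocks of samples into one latent each
    (a real autoencoder learns this mapping rather than pooling)."""
    n = len(waveform) // COMPRESSION * COMPRESSION
    return waveform[:n].reshape(-1, COMPRESSION).mean(axis=1)


# Three minutes of mono audio at 44.1 kHz -> about 7.9 million samples.
track = np.random.randn(44_100 * 180)
latent = encode(track)
print(len(track), "->", len(latent))  # sequence shortened ~2048x
```

A diffusion transformer then denoises a few thousand latents rather than millions of raw samples, which is what lets the model capture the long-range structure of a full song.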
The new model is available to use for free on the Stable Audio website and will soon be available on the Stable Audio API.
Stability AI has also launched Stable Radio, a 24/7 live stream that features tracks generated by Stable Audio.
Music Business Worldwide