Meet AudioCraft: Meta’s new generative AI tool for audio and music

Meta has unveiled a new generative artificial intelligence tool that generates audio and music from text prompts, and has made it available as open source for research purposes.

The new tool, called AudioCraft, consists of three models: MusicGen, AudioGen and EnCodec.

In June, Meta said it had used 20,000 hours of licensed music to train MusicGen, including 10,000 “high-quality” licensed music tracks. At the time, Meta’s researchers outlined in a paper the ethical challenges they encountered while developing generative AI models such as MusicGen.

Now the Facebook and Instagram parent has said that MusicGen was trained on Meta-owned and specifically licensed music, while AudioGen, which generates audio from text prompts, was trained on public sound effects.

The company also updated its EnCodec decoder, which allows “higher quality music generation with fewer artifacts.”

Meta is also rolling out its pre-trained AudioGen models, enabling users to generate various environmental sounds and sound effects, such as dogs barking, cars honking, or footsteps on wooden floors.

“We’re open-sourcing these models, giving researchers and practitioners access so they can train their own models with their own datasets for the first time, and help advance the field of AI-generated audio and music,” said Meta.

Meta added that it had simplified the overall design of generative models for audio compared with prior work in the field, giving users “the full recipe to play with the existing models” that Meta has been developing over the past several years.

The new tools seek to address the particular challenges of AI audio generation, as compared with models designed for images, video and text. Meta acknowledged that modeling audio signals and patterns at varying scales, especially for music, has been difficult.

“Music is arguably the most challenging type of audio to generate as it’s composed of local and long-range patterns, from a suite of notes to a global musical structure with multiple instruments,” the company said.

Meta says AudioCraft’s user-friendly interface will allow musicians and creators to find inspiration, brainstorm and iterate on their compositions in new ways.

Meta envisions that MusicGen could evolve into a new type of instrument, with an impact similar to that of synthesizers when they first emerged.

Meta’s AudioCraft is expected to rival Google’s MusicLM, which was made publicly available in May and can also generate high-fidelity music from text prompts and humming.

Music Business Worldwide