The music industry is feeling the impact of artificial intelligence as Meta introduces an open-source music generation model, a counterpart to text generators like ChatGPT. Felix Kreuk, an AI research engineer at Meta, recently showcased the abilities of “MusicGen” in a Twitter thread. The system can also take existing music and transform it, allowing for intriguing adaptations like crafting an ’80s pop song from a timeless musical motif. The possibilities for musical exploration and creativity seem boundless with this technology.
According to Kreuk, MusicGen is a transformer language model that operates over audio tokens produced by Meta’s EnCodec tokenizer. Users can try out the MusicGen demo through Hugging Face’s API, though generation may be slow when many users are queued at once. Alternatively, users can spin up their own instance of the model on the Hugging Face site for faster outputs. For those with technical expertise and sufficient computational resources, it is also possible to download the code and run it locally.
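For readers who want to poke at the hosted model programmatically rather than through the demo page, a call to Hugging Face’s Inference API might look like the sketch below. This is illustrative only: the endpoint URL and model ID are assumptions about how the checkpoint is hosted, and the `build_request`/`generate` helpers are ad hoc names, not an official client.

```python
# Assumed Inference API endpoint for a small MusicGen checkpoint;
# the exact model ID and hosting details may change.
API_URL = "https://api-inference.huggingface.co/models/facebook/musicgen-small"


def build_request(prompt: str, token: str) -> tuple[dict, dict]:
    """Assemble the auth headers and JSON payload for a text-to-music request."""
    headers = {"Authorization": f"Bearer {token}"}
    payload = {"inputs": prompt}  # standard Inference API request shape
    return headers, payload


def generate(prompt: str, token: str) -> bytes:
    """Send the prompt to the API and return raw generated audio bytes."""
    import requests  # imported lazily so the sketch loads without the dependency

    headers, payload = build_request(prompt, token)
    response = requests.post(API_URL, headers=headers, json=payload)
    response.raise_for_status()
    return response.content  # audio data you could write to a .wav file
```

Running your own instance or a local copy sidesteps the shared queue entirely, at the cost of supplying your own GPU time.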
During testing, the system produced a synth-heavy “symphonic rendition of the happy birthday theme” and a lo-fi hip-hop track featuring nature samples (including crickets). By default, the songs do not include lyrics. Gizmodo also tested the system’s optional audio-conditioning feature by supplying tracks that contained lyrics. In one instance, a lyrical track was paired with a prompt for a “grunge song with heavy bass and violin accompaniment,” yielding a cracklier output than the same prompt produced on its own.
The extent to which the AI comprehends specific composers remains unclear. When prompted to generate a “Hans Zimmer score for a steampunk medieval film,” it is difficult to ascertain if the AI can truly replicate Zimmer’s distinctive themes and style.
While various AI models have been developed for text generation, voice synthesis, art generation, and short videos, few high-quality music generators have been made available to the public. According to the research paper accompanying the model, one of the main challenges in music generation is the need to capture the full frequency spectrum, which requires intensive sampling. The complex structures and overlapping instrumentation found in music add further difficulty to generating high-quality compositions.
Meta has also compared its MusicGen system to Google’s MusicLM text-to-music model, providing a dedicated page showcasing the features of both models for direct comparison.
Artists may find the training data behind MusicGen particularly noteworthy. The research paper states that the model was trained on 20,000 hours of licensed music, including an internal Meta dataset of 10,000 music tracks. The company also used around 390,000 instrument-only tracks sourced from Shutterstock and Pond5. Meta’s researchers assert that all the music used for training was covered by legal agreements with the respective rights holders, including a partnership with Shutterstock.
It is worth noting that Shutterstock previously entered into an agreement with OpenAI, the creator of DALL-E, and already offers its own AI image generation tool trained on images contributed by its users. Some artists, however, have expressed concerns about their work being used to train AI models without their explicit permission. Lawsuits have been filed against prominent AI art companies such as Stability AI and Midjourney, with allegations specifically targeting the unauthorized use of copyrighted content in AI datasets. The situation becomes more complex when large tech companies like Meta have the financial resources to license creative content for their AI systems. For users, a lingering concern remains: the system may inadvertently or deliberately produce compositions that plagiarize the work of other musicians, licensed or not.