
Nvidia launched a new artificial intelligence (AI) model on Monday that can generate a wide range of audio and mix different types of sounds. The tech giant calls the foundation model Fugatto, which is short for Foundational Generative Audio Transformer Opus 1. While audio-focused AI platforms such as Beatoven and Suno exist, the company highlighted that Fugatto offers users granular control over the desired output. The AI model can generate or transform any mix of music, voices and sounds based on specific prompts.
Nvidia Introduces AI Audio Model Fugatto
In a blog post, the tech giant detailed its new foundation model. Nvidia said Fugatto can generate music snippets, remove or add instruments from an existing song, change the accent or emotion in a voice, and “even let people produce sounds never heard before.”
The AI model accepts both text and audio files as input, and users can combine the two to fine-tune their requests. Under the hood, the foundation model’s architecture is based on the company’s previous work in speech modelling, audio vocoding, and audio understanding. Its full version uses 2.5 billion parameters and was trained on Nvidia DGX systems.
Nvidia highlighted that the team that built Fugatto is spread across several countries, including Brazil, China, India, Jordan, and South Korea. The collaboration of people from different ethnicities also contributed to developing the AI model’s multi-accent and multilingual capabilities, the company said.
Coming to the AI audio model’s capabilities, the tech giant highlighted that it can generate types of audio output that it was not pre-trained on. Giving an example, Nvidia said, “Fugatto can make a trumpet bark or a saxophone meow. Whatever users can describe, the model can create.”
Additionally, Fugatto can combine specific audio capabilities using a technique called ComposableART. With this, users can ask the AI model to generate audio of a person speaking French with a sad feeling. Users can also control the degree of sorrow and the heaviness of the accent with specific instructions.
Further, the foundation model can also generate audio with temporal interpolation, or sounds that change over time. For instance, users can generate the sound of a rainstorm with crescendos of thunder that fade into the distance. Users can also experiment with these soundscapes, and the model can create a sound even if it has never processed it before.
At present, the company has not shared any plans to make the AI model available to users or enterprises.