This AI is capable of generating sound for silent videos: music, background noise and even voices

Google is one of the world leaders in AI, and it is proving so again today by presenting a new model called V2A, or “Video-to-audio”. As the name suggests, the technology takes a video as input and produces audio that matches it. The user can also give the AI specific instructions, via natural-language prompts, to influence the audio generation process.

Today, several AI models can already generate video content. OpenAI, for example, made a very strong impression when it unveiled its Sora model. Google, for its part, has developed a similar technology called Veo. However, these models generate silent videos. Google's idea with V2A is to offer a second model which, combined with these technologies, will make it possible to automatically generate videos with sound. “It can also generate soundtracks for a range of traditional footage, including archival material, silent films and more, opening up a wider range of creative opportunities”, adds DeepMind, Google's AI division.

As Google's examples show, V2A can produce thrilling music for a horror movie scene, generate background noise for an underwater video, or generate drum sounds for a concert video. The firm also explains that this AI can generate an unlimited number of sounds for a video, and that the results can be refined using prompts.

V2A is not yet very good at voices

In addition to music and background noises, Google's new AI can even generate voices, as shown in the video below. However, Google admits that its model still has difficulty synchronizing dialogue with videos. “V2A attempts to generate speech from input transcripts and synchronize it with movements of the characters' lips. But the coupled video generation model cannot be conditioned on transcriptions. This creates a lag, often resulting in strange lip sync, as the video model does not generate mouth movements corresponding to the transcription”, reads the firm's presentation.

In any case, Google believes that V2A stands out from other existing audio generation models: the AI can understand “raw pixels”, and text prompts are optional. As for how the AI was developed, Google explains that it trained the model on videos, audio, and annotations, so that V2A understands which sounds correspond to a given visual event.

Regarding the availability of this technology, the firm explains that it will first carry out evaluations and tests, before considering making V2A accessible to the public.

  • Google has just presented a new AI model called V2A or “Video-to-audio”
  • It can produce synchronized sounds for a video, and the user can give specific instructions via a prompt
  • V2A can even produce voices, but still has problems synchronizing with lip movements


By Teilor Stone
