© Presse-citron
Google is one of the world leaders in AI, and today it proves it again by presenting a new model called V2A, for “video-to-audio”. As the name suggests, this technology takes a video as input and produces audio that matches it. The user can also give the AI specific instructions, via prompts expressed in natural language, to influence the audio generation process.
Today, there are already several AI models capable of generating video content. OpenAI, for example, made a very strong impression by unveiling its Sora model. Google, for its part, has developed a similar technology, called Veo. However, these models generate silent videos. Google's idea with V2A is to offer a second model that, combined with these technologies, makes it possible to automatically generate videos with sound. “It can also generate soundtracks for a range of traditional footage, including archival material, silent films and more, opening up a wider range of creative opportunities”, adds DeepMind, Google's AI division.
As the examples provided by Google show, V2A can produce tense music for a horror-movie scene, generate background noise for an underwater video, or even add a drum track to concert footage. The firm also explains that the AI can generate an unlimited number of soundtracks for a given video, and that the results can be refined using prompts.
In addition to music and background noise, Google's new AI can even generate voices, as shown in the video below. However, Google admits that its model still has difficulty synchronizing dialogue with video. “V2A attempts to generate speech from input transcripts and synchronize it with the movements of the characters' lips. But the paired video generation model cannot be conditioned on transcripts. This creates a mismatch, often resulting in strange lip-syncing, as the video model does not generate mouth movements corresponding to the transcript”, reads the firm's presentation.
In any case, Google believes that V2A stands out from existing audio generation models: the AI is capable of understanding “raw pixels”, and text prompts are only an option. As for how the AI was developed, Google explains that it trained the model on videos, audio and annotations, so that V2A learns which sounds correspond to a given visual event.
Regarding the availability of this technology, the firm explains that it will first carry out evaluations and tests, before considering making V2A accessible to the public.