VoCo allows you to change words in a voice-over simply by typing new words.
The premise is fairly simple: with a sufficient sample of a voice, you can get that voice to say anything you wish by just typing it in. Over time, Adobe will likely improve the software by reducing the required sample size and by making the outcome sound even more realistic, perhaps to the point where it is indistinguishable to your ear from natural speech.
VoCo has the potential to greatly reduce the need for voice actors. Take, for example, the person who does the promos for your local TV news station. Each night, the same voice does a 30-second tease of the stories which will air during the broadcast. Each day, that person records from a script they are given. With Adobe VoCo, that same TV network may just pay for a licence to access a package of voice samples. Then they feed their script into VoCo and instantly have the desired result. The human expense will no longer be necessary.
The same premise will be true for a wide range of voice-related tasks where a script is involved. Many fields which have not yet been automated may take that path once artificial speech can sound as natural as a real person.
Making a voice say whatever you want is going to lead to a steep rise in fake recordings getting massive attention. Every famous person who has ever been recorded at sufficient length will end up as a potential voice for use in this software. Due to the relatively low skill requirement, anyone who can pick a voice sample and type something for it to say will be able to generate a recording.
In the video, the Adobe presenter mentioned the ability to create a virtual watermark of sorts to differentiate real from fake voices. However, if Photoshop has taught us anything, it is that something doesn’t need to be real to go viral. There are going to be many complicated legal and societal implications that come with the ease of making someone say whatever you want.
Some of it will be harmless fun, but VoCo will be frightening as well. For every home movie where someone has David Attenborough narrating their puppy’s first time in the snow, there will be a political attack ad featuring the opposition saying something scary that they never really said. People may understand that it isn’t real when told after the fact, but you shouldn’t underestimate the psychological effect that hearing certain things could have on an election.
Just imagine the combination of natural audio over these facial manipulations:
Photoshop led to an era where you can no longer believe your eyes. Now you may not be able to trust your ears either.