Many see AI voices as the future of voice production: they promise an efficient, and therefore cheaper, alternative to working with real speakers.
At audioberlin, we have been extensively involved with AI voices from the very beginning – from working on and recording text-to-speech systems to testing and applying a wide variety of them.
We see advantages in scalability and cost: large volumes of text can be voiced in a short time and at low cost. Our sound engineers also love the consistency and reproducibility of artificial voices (“never ageing”, “never catching a cold”, “always sounding the same”). More recently, multilingualism has been added: one and the same new-generation AI voice speaks fluent English, German, Spanish, Portuguese and Mandarin, for example.
Currently, however, listening with the ears of audio aficionados, we do not (yet) see AI voices as a “fundamental alternative”. As fascinating as the results are, considerable effort is needed to ensure that foreign words, enumerations, abbreviations, type designations and the like are “pronounced” correctly. In our opinion, AI voices still fall far short of what human voices can do in terms of emotionality, rhythm and acting – in short, emotional storytelling. Legal questions around usage rights are often not sufficiently clarified, and ethical issues, such as the potential for manipulation and misinformation, are frequently ignored.
Overall, we see potential in AI-generated voices, but their application requires a careful balance between efficiency and authenticity.