Just ask Pat Quinn, an ALS patient who got his voice back thanks to Project Revoice and Lyrebird, which uses AI to clone a person's voice.
In 2017, Amyotrophic Lateral Sclerosis (ALS), a devastating neurological disorder, robbed Pat Quinn, the founder of the famous Ice Bucket Challenge, of his ability to speak.
In 2018, artificial intelligence helped him get it back.
Thanks to advances in machine learning and deep learning, artificial intelligence algorithms have become remarkably good at imitating humans. And while AI's imitation power has drawn headlines mostly for harmful uses such as deepfake audio and video, for Quinn it was a force for positive change.
“Most people living with ALS (also known as motor neuron disease) end up paralyzed and unable to communicate with anything but artificial ‘computer’ voices,” says Oskar Westerdal, cofounder of Project Revoice, an initiative that aims to help ALS patients like Quinn.
To recreate Quinn's voice, Project Revoice collaborated with Lyrebird, one of a handful of companies that use AI to clone a person's voice, a group that also includes Google's WaveNet and Voicery, a Y Combinator-backed startup that uses AI to create synthesized voice recordings.
How Deep Learning Generates Human Voices
Behind these applications are deep-learning algorithms, a popular branch of AI that combs through large data sets for insights and patterns that traditional, rule-based software can't capture. Train a deep-learning voice synthesizer on enough voice recordings and it builds a digital model of the person's voice, which can then generate entirely new speech samples.
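Neither Lyrebird nor Project Revoice has published its architecture, but the general data flow of a voice-cloning pipeline can be sketched in miniature. The sketch below is purely illustrative: every function name is hypothetical, the "speaker model" is just averaged hand-crafted features rather than a trained neural network, and the "synthesizer" modulates a sine wave instead of running a neural vocoder. It shows only the shape of the process: recordings in, speaker model out, new audio generated from text plus that model.

```python
import math
import random

def extract_features(recording):
    """Stand-in for acoustic feature extraction (real systems use
    e.g. mel-spectrograms). Returns [mean level, dynamic range]."""
    return [sum(recording) / len(recording), max(recording) - min(recording)]

def build_speaker_model(recordings):
    """'Train' a speaker model by averaging features over all
    recordings; a real system would fit neural-network weights."""
    feats = [extract_features(r) for r in recordings]
    n = len(feats)
    return [sum(f[i] for f in feats) / n for i in range(len(feats[0]))]

def synthesize(speaker_model, text, sample_rate=16000, seconds=0.01):
    """Generate waveform samples conditioned on the speaker model and
    the text, so the output depends on both inputs (as a real
    text-to-speech model's output would)."""
    pitch = 100 + 10 * speaker_model[0]   # speaker-dependent pitch
    random.seed(text)                     # text-dependent variation
    n_samples = int(sample_rate * seconds)
    return [speaker_model[1] * math.sin(2 * math.pi * pitch * t / sample_rate)
            + 0.01 * random.uniform(-1, 1)
            for t in range(n_samples)]

# Hours of studio audio become, in this toy, a few short "recordings".
recordings = [[0.0, 0.5, -0.5, 0.2], [0.1, 0.4, -0.4, 0.3]]
model = build_speaker_model(recordings)
waveform = synthesize(model, "Hello, my name is Pat.")
```

The key point the toy preserves is separation of concerns: the speaker model is built once from recordings, and from then on arbitrary new text can be rendered in that voice without any further recordings.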
Before the advent of AI-powered voice synthesis technology, ALS patients had to use generic digital voices that weren't their own. Other technologies could stitch together pre-recorded sentences in the patient's voice, but the results sounded artificial and required dozens of hours of voice recordings to be even minimally useful.
Deep-learning applications, on the other hand, require much less data and provide better results. “What Lyrebird can achieve with just a couple of hours of audio is remarkable—it gives people a complete digital voice clone, so they can say whatever they want,” Westerdal says.
Recreating the Voice of a Voiceless Person
One of the limits of deep-learning applications is their dependence on high-quality data samples to train their neural networks. The challenge for ALS patients is that once they lose their voices, recording new samples is impossible. Fortunately, Quinn had hours of recorded keynotes and interviews.
“The biggest challenge was quality. This technology is completely dependent on having consistent, high-quality recordings that also follow an exact script—so we had to work with a sound studio to manually ‘remaster’ and transcribe every line of dialogue we could find of Pat,” Westerdal says.