Voice Matters! Speech Researcher Rupal Patel Builds Personalized Voices for People with Severely Impaired Speech
By Kirstie Saltsman, Ph.D.
Rupal Patel, Ph.D.
Credit: Northeastern University
Most people don’t give the ability to speak a second thought; like walking or breathing, we take speaking for granted. This is not true for the roughly 2.5 million people in the United States and the millions more worldwide who have severe speech impairments from birth or as a result of neurological disorders, such as stroke. For these people, communicating is a daily challenge that relies upon devices that speak for them in computerized voices. While these devices go a long way toward helping people express themselves, the synthetic voices they produce are usually a poor reflection of natural human voices. And because so few synthetic voices are available, many people may end up sharing the same generic voices.
This is what Rupal Patel, Ph.D., director of the Communication Analysis and Design Laboratory and professor of speech-language pathology and audiology at Northeastern University, seeks to change. In collaboration with Tim Bunnell, Ph.D., head of the Speech Research Laboratory at A.I. duPont Hospital for Children in Wilmington, Delaware, Dr. Patel has developed a technology, dubbed VocaliD, that creates personalized voices for people who have severely impaired speech. Current VocaliD voices are synthesized using ModelTalker, a text-to-speech (TTS) system developed with funding from the NIDCD, the U.S. Department of Education, and Nemours Biomedical Research.
“I think voice matters because it’s so intricately tied to our identity, our culture, our history as an individual, and even where we’ve been and what we like to do,” says Dr. Patel. “We haven’t previously thought of a personalized voice as something necessary for people who use a synthetic voice, but each person’s voice is a vital part of who they are.”
VocaliD involves blending the speech of two individuals: a donor and the recipient. First, a recording is made of whatever vocal sounds the recipient is still able to make. Sometimes this amounts to only a single vowel sound, but that is often enough to discern the pitch, volume, and personal identity of his or her voice. The next step is accomplished with the help of a volunteer voice donor. For the best results, the donor is a close match to the recipient in terms of size, age, region of origin, and other characteristics. The donor records several hundred to several thousand sentences, with the goal of capturing all the sounds and sound combinations that occur in the language.
Dr. Patel and her team then use a computer algorithm to split the donor’s recorded sentences into tidbits of sound, forming a huge database that the text-to-speech software will draw upon to recreate or synthesize any unique vocal sound.
What’s novel about VocaliD is the blending of the donor’s sound tidbits with the recipient’s voice, producing speech with the clarity of the donor and the vocal identity of the recipient.
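The blending idea described above can be pictured with a toy sketch. This is only an illustration under loud assumptions, not VocaliD's or ModelTalker's actual method: the `Unit` class, the sound labels, and the pitch values are all hypothetical, and real systems work on audio waveforms rather than on simple numbers. Here the donor supplies the database of sound snippets, and the recipient supplies only a target pitch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One short snippet of donor speech (hypothetical representation)."""
    label: str        # the sound the snippet carries, e.g. "h" or "eh"
    pitch_hz: float   # average pitch of the snippet
    duration_ms: int  # length of the snippet


def build_unit_database(donor_units):
    """Index the donor's snippets by sound label for later lookup."""
    database = {}
    for unit in donor_units:
        database.setdefault(unit.label, []).append(unit)
    return database


def synthesize(sounds, database, recipient_pitch_hz):
    """Pick one donor snippet per requested sound and give it the recipient's pitch."""
    output = []
    for label in sounds:
        candidates = database[label]  # raises KeyError if the donor never recorded this sound
        # Prefer the donor snippet whose pitch is already closest to the recipient's.
        best = min(candidates, key=lambda u: abs(u.pitch_hz - recipient_pitch_hz))
        # Keep the donor's clear articulation (label, timing); take pitch from the recipient.
        output.append(Unit(best.label, recipient_pitch_hz, best.duration_ms))
    return output


# Made-up donor snippets; a real donor records hundreds to thousands of sentences.
donor_units = [
    Unit("h", 210.0, 60),
    Unit("eh", 190.0, 120),
    Unit("eh", 230.0, 110),
    Unit("l", 205.0, 70),
    Unit("ow", 200.0, 150),
]
database = build_unit_database(donor_units)

# Assumed pitch measured from the recipient's recorded vowel sound.
blended = synthesize(["h", "eh", "l", "ow"], database, recipient_pitch_hz=225.0)
print([(u.label, u.pitch_hz, u.duration_ms) for u in blended])
```

In this simplified picture the sequence and timing of sounds come from the donor, while every snippet is played back at the recipient's pitch. Volume and other markers of vocal identity mentioned in the article are left out for brevity.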
“A slow smile spreading across her face” is how Dr. Patel described one recipient’s reaction to hearing her very own personalized voice for the first time. But the benefits of a unique voice go beyond recipients, continues Dr. Patel. They also extend to recipients’ families and the other people around them.
“Voice greatly influences how we perceive people,” says Dr. Patel.
Dr. Patel next plans to build on VocaliD’s success by scaling it up. As a first step toward this goal, she has launched the Human Voicebank Initiative, an effort to collect one million donor voice samples. Volunteers can sign up to donate their voices or to request a personalized voice. So far, more than ten thousand donors have signed up, and several hundred people have requested custom-made voices. The team is working on ways to collect all these voices.
By assembling a large collection of voice samples, Dr. Patel expects to be able to capture the range of voices needed for creating a close match for virtually any recipient, be they young or elderly, large or small.
“The nice thing about crowdsourcing this is that we’re going to get much more of the diversity of dialect and texture of voice that’s out there in the world,” says Dr. Patel. “That texture from the everyday world will appear in the voices we build.”
To make the expansion of VocaliD possible, Dr. Patel is transitioning the technology from its beginnings as a research project to a commercial entity. By commercializing VocaliD, she and her team of engineers and scientists will have the capacity to refine the technology, automating certain steps and making the entire process faster and more efficient.
While Dr. Patel takes satisfaction from the personalized voices she has created so far, she continues to be inspired by the millions of people around the world with severe speech impairments who still lack individualized voices.
“It needs to get out of the lab and into the real world,” says Dr. Patel.