
Feature Story

The Cocktail Party Problem: How the Brain Decides What Not to Hear

By Robin Latham

The ability to pay attention to one voice among many in a crowded restaurant or at a party is something we take for granted: we focus on that one voice and screen out the rest. How this selective hearing happens in the brain, however, has puzzled scientists for decades, at least since 1953, when a British researcher christened it the “cocktail party problem” and speculated that if we could understand how the brain grabs onto the voices we want to hear and pushes the rest aside, we could build a machine to replicate it.


Almost 60 years later, that machine is very close to becoming possible. A pair of NIDCD-supported researchers who study selective hearing have created a way not only to see how the brain filters and segregates competing voices, but also to predict the exact word someone is listening to. A report of their findings was published online on April 18, 2012, in the journal Nature.

Edward Chang, Ph.D., an assistant professor of neuroscience at the University of California, San Francisco (UCSF), is a neurosurgeon who operates on people with epilepsy. To help pinpoint the parts of the brain responsible for their seizures, his patients are implanted with a thin sheet of up to 256 electrodes beneath the skull on the outer surface of the brain’s cortex. Over the course of a week, while the patient is in the hospital, the electrodes record the electrical activity of the neurons in the patient’s temporal lobe.

UCSF is one of the few academic epilepsy centers that does these advanced intracranial recordings, which are so discriminating they can report the firings of single neurons. Dr. Chang and postdoctoral fellow Nima Mesgarani, Ph.D., realized they had a unique opportunity. While the electrodes were collecting evidence of seizures, they could also record the activity of neurons in the auditory cortex, the part of the brain that processes sounds, which resides in the temporal lobe. Three patients volunteered for the study, which was conducted in the hospital while they awaited surgery.

In the experiments, the subjects listened to multiple trials of two different speech samples played simultaneously. One of the sample speakers was male, the other female. Each speaker repeated a nonsensical phrase that combined a target word (ringo or tiger) spoken before a combination of a color (red, blue, or green) and a number (two, five, or seven) along with a connecting verb. A typical phrase was “ready tiger go to red two now.” The subjects were told to pay specific attention to one of the two target words, which was also shown on a computer screen in front of them, and then report the color and number that the same speaker said afterwards.
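A rough sketch of how such trial phrases could be assembled is below. The article gives only the word categories and one example sentence, so the carrier wording, phrase lists, and pairings here are assumptions for illustration, not the study’s actual stimulus set.

```python
import random

# Assumed reconstruction of the stimulus design described above;
# the actual phrases and pairings used in the study may differ.
TARGET_WORDS = ["ringo", "tiger"]
COLORS = ["red", "blue", "green"]
NUMBERS = ["two", "five", "seven"]

def make_phrase(target, color, number):
    """Build a carrier phrase like 'ready tiger go to red two now'."""
    return f"ready {target} go to {color} {number} now"

# One phrase per speaker, played simultaneously; the subject attends
# to whichever speaker says the cued target word.
male_phrase = make_phrase("tiger", random.choice(COLORS), random.choice(NUMBERS))
female_phrase = make_phrase("ringo", random.choice(COLORS), random.choice(NUMBERS))
print(male_phrase)
print(female_phrase)
```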

The researchers analyzed the recordings using a newly developed and powerful decoding algorithm that scrutinizes the patterns of neural activity to reconstruct what the subjects heard. What the researchers found was that the neural responses in the auditory cortex reflected only the words of the attended speaker. Apparently, the auditory cortex ignores what it doesn’t want to hear and focuses on what it does.
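The article does not spell out how the decoding algorithm works, so the following is only a minimal sketch of one standard way to reconstruct speech from multi-electrode recordings: a linear (ridge-regression) mapping from cortical activity to a speech spectrogram. The dimensions, variable names, and toy data are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

def train_decoder(neural_activity, spectrogram, alpha=1.0):
    """Fit a linear map from neural features (time x electrodes)
    to spectrogram frames (time x frequency bins)."""
    model = Ridge(alpha=alpha)
    model.fit(neural_activity, spectrogram)
    return model

def reconstruct(model, neural_activity):
    """Reconstruct a speech spectrogram from held-out neural activity."""
    return model.predict(neural_activity)

# Toy example: 1,000 time points, 256 electrodes, 32 frequency bands.
rng = np.random.default_rng(0)
X_train = rng.standard_normal((1000, 256))   # cortical recordings
S_train = rng.standard_normal((1000, 32))    # spectrogram of attended speech
decoder = train_decoder(X_train, S_train)
S_hat = reconstruct(decoder, rng.standard_normal((200, 256)))
```

If the cortex truly encodes only the attended speaker, a reconstruction like S_hat should resemble that speaker’s spectrogram rather than the acoustic mixture the subject actually heard.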

What this means, according to Dr. Chang, is that the auditory cortex is doing a lot more work than it had previously been given credit for. “A lot of people thought that the auditory cortex was just passing this information up to the cognitive part of the brain, the frontal cortex and the executive control areas, where it would be really processed,” he says. “What we found was that the auditory cortex is in and of itself pretty sophisticated. It’s as if it knows which sounds should be grouped together and only extracts those that are relevant to the single speaker.” The information has already been analyzed in the auditory cortex before it leaves. 

Drs. Chang and Mesgarani also discovered that the decoding algorithm could predict which speaker, and even which specific words, the subject was listening to based on the patterns of neural activity. In fact, the algorithm worked so well that it could not only predict the correct responses but also tell when the subject’s attention strayed to the wrong speaker.
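One simple way such a prediction could be made, continuing the hedged sketch above rather than describing the researchers’ actual classifier, is to correlate the reconstructed spectrogram with each candidate speaker’s spectrogram and pick the better match; a mismatch with the cued speaker would flag a lapse of attention.

```python
import numpy as np

def predict_attended_speaker(reconstruction, speaker_spectrograms):
    """Return the index of the candidate speaker whose spectrogram best
    matches the reconstruction (Pearson correlation over all frames)."""
    scores = [
        np.corrcoef(reconstruction.ravel(), spec.ravel())[0, 1]
        for spec in speaker_spectrograms
    ]
    return int(np.argmax(scores)), scores

# Toy usage: two speakers, 200 frames x 32 frequency bins each.
rng = np.random.default_rng(1)
candidates = [rng.standard_normal((200, 32)) for _ in range(2)]
recon = candidates[0] + 0.5 * rng.standard_normal((200, 32))  # noisy copy of speaker 0
winner, scores = predict_attended_speaker(recon, candidates)
print(f"predicted speaker: {winner}, correlations: {scores}")
```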

Besides these new insights into how the brain processes speech, Drs. Chang and Mesgarani have also developed a powerful tool in their new algorithm that could lead to advances in the field of automatic speech recognition, something we’re all familiar with from frustrating phone calls to customer service lines. The current technology works well in quiet conditions, but it struggles when multiple speakers or background noise are added. Dr. Chang suggests that what they’ve discovered about how the human brain processes competing sounds could be reverse-engineered into machine hearing.

Taking it even further, a technique that allows scientists to peer into the auditory cortex and see, word by word, what it is listening to could let doctors begin to look at what is happening in the brains of people with dyslexia or attention deficit disorder. “Can you imagine how powerful it would be to peer into the mind of someone with dyslexia,” says Dr. Chang, “to see how their auditory representations and word forms look in the brain?”

As the researchers look forward, they will be working on adapting the technology to a noninvasive method, one that wouldn’t require surgical implantation, to make it more practical for recording neural activity in clinical settings. This would allow them, and other researchers, to look at speech processing in the brains of people with many different kinds of speech and language disorders.

This research is supported by National Institutes of Health (NIH) grant DP2OD008627, NIDCD grant R01DC012379, and National Institute of Neurological Disorders and Stroke (NINDS) grant R00NS065120. NIDCD and NINDS are components of the NIH.
