Cortical and Sensory Causes of Individual Differences in Selective Attention Ability Among Listeners With Normal Hearing Thresholds

Purpose: This review provides clinicians with an overview of recent findings relevant to understanding why listeners with normal hearing thresholds (NHTs) sometimes suffer from communication difficulties in noisy settings. Method: The results from neuroscience and psychoacoustics are reviewed. Results: In noisy settings, listeners focus their attention by engaging cortical brain networks to suppress unimportant sounds; they then can analyze and understand an important sound, such as speech, amidst competing sounds. Differences in the efficacy of top-down control of attention can affect communication abilities. In addition, subclinical deficits in sensory fidelity can disrupt the ability to perceptually segregate sound sources, interfering with selective attention,

In a sizeable minority of cases, patients seeking audiological treatment have normal hearing thresholds (NHTs) but report difficulties understanding speech when there are competing sound sources (Hind et al., 2011). Such listeners are said to have "central auditory processing disorder" or "auditory processing disorder" (Furman, Kujawa, & Liberman, 2013; Kujawa & Liberman, 2009; Lin, Furman, Kujawa, & Liberman, 2011; Rosen, Cohen, & Vanniasegaram, 2010), a catchall diagnosis that says nothing about the underlying causes of the communication difficulties, making it difficult to develop effective treatments.
The challenge of understanding speech in settings where there are multiple sound sources is known as the cocktail party problem, a term originally coined by Cherry (1953). Understanding how listeners with normal hearing solve the cocktail party problem has remained a focus of study for over 50 years (e.g., see Bee & Micheyl, 2008; Bodden, 1993; Hafter et al., 2013; Wood & Cowan, 1995; Yost, 1997), in no small part because of its importance: such settings expose communication difficulties that do not show up in simpler listening conditions.
There are numerous reasons why listeners with NHTs might find it difficult to communicate in noisy environments, from language-specific deficits to general cognitive deficits. This article reviews two specific issues that can affect the ability of listeners with NHTs to solve the cocktail party problem: (a) efficacy of cognitive control networks in the brain responsible for focusing selective auditory attention and (b) fidelity of the sensory representation of suprathreshold (clearly audible) sound. By differentiating among more specific mechanistic failures that can impede communication in complex settings, clinicians will be able to provide appropriate counseling and care management and, ultimately, targeted interventions.
At first, it may seem surprising that listeners can have difficulty understanding speech in cocktail party settings but do not report difficulty in other listening situations. However, solving the cocktail party problem places much greater cognitive and sensory demands on the listener than does listening in quiet. As discussed in the section on We Rely on Selective Attention to Communicate in Noisy Social Settings, audibility often is not the factor that limits understanding for listeners with NHTs. Instead, central processing resources can limit what we can consciously perceive. We manage this limitation by focusing attention on whatever acoustic source we will process, which relies on engaging cortical networks to filter inputs in ways that are unnecessary in quiet. Some listeners may have problems controlling cortical control networks and, therefore, have difficulty focusing selectively on whatever sound they want to hear, an idea developed further in the section Individuals Differ in Their Ability to Control Selective Attention. More important, as discussed in the section on Auditory Selective Attention Depends on Auditory Object Formation, selection of a sound source from a mixture of competing sounds can only be effective if we are able to perceptually segregate the sources in the sound mixture reaching our ears, a process known as auditory scene analysis (Bregman, 1990). The ability to analyze and make sense of an auditory scene depends upon the sensory representation of the acoustic mixture being robust and rich enough to support perceptual segregation. Critically, as discussed in the section on Individuals Differ in Their Ability to Encode Fine Temporal Details in Suprathreshold Sound, even listeners with NHTs may suffer from sensory deficits that impair auditory scene analysis.
Difficulty understanding speech in noisy settings may arise from very different underlying causes. As discussed in the section Future Impact in the Clinic, there are no effective treatments yet for these issues; however, as we begin to understand the mechanisms that can lead to difficulties communicating in noise, specific interventions and treatments can be developed.

Selective Attention Determines What We Perceive in a Complex Scene
We do not process every piece of information in our environment. Instead, we focus attention on one object and analyze it in detail, often at the expense of other items that are clearly visible or audible (e.g., see a recent overview by Fawcett, Risko, & Kingstone, 2015). Indeed, if our attention is directed at one object in a complex scene, we may be completely oblivious to other perfectly perceptible details.
This inability to process everything in a scene enables a skilled magician to astonish us with her sleight of hand; by drawing our attention to some salient, engaging event, such as a dramatic flourish of her cape, she ensures that we miss how she drops the "disappearing" coin into her pocket. In a complex scene, we naturally focus on an object that is interesting, either because it is inherently important (such as the sight of a lone person in an empty restaurant or the sound of a voice amidst buses and cars on a busy street) or because it is surprising (such as an unexpected flash of lightning or a sudden clap of thunder). In the laboratory, our poor ability to identify changes in complex visual or auditory scenes, especially when the change is in some "boring" background detail, is known as either "change blindness" (e.g., see Beck, Rees, Frith, & Lavie, 2001; Rensink, 2002; Simons & Levin, 1997; Simons & Rensink, 2005) or "change deafness" (e.g., see Dalton & Fraenkel, 2012; Eramudugolla, Irvine, McAnally, Martin, & Mattingley, 2005; Koreimann, Gula, & Vitouch, 2014), respectively. Experiments investigating change blindness and change deafness provide controlled laboratory demonstrations of the idea that, unless we actively attend to sensory information that is readily available on the retina or the cochlea, we will fail to consciously perceive that information. Change blindness and change deafness demonstrate that we do not process every bit of information available to us; instead, attention governs a bottleneck in our perceptual processing, determining what we "see" and "hear" from the information assailing our eyes and ears.
What object a listener (or viewer) attends is not simply determined by top-down intention (focusing on something we think is important, behaviorally). Instead, there is a complex interplay between volitional focus and bottom-up salience (Desimone & Duncan, 1995; Kastner & Ungerleider, 2001). Specifically, sudden, unexpected events can draw attention involuntarily, no matter what one's intention; for instance, if you are trying to listen to your date at a loud restaurant and the waiter drops a tray behind you, your attention will be drawn to the crash of the glass and china at your back. Regardless of whether the sound that is attended to is something you were trying to hear or something that grabbed your focus involuntarily, the result is the same: once you are focused on one sound, other sounds outside the focus of attention are processed with less detail.

Attention Modulates What Information Is Encoded in Sensory Cortical Regions
When a listener focuses attention on one object in a complex scene, the brain filters out information from other sources; attention fundamentally alters the degree to which the cortex even responds to what are perfectly perceptible inputs (e.g., see Fritz, Elhilali, David, & Shamma, 2007; Kastner & Ungerleider, 2000). Animal studies demonstrate these effects of attention using invasive neurophysiology. For instance, neurons in visual cortex may respond strongly to an object in the visual field if an animal is attending to that object but may not respond at all if the animal is attending to a different object on the screen, even though the input reaching the animal's retina is identical (e.g., see Buschman & Miller, 2007). In auditory neuroscience, single neurons also alter their responses to sounds in a mixture depending on what the animal is attending to (e.g., see Elhilali, Fritz, Chi, & Shamma, 2007). In human listeners, effects of attention on neural responses to sound have been demonstrated with various methods. For instance, noninvasive electroencephalography (EEG) and magnetoencephalography (MEG) produce precisely timed responses that are evoked by onsets or events in a sound (e.g., by individual notes in a melody). By comparing how much these evoked responses depend on how attention is directed, one can quantify the strength of cortical control of attention (e.g., Choi, Wang, Bharadwaj, & Shinn-Cunningham, 2014; Friedman-Hill, Robertson, Desimone, & Ungerleider, 2003; Muller et al., 1998). For simple input streams, such as competing note streams, it is easy to see these effects directly by averaging EEG or MEG responses to many repetitions of identical inputs under different attentional conditions. Natural ongoing sound streams, such as speech, typically have ongoing amplitude modulation, which one can think of as a series of pseudorandom onsets and partial onsets. These fluctuations drive the EEG and MEG signals; if there are two competing streams, the signals are driven more strongly by the fluctuations of energy in the attended stream than those in an unattended waveform (e.g., see Lalor & Foxe, 2010; Zion Golumbic, Cogan, Schroeder, & Poeppel, 2013; Zion-Golumbic & Schroeder, 2012).
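To make the averaging logic concrete, the following Python sketch averages evoked responses across repeated trials and compares the peak response to the same stream when it is attended versus when it is ignored. The normalized index used here, the function names, and the toy values are all illustrative assumptions; they are not the metric or the data of any study cited above.

```python
def average_trials(trials):
    """Average time-aligned single-trial responses, sample by sample."""
    n_trials = len(trials)
    return [sum(samples) / n_trials for samples in zip(*trials)]


def peak_amplitude(evoked):
    """Largest absolute value in an averaged evoked response."""
    return max(abs(value) for value in evoked)


def attention_modulation_index(attended_trials, ignored_trials):
    """Hypothetical normalized index: 0 means no attentional effect,
    values near 1 mean the ignored response is almost fully suppressed."""
    attended_peak = peak_amplitude(average_trials(attended_trials))
    ignored_peak = peak_amplitude(average_trials(ignored_trials))
    return (attended_peak - ignored_peak) / (attended_peak + ignored_peak)


# Toy responses (arbitrary units) to the same note under two attention conditions.
attended = [[0.0, 2.0, 4.0, 2.0, 0.0], [0.0, 2.4, 4.4, 1.6, 0.0]]
ignored = [[0.0, 1.0, 2.0, 1.0, 0.0], [0.0, 1.2, 2.2, 0.8, 0.0]]
ami = attention_modulation_index(attended, ignored)
```

In this sketch, a listener with weak cortical control of attention would show an index closer to zero, because the evoked response to a stream would differ little between the attended and ignored conditions.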
Often, the effects of attention are so strong that not only can you measure the average effects of attention on neural responses but you can even use a single recording from the brain to deduce to which source attention is directed. For instance, any single recording of EEG or MEG is noisy, making it hard to see specific events or to quantify attentional effects. However, by looking at which sound stream better predicts the evoked response to a short, 2- to 3-s-long recorded EEG or MEG pattern, it is possible to determine which source a listener is attending to (Choi, Rajaram, Varghese, & Shinn-Cunningham, 2013; Ding & Simon, 2012a, 2012b, 2013; Lalor & Foxe, 2010; O'Sullivan et al., 2014; Power, Foxe, Forde, Reilly, & Lalor, 2012; Rimmele, Zion Golumbic, Schroger, & Poeppel, 2015). Intracranial recordings (electrocorticography) in human listeners, which allow direct measurement of very localized brain activity, demonstrate the full power of attention on cortical encoding of speech in human listeners (e.g., Zion Golumbic, Ding, et al., 2013). In fact, using electrocorticography, one can crudely decode the content of the speech waveform that a listener is attending to in a mixture of competing sounds (Pasley et al., 2012).
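The single-trial decoding idea can be sketched in a few lines of Python: correlate a short neural recording with the amplitude envelope of each competing stream and choose the stream that predicts it better. This is a deliberately simplified illustration. The cited studies fit more elaborate models that account for neural response latency, whereas this sketch assumes zero lag, and the simulated "EEG", the mixing weights, and all names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def decode_attended(eeg, envelope_a, envelope_b):
    """Return the label of the stream whose envelope better tracks the recording."""
    r_a = pearson(eeg, envelope_a)
    r_b = pearson(eeg, envelope_b)
    label = "A" if r_a > r_b else "B"
    return label, r_a, r_b


# Simulate 3 s at an assumed 100-Hz sampling rate. The attended stream (A)
# drives the toy recording more strongly than the ignored stream (B).
rng = random.Random(0)
n_samples = 300
env_a = [rng.random() for _ in range(n_samples)]
env_b = [rng.random() for _ in range(n_samples)]
eeg = [1.0 * a + 0.3 * b + rng.gauss(0.0, 0.2) for a, b in zip(env_a, env_b)]

label, r_a, r_b = decode_attended(eeg, env_a, env_b)
```

The weaker the attentional modulation of the cortical response (here, the closer the two mixing weights are to each other), the less reliably such a decoder can identify the attended stream.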
Together, these studies demonstrate that attention filters out sounds that a listener is ignoring, reducing the cortical response to these interfering sounds. Even though ignored sounds reaching the ear cause strong responses in the early, subcortical portions of the auditory pathway (e.g., see Varghese, Bharadwaj, & Shinn-Cunningham, 2015), they may cause almost no response in the cortex, which is responsible for consciously perceiving the sound. Remember this the next time your friend fails to look up from their book when you say hello; their brain may not have registered that you were there, right next to them, even though you know that they could hear you.

Attention Engages Different Attentional Control Networks
During visuospatial attention, cortical activity increases in a set of regions spanning the frontal and parietal regions of the cortex, a result that has been well documented over the last few decades (e.g., see Buschman & Miller, 2007; Friedman-Hill et al., 2003; Giesbrecht, Woldorff, Song, & Mangun, 2003; Serences, Schwarzbach, Courtney, Golay, & Yantis, 2004). Converging evidence shows that, when listeners focus their auditory selective attention on a particular spatial location, areas in this well-known visuospatial attention network are more active than when listeners are not performing a task (Kong et al., 2014; Michalka, Rosen, Kong, Shinn-Cunningham, & Somers, 2016; Shomstein & Yantis, 2004, 2006; Tark & Curtis, 2009). These regions are also more active when listeners solve the same cocktail party problem by using spatial cues (e.g., "listen to the talker on the left") than when they use spectrotemporal acoustic features (e.g., "listen to the female talker"; Bharadwaj, Lee, & Shinn-Cunningham, 2014; Hill & Miller, 2010; Lee et al., 2013; Michalka, Kong, Rosen, Shinn-Cunningham, & Somers, 2015). These results show that spatial auditory attention uses some of the same neural circuitry to control attention as does spatial visual attention.
In contrast, when listeners focus their auditory selective attention on nonspatial, spectrotemporal acoustic features of a sound source, the visuospatial attentional network is less active (Braga, Wilson, Sharp, Wise, & Leech, 2013; Michalka et al., 2015, 2016). Instead, regions of the brain closely associated with auditory processing show enhanced activation (Hill & Miller, 2010; Larson & Lee, 2014; Lee et al., 2013; Michalka et al., 2015, 2016). Thus, top-down attention to nonspatial auditory features differentially engages areas associated with auditory-specific processing and causes less activity in the visuospatial orienting network. Regardless of the form of attention, once a listener knows what spatial or nonspatial feature defines the target stream, brain activity in the appropriate control regions increases, showing anticipatory activity related to the cue defining the target sound in the upcoming stimulus (Hill & Miller, 2010; Lee et al., 2013). This preparatory activity thus seems to set up the filtering of sounds on the basis of the desired target properties.
Recent studies have identified interdigitated (distinct, interlocked) regions in lateral prefrontal cortex, a part of the brain important in executive function, that favor either visual attentional processing or auditory attentional processing (Osher, Tobyne, Congden, Michalka, & Somers, 2015). These specialized areas are likely the sources of top-down modulatory feedback during attentionally demanding tasks. More important, these control regions, although biased to "talk to" visual or auditory sensory regions, are recruited to help control attention in the other sensory modality under certain circumstances. Specifically, one recent study found that auditory-biased control regions were always active during auditory tasks and that visual-biased control regions were always active during visual tasks. However, when a visual task required subjects to make judgments about visual input timing, the auditory-biased control regions were recruited. Conversely, when a task required spatial auditory processing, the visual-biased control regions were more engaged than when listeners had to make temporal auditory judgments about the same stimuli.
Overall, these results paint a picture of attention engaging different networks in the cortex to modulate sensory information, depending on what the observer is trying to do. When focusing auditory selective attention in a complex scene on the basis of spectrotemporal content, auditory-biased executive control regions work with auditory sensory processing regions to filter out unwanted sounds. When focusing selective auditory attention on the basis of location, auditory- and visual-biased control regions in the prefrontal cortex work together to effect attention, engaging additional regions of the brain that have long been associated with visual spatial attention.

Auditory Selective Attention Depends on Auditory Object Formation
As discussed above, selective attention allows us to suppress competing sounds in order to analyze whatever source we attend to, but this means that, in order for us to understand speech in a noisy setting, we must perceptually segregate the attended sound source from other sounds in the environment. For instance, in a noisy restaurant, we can only focus on and understand what our companion is saying if we can separate their voice from the clatter of plates and the chatter of people at the next table. The process of object formation, or estimating which components of a sound mixture came from the same external sound source, is thus an important part of solving the cocktail party problem (Shinn-Cunningham, 2008). This ability to make sense of a mixture of multiple sounds is often referred to as "auditory scene analysis" (Bregman, 1990).
Over relatively brief time frames (on the order of tens of milliseconds), spectrotemporal structure causes sound elements to bind together (see reviews by Carlyon, 2004; Griffiths & Warren, 2004). In speech, the heuristics that determine what sounds form a syllable operate on this short time scale on the basis of multiple acoustic features; the same rules also apply to nonspeech sounds. For instance, sound elements that are near each other in frequency and close together in time tend to be perceived as coming from the same source. When sound elements are comodulated with relatively slow modulations below about 7 Hz (turning on and off together or changing amplitude together), they tend to be grouped together perceptually (see examples in Fujisaki & Nishida, 2005; Hall & Grose, 1990; Maddox, Atilgan, Bizley, & Lee, 2015; Oxenham & Dau, 2001). Indeed, in typical English speech, syllabic rates are in this range, typically below 10 Hz (Greenberg, Carvey, Hitchcock, & Chang, 2003). Interestingly, although people have an intuitive sense that we should group together sound elements that have spatial cues consistent with the same source location, spatial cues have a relatively weak impact on perceptual grouping of brief sound elements (Darwin & Hukin, 1997); spatial cues generally only influence perceptual grouping at this level when other spectrotemporal cues are ambiguous (e.g., Schwartz, McDermott, & Shinn-Cunningham, 2012; Shinn-Cunningham, Lee, & Oxenham, 2007). Sounds that are harmonically related also tend to be perceived as having a common source, whereas inharmonicity can cause grouping to break down (Culling & Darwin, 1993a; Darwin, Hukin, & al-Khatib, 1995). All of these temporally "local" sound features (i.e., proximity in frequency and time, comodulation, spatial cues, harmonicity, etc.) work together to determine how syllables are formed out of a mixture of sounds.
Stepping back, all of the acoustic features that tend to cause sound elements to bind together perceptually into an auditory object, both at the syllabic level and across syllables, share a common trait: they would be unlikely to happen by chance. Specifically, sound elements that were generated by the same natural sound source are very likely to have spectrotemporal continuity, common harmonic structure, comodulation, common spatial cues, repetitive structure, and other correlated spectrotemporal structure. However, these features are unlikely to be shared by sound elements being produced by unrelated natural sound sources. For instance, two sound elements being generated by two independent physical sources could start and stop at the same time; however, this is extremely unlikely to occur in the natural world.
The neural mechanisms that lead to object formation are not yet well understood, but theories are emerging. Early on, it was suggested that, when distinct neural populations are activated by different sound elements, each population is heard as a distinct object (Micheyl, Tian, Carlyon, & Rauschecker, 2005). However, there are many examples of sound elements that excite distinct neural populations but are still heard as one object (for instance, when narrowband signals whose center frequencies are far apart, and thus encoded by very different parts of the tonotopic auditory pathway, share common modulation, they are nonetheless heard as a single stream; Elhilali, Xiang, Shamma, & Simon, 2009). This observation has led to the suggestion that object formation comes about when activity in a subpopulation of neurons is correlated over time (O'Sullivan, Shamma, & Lalor, 2015; Shamma, Elhilali, & Micheyl, 2011).
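The temporal-coherence idea can be illustrated with a toy Python sketch: frequency channels whose envelopes are strongly correlated over time are assigned to the same object, no matter how far apart they are in frequency. The greedy grouping rule, the 0.5 correlation threshold, and the sinusoidal envelopes are assumptions chosen for illustration; they are not the model proposed in the cited papers.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def group_by_coherence(envelopes, threshold=0.5):
    """Greedily place each channel in the first group whose reference
    channel it is coherent with; otherwise start a new group."""
    groups = []
    for index, envelope in enumerate(envelopes):
        for group in groups:
            reference = envelopes[group[0]]
            if pearson(envelope, reference) > threshold:
                group.append(index)
                break
        else:
            groups.append([index])
    return groups


# One second of envelopes sampled at an assumed 1000 Hz.
fs = 1000
t = [i / fs for i in range(fs)]
slow = [1.0 + math.sin(2 * math.pi * 4 * ti) for ti in t]   # 4-Hz modulation
other = [1.0 + math.sin(2 * math.pi * 7 * ti) for ti in t]  # 7-Hz modulation

# Channels 0 and 1 share the 4-Hz modulation (at different levels);
# channel 2 is modulated independently.
envelopes = [slow, [0.5 * value for value in slow], other]
groups = group_by_coherence(envelopes)
```

Here the two comodulated channels fall into one group even though nothing in the code refers to their center frequencies, which mirrors the observation that widely separated narrowband signals sharing a common modulation are heard as a single stream.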

Individual Differences in the Ability to Selectively Attend to Sound in Listeners With NHTs
Over the last few years, work in my laboratory has explored how well healthy young adults with NHTs can perform attentionally demanding auditory tasks. We find large variability in performance across a range of tasks. More important, depending on the task, variability seems to arise from differences in the ability to control attention, from differences in sensory coding fidelity (which can have an impact on the fidelity of the acoustic cues used for selecting and segregating a target from a sound mixture), or from a combination of both cognitive and sensory factors (e.g., see Dai & Shinn-Cunningham, 2016). Of course, everything from general cognitive ability to aging affects the ability to understand speech in complex settings, often with dramatic effect (e.g., see Anderson, White-Schwoch, Parbery-Clark, & Kraus, 2013; Banh, Singh, & Pichora-Fuller, 2012; Benichov, Cox, Tun, & Wingfield, 2012; Brungart et al., 2013; Gordon-Salant, Fitzgibbons, & Friedman, 2007; Gordon-Salant, Yeni-Komshian, Fitzgibbons, & Barrett, 2006; Grose & Mamo, 2010; Grose, Mamo, & Hall, 2009; Hall, Buss, Grose, & Roush, 2012; Nakamura & Gordon-Salant, 2011; Noble, Naylor, Bhullar, & Akeroyd, 2012; Ronnberg, Rudner, & Lunner, 2011; Singh, Pichora-Fuller, & Schneider, 2008, 2013; Tun, Williams, Small, & Hafter, 2012; Veneman, Gordon-Salant, Matthews, & Dubno, 2013; Weisz, Hartmann, Muller, Lorenz, & Obleser, 2011). Still, in my own lab, we find that even young, healthy adults with NHTs have smaller, but still consistent, significant, and behaviorally relevant individual differences in hearing ability. Moreover, differences in the efficacy of cortical control, in subclinical sensory coding fidelity (i.e., differences that cannot currently be diagnosed in the clinic), or in both often at least partially account for these differences in perceptual ability.

Individuals Differ in Their Ability to Control Selective Attention
In a sound mixture where competing streams are perceptually distinct and object formation should be easy even if a listener has modest sensory deficits, listeners nonetheless exhibit differences in the ability to selectively focus on one stream and identify its content. Such differences often reflect differences in cortical control.
In one recent study, we asked listeners to identify the contour of a target melody played along with other, simultaneous melodies. Importantly, although our task required control of auditory spatial attention, it was a nonlinguistic task and, thus, presumably did not engage cortical areas dedicated to speech and language processing. Instead, performance likely relied on more general attentional networks. The competing melodies were simulated with very large, different interaural time differences (ITDs), resulting in sounds that were perceptually very distinct in their spatial locations. We tested both an "easy" condition, where the pitch ranges of the melodies did not overlap, and a "hard" condition, where the pitch ranges overlapped. In the easy condition, both spatial and nonspatial cues differentiated the competing streams and jointly supported source segregation. In contrast, in the hard condition, only spatial cues differentiated target and competing sound streams. However, in both cases, listeners were instructed about which melody to attend to based on its location; thus, in both easy and hard conditions, top-down attention presumably relied on the "spatial" network described in the section on Attention Engages Different Attentional Control Networks.
In the easy condition, listeners achieved over 95% correct on average (when chance performance was 33% correct); even the worst listener did very well, near 85% correct. Critically, even though every listener did very well in the easy condition, the individual differences were very consistent. Every listener performed better on the easy condition than on the hard condition. Importantly, across individuals, there was a strong correlation between performance in the easy and in the hard conditions, suggesting that differences in the ability to focus spatial attention directly influenced how well a subject did in both the easy and the hard conditions. Throughout this study, we measured how strongly attention affected cortical responses by measuring EEG; we analyzed only the trials where the listener answered correctly, and we compared evoked responses when a stream was the attended target versus when it was a distractor. Importantly, we found that even in the easy condition, where every listener performed well, there was a strong correlation between percent correct on the task and the strength of attentional modulation on cortical responses to the sounds. In other words, the listeners who performed worst on the task modulated their sensory cortical responses less effectively than did the listeners who performed best, even when the sound mixture was easy to segregate, with redundant, salient spatial and pitch cues for source segregation, and even when, behaviorally, everyone performed relatively well.
Other studies find similar differences in ability that are correlated with neural differences. For example, the ability to decode attentional state from neural responses differs across listeners and correlates with differences in the strength of attentional modulation of the cortical response (e.g., Choi et al., 2013; O'Sullivan et al., 2014).
Recently, we tested military personnel who have been exposed to blasts, which are thought to impair cognitive function through damage to cortical cognitive control networks, as well as to cause posttraumatic stress that impairs sleep and disrupts executive function. We tested blast-exposed veterans using the same melody-identification task mentioned above (Bressler, Bharadwaj, Choi, Bogdanova, & Shinn-Cunningham, 2014). Performance by the blast-exposed veterans was extremely poor; in fact, we only ended up using the easy condition, where typical healthy young adults scored better than 95% on average. In this condition, the blast-exposed veterans scored between chance and 90%, with the best-performing veteran doing comparably to the worst of our typical controls. Because one might worry that veterans have had excessive noise exposure that might affect their auditory coding fidelity, we restricted our study to listeners with NHTs and also tested their temporal coding precision. We found no differences in auditory coding fidelity that could explain their performance on the selective attention task, unlike in other cases discussed in the section on Individuals Differ in Their Ability to Encode Fine Temporal Details in Suprathreshold Sound. This result demonstrates that impaired cortical processing can lead to problems focusing auditory attention, which in turn can have devastating consequences when trying to communicate in social settings.
Our results confirm findings from other studies of blast-exposed veterans by other hearing scientists. For instance, it is clear that a blast has the potential to cause a range of perceptual and cognitive problems (Fausti, Wilmington, Gallun, Myers, & Henry, 2009). However, careful study of the effects of a blast exposure on communication abilities shows that most problems manifest in complex settings that rely on cognitive, speech, and language processing, even when there are weak or no obvious sensory deficits.
Together, these results show that some of the variability in selective attention performance among listeners with NHTs can come from differences in their ability to deploy cortical attention networks. The degree to which a listener is able to modulate sensory responses on the basis of task demands directly predicts how well they perform on attention tasks. In extreme cases, such as in the blast-exposed veterans, these individual differences can be large. However, even in young, healthy adults, differences in cortical control correlate with performance on tasks requiring auditory selective attention.

Individuals Differ in Their Ability to Encode Fine Temporal Details in Suprathreshold Sound
As discussed in the section on Auditory Selective Attention Depends on Auditory Object Formation, the details of how object formation takes place in the brain remain an open topic of research. Still, it is clear that the acoustic features that support object formation rely on fine spectral and temporal features of sound, such as harmonic structure, interaural differences, timbre, and other features (Bregman, 1990; Carlyon, 2004; Darwin, 1997). It thus makes sense that listeners with elevated hearing thresholds, who have broader-than-normal cochlear tuning, poor temporal resolution, and reduced dynamic range, will have difficulty communicating in cocktail party settings (e.g., see Best, Mason, & Kidd, 2011; Best, Mason, Kidd, Iyer, & Brungart, 2015; Gallun, Diedesch, Kampel, & Jakien, 2013; Jakien, Kampel, Gordon, & Gallun, 2017; Roverud, Best, Mason, Swaminathan, & Kidd, 2016; Srinivasan, Jakien, & Gallun, 2016; see also the discussion in Shinn-Cunningham). However, even listeners with NHTs may differ in the fidelity with which their ears encode acoustic inputs, which may in turn affect their ability to extract auditory objects from a complex acoustic mixture. A growing number of animal studies show that noise exposure can cause a loss of auditory nerve fibers (ANFs) without elevating hearing detection thresholds or affecting the audiogram (Kujawa & Liberman, 2015; Liberman, 2015; Liberman, Liberman, & Maison, 2015). Noise exposure that leaves cochlear mechanical responses intact can produce a rapid loss of as many as 40%-60% of the ANF synapses driven by cochlear inner hair cells (cochlear synaptopathy), which carry the ascending signal up the auditory pathway (Kujawa & Liberman, 2006). Following the loss of synapses, ANF cell bodies (spiral ganglion cells) and central axons degenerate, leading to cochlear neuropathy (Kujawa & Liberman, 2015; Lin et al., 2011). Importantly, the effect on cochlear function can be negligible; cochlear tuning and behavioral detection thresholds can be normal in exposed animals (Kujawa & Liberman, 2009).
Most hearing screenings reveal losses associated with damage to inner and outer hair cells. Yet, with cochlear synaptopathy, measures of cochlear function are normal, making the deficit "invisible" to typical hearing screenings (explaining the use of the colloquial term hidden hearing loss to describe these problems; see Schaette & McAlpine, 2011).
Although detection thresholds may be normal in animals with cochlear synaptopathy, the loss of independent ANFs degrades temporal processing, which particularly degrades the coding of temporal modulation in suprathreshold sound. These effects can be seen, for instance, in the fidelity of phase locking in brainstem responses to amplitude modulation and the effects of additive noise and forward masking on subcortical neural responses (e.g., see Chambers et al., 2016; Furman et al., 2013; Hickox & Liberman, 2014).
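One standard way to quantify the fidelity of phase locking to a periodic modulation is vector strength, which treats each neural event as a unit vector at its phase within the modulation cycle and measures the length of the average vector. The Python sketch below computes it for idealized event times. Using vector strength here is an illustrative assumption (the cited studies may rely on other phase-locking measures), and the event times are hypothetical.

```python
import cmath
import math


def vector_strength(event_times_s, modulation_hz):
    """Length of the mean unit vector of event phases:
    1 means perfect phase locking, 0 means no phase preference."""
    phases = [2 * math.pi * modulation_hz * t for t in event_times_s]
    resultant = sum(cmath.exp(1j * phase) for phase in phases)
    return abs(resultant) / len(event_times_s)


modulation_hz = 100.0

# One event per modulation cycle, always at the same phase.
locked_events = [k / 100.0 for k in range(20)]

# Events spread evenly over four phases of the cycle.
spread_events = [k / 400.0 for k in range(20)]

vs_locked = vector_strength(locked_events, modulation_hz)
vs_spread = vector_strength(spread_events, modulation_hz)
```

In this toy example, the perfectly locked events give a vector strength near 1 and the evenly spread events give a value near 0; degraded temporal coding of suprathreshold modulation would correspond to values drifting toward the lower end of that range.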
Although it is difficult to prove directly that cochlear synaptopathy causes hearing problems in humans with normal cochlear mechanical function, a growing number of studies suggest that it does. Listeners with NHTs differ in their ability to use fine temporal cues (see Grose & Mamo, 2010; Mehraei, Gallardo, Shinn-Cunningham, & Dau, 2017; Mehraei et al., 2016; Strelcyk & Dau, 2009). This variability correlates with difficulties in using spatial selective attention to focus on and understand speech in a noisy background (Bharadwaj, Masud, Mehraei, Verhulst, & Shinn-Cunningham, 2015; Paul, Bruce, & Roberts, 2017; Ruggles & Shinn-Cunningham, 2011), underscoring the clinical relevance of these differences.
Listeners with NHTs show large intersubject variability in the magnitude of auditory brainstem response (ABR) Wave I (Schaette & McAlpine, 2011; Stamper & Johnson, 2015), supporting the view that some listeners with normal audiograms may suffer from cochlear synaptopathy, albeit to varying degrees. As in animal studies, while ABR Wave I amplitude varies significantly across individuals, the magnitude of ABR Wave V does not (Schaette & McAlpine, 2011; Stamper & Johnson, 2015). One study has shown that perceptual differences correlate with these differences in human ABRs: In young adults with no known hearing deficits, Wave I magnitude correlates with ITD sensitivity, whereas Wave V magnitude is unrelated to Wave I magnitude or perceptual ability (Mehraei et al., 2016). Indeed, cochlear synaptopathy reduces the strength of auditory nerve responses; the auditory system then seems to respond by increasing some internal gain to amplify the weak response that remains (e.g., see Chambers et al., 2016). Based on these findings, one proposed method for identifying cochlear synaptopathy in humans computes the ratio of the summation potential (the response of the hair cells in the cochlea) to the action potential (the auditory nerve response; Liberman, Epstein, Cleveland, Wang, & Maison, 2016); however, neither this metric nor any other has yet been proven to be diagnostic of cochlear synaptopathy in humans (see comments in the section on Future Impact in the Clinic).
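The arithmetic behind the proposed ratio metric can be sketched as follows. This is a minimal illustration that assumes both potentials have already been measured as amplitudes in microvolts; the function name and the example amplitudes are hypothetical, and no diagnostic cutoff is implied, because none has been established.

```python
def sp_ap_ratio(sp_amplitude_uv, ap_amplitude_uv):
    """Ratio of the summation potential (hair cell response) to the
    action potential (auditory nerve response), both in microvolts.

    The inputs are hypothetical amplitudes used only for illustration.
    """
    if ap_amplitude_uv <= 0:
        raise ValueError("AP amplitude must be positive")
    return sp_amplitude_uv / ap_amplitude_uv


# Same hair cell response in both cases; only the nerve response differs.
ratio_strong_nerve = sp_ap_ratio(0.2, 0.8)  # robust auditory nerve response
ratio_weak_nerve = sp_ap_ratio(0.2, 0.4)    # reduced auditory nerve response

print(ratio_strong_nerve, ratio_weak_nerve)
```

The point of normalizing by the hair cell response is that a weaker auditory nerve response raises the ratio even when the hair cell response is unchanged, which is the pattern one would expect if synapses were lost while cochlear function remained intact.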
In one study in my own laboratory, young adult subjects were recruited with no special criteria except that they had NHTs and no known auditory deficits. Individual differences among this cohort were nonetheless large. Perceptual abilities (including the ability to selectively attend to speech in a mixture and sensitivity to ITDs) correlated with the strength of temporal coding in the brainstem. Crucially, listeners had normal compressive growth of cochlear response (measured by distortion product otoacoustic emissions), normal frequency tuning (measured by psychoacoustic estimation), and pure-tone audiometric thresholds of 15 dB HL or better at octave frequencies between 250 Hz and 8 kHz. In other words, although perceptual differences correlated with an objective measure of the precision of brainstem temporal coding, these differences could not be explained by cochlear mechanical function.
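The audiometric inclusion criterion described above (thresholds of 15 dB HL or better at octave frequencies from 250 Hz to 8 kHz) can be written as a simple check. This is a sketch for one ear; the function name, the dictionary layout, and the example audiograms are hypothetical and are not taken from the study.

```python
# Octave frequencies between 250 Hz and 8 kHz, inclusive.
OCTAVE_FREQUENCIES_HZ = (250, 500, 1000, 2000, 4000, 8000)


def meets_threshold_criterion(thresholds_db_hl, limit_db_hl=15):
    """Return True if every octave-frequency threshold is at or below the limit.

    thresholds_db_hl: dict mapping frequency in Hz to threshold in dB HL
    for a single ear (hypothetical data layout).
    """
    missing = [f for f in OCTAVE_FREQUENCIES_HZ if f not in thresholds_db_hl]
    if missing:
        raise ValueError(f"missing thresholds at {missing} Hz")
    return all(thresholds_db_hl[f] <= limit_db_hl for f in OCTAVE_FREQUENCIES_HZ)


# Hypothetical audiograms for illustration only.
listener_a = {250: 5, 500: 0, 1000: 5, 2000: 10, 4000: 10, 8000: 15}
listener_b = {250: 5, 500: 5, 1000: 10, 2000: 10, 4000: 25, 8000: 30}

print(meets_threshold_criterion(listener_a))
print(meets_threshold_criterion(listener_b))
```

A listener who passes this check can still differ from others in suprathreshold temporal coding, which is exactly the kind of difference the audiogram does not capture.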
If listeners with NHTs suffer from cochlear synaptopathy, it makes sense that it could lead to difficulties communicating in cocktail party settings (Plack et al., 2014). In order to selectively attend, listeners must be able to segregate the sounds making up the acoustic mixture entering the ears. Source segregation depends on various features that will be degraded when temporal coding is poor (Bregman, 1990; Carlyon, 2004). If temporal features are degraded and the target source cannot properly be segregated from the scene, selective attention will fail (Shinn-Cunningham, 2008). On top of this, poor temporal cues can degrade spatial and pitch information, which could otherwise be used for directing auditory selective attention. For instance, if spatial cues are degraded, they might be too diffuse to resolve which source is the target; both sources may be perceived as coming from roughly the same direction. We found evidence for this in one series of studies. Listeners with NHTs exhibited large individual differences in the ability to report a speech stream from directly in front when it was flanked by two similar speech streams, one 15° to the left of center and one 15° to the right of center (Ruggles & Shinn-Cunningham, 2011). Importantly, when listeners failed to report a target word, they generally reported one of the words from the other streams: they failed to resolve the target and competing words on the basis of location but nonetheless understood words in the mixture. In these experiments, selective attention performance correlated with brainstem physiological measures (Ruggles, Bharadwaj, & Shinn-Cunningham, 2012; Ruggles & Shinn-Cunningham, 2011), suggesting that the differences arose from subtle differences in temporal coding strength in the brainstem of our listeners, all of whom had NHTs.
These results highlight some of the mounting evidence that many listeners with NHTs differ in the fidelity of temporal coding of sound entering the ear. Together, behavioral and physiological measures suggest that these differences result from cochlear synaptopathy.

Future Impact in the Clinic
As many as 5%-10% of listeners who seek audiological treatment have NHTs (Hind et al., 2011). This review focuses on two specific mechanisms that can lead to poor outcomes for listeners with NHTs when trying to understand speech in noisy settings, which may send them to the audiologist: (a) deficits in the control of cortical attention networks that filter out neural responses to unwanted sounds and (b) sensory deficits arising due to a loss of ANFs in the absence of elevated hearing thresholds (cochlear synaptopathy). Determining the root cause of communication difficulties in such listeners is important when trying to treat their symptoms. For instance, a listener who has a specific language deficit might benefit from intensive language training; however, such an approach is unlikely to improve outcomes for a listener who has cochlear synaptopathy. Unfortunately, as yet there are no accepted methods for diagnosing these deficits in the clinic, let alone for effectively treating such problems, were they properly diagnosed.
Although clinical tests are not yet available, many basic hearing researchers are currently working to develop efficient, sensitive screenings for these (and other) specific deficits that have an impact on hearing in everyday settings. My own laboratory is working to develop behavioral and electrophysiological measures that isolate deficits in attentional control as well as cochlear synaptopathy. More generally, the growing realization that cochlear synaptopathy likely plays a significant role in human communication led the National Institute on Deafness and Other Communication Disorders to sponsor a workshop in 2015 entitled "Synaptopathy and Noise-Induced Hearing Loss: Animal Studies and Implications for Human Hearing" that addressed mechanisms and potential therapies as well as diagnostics (see "Synaptopathy and Noise-Induced Hearing Loss: Animal Studies and Implications for Human Hearing," 2015). The outcome of this workshop was a special solicitation for research proposals to address the diagnosis and, ultimately, treatment of cochlear synaptopathy in human listeners.
Future assistive listening devices are likely to be tailored to address failures of specific mechanisms that produce communication difficulties, including poor attentional control and cochlear synaptopathy. For instance, if a listener has difficulty controlling attentional networks, the best solution may be a device that tries to determine automatically what sound sources are unimportant and to suppress these sounds before they enter the ears. On the other hand, once we better understand exactly how cochlear synaptopathy impacts the representation of speech in noise, we may devise new hearing aid algorithms that address the specific deficits caused by synaptopathy. For now, an increased awareness can allow clinicians to provide better counsel, allowing them to explain that even though a client has NHTs, they may nonetheless have real, specific physiological deficits and not some imagined ailment or overall cognitive decline.

Conclusions
Given the complexity of making sense of an acoustic signal in a crowded, noisy setting, it is amazing how well we communicate in social situations. We bring one perceived sound source into attentional focus so that we can analyze that object in detail. Accomplishing this feat depends upon cortical networks regulating and filtering the information that we process. This filtering, however, can only be effective when the ear represents sound information robustly.
Cognitive control networks modulate what information is represented in the brain and consciously perceived. By suppressing unimportant sounds, cognitive networks select important information from the sound mixture. Special populations, from listeners with attention-deficit/hyperactivity disorder to veterans with mild traumatic brain injury, may struggle to communicate in complex settings due to failures of executive control of attention. However, even healthy young adult listeners differ in the efficacy of their attentional control.
Cochlear synaptopathy (hidden hearing loss) may be present in listeners with NHTs, even young adults. Individual differences from such loss seem to particularly affect temporal coding. Deficits in temporal coding could impair both object formation and object selection by degrading acoustic features such as pitch, location, and timbre; as a result, cochlear synaptopathy likely contributes to individual differences in the ability to communicate in social settings.
Understanding how central, cognitive factors interact with sensory deficits is a key step toward finding ways to ameliorate communication disorders and improve the quality of life for listeners struggling at the cocktail party. Isolating the factors that limit communication for a particular individual will allow more effective, targeted interventions to be developed. Future work should focus on designing and vetting clinically feasible tests to evaluate the efficacy of attentional control, as well as the fidelity of suprathreshold sensory coding, enabling diagnosis of specific failures of processes that support communication in cocktail party settings.