Open Access | Journal of Speech, Language, and Hearing Research | Review Article | 17 Feb 2021

Cognitive Hearing Science: Three Memory Systems, Two Approaches, and the Ease of Language Understanding Model

    Abstract

    Purpose

    The purpose of this study was to conceptualize the subtle balancing act between language input and prediction (cognitive priming of future input) to achieve understanding of communicated content. When understanding fails, reconstructive postdiction is initiated. Three memory systems play important roles: working memory (WM), episodic long-term memory (ELTM), and semantic long-term memory (SLTM). The axiom of the Ease of Language Understanding (ELU) model is that explicit WM resources are invoked by a mismatch between language input—in the form of rapid automatic multimodal binding of phonology—and multimodal phonological and lexical representations in SLTM. However, if there is a match between rapid automatic multimodal binding of phonology output and SLTM/ELTM representations, language processing continues rapidly and implicitly.

    Method and Results

    In our first ELU approach, we focused on experimental manipulations of signal processing in hearing aids and of background noise designed to cause a mismatch with LTM representations; both resulted in increased dependence on WM. Our second approach, the main one relevant for this review article, focuses on the relative effects of age-related hearing loss on the three memory systems. According to the ELU, WM is predicted to be frequently occupied with reconstruction of what was actually heard, resulting in a relative disuse of phonological/lexical representations in the ELTM and SLTM systems. The prediction and results do not depend on test modality per se but rather on the particular memory system. These points are discussed further in the review.

    Conclusions

    Related to the literature on ELTM decline as a precursor of dementia, and to evidence that hearing loss substantially increases the long-term risk for Alzheimer's disease, there is a possibility that lowered ELTM performance due to hearing loss and disuse may be part of the causal chain linking hearing loss and dementia. Future ELU research will focus on this possibility.

    Over the last 2 decades, the hearing research community has increasingly accepted that cognitive factors play an important role in models of hearing and language processing, all the way from early postcochlear processing of the speech signal to cortical understanding of gist. This applies especially to listening under adverse conditions (Mattys et al., 2012). Several ways of understanding this top-down–bottom-up interaction have been proposed (e.g., Akeroyd, 2008; Amichetti et al., 2013; Anderson et al., 2013; Arehart et al., 2013, 2015; Besser et al., 2013; Holmer et al., 2016a; Humes et al., 2013; Luce & Pisoni, 1998; Pichora-Fuller et al., 2016; Rudner, 2018; Signoret & Rudner, 2019; Stenfelt & Rönnberg, 2009; Wingfield et al., 2015). Memory functions, especially working memory (WM), have been the focus of our research on the mechanisms behind communicative abilities in persons with hearing loss.

    However, a more comprehensive version of the ELU model has to address how long-term memory systems contribute to online decoding, encoding, and inference-making in WM during communication. Therefore, our research has focused on Ease of Language Understanding (ELU) that depends on three interacting memory systems: WM, episodic long-term memory (ELTM), and semantic long-term memory (SLTM; see, e.g., Classon et al., 2013; Ng & Rönnberg, 2019; Rönnberg, 2003; Rönnberg et al., 2010, 2011, 2013, 2019). The works of Humes (e.g., Humes, 2007; Humes et al., 2013) and Gatehouse (Gatehouse et al., 2003, 2006) have been particularly inspirational in our contribution to the development of the field we have dubbed Cognitive Hearing Science. In Arlinger et al. (2009), a broader historical view on the emergence of Cognitive Hearing Science is presented.

    Three Memory Systems and the ELU Model

    The ELU model (Rönnberg, 2003; Rönnberg et al., 2008, 2010, 2013, 2016, 2019) builds on the interplay between the three memory systems (WM, ELTM, and SLTM) relevant for language understanding as well as their interface with a module for sensing language input. This module is assumed to operate Rapidly, Automatically, and Multimodally when Binding PHOnological (RAMBPHO) information into a coherent percept, irrespective of the source of the phonological information. This integration and binding of the different sources of phonological information is assumed to be very rapid, around 180–200 ms (Stenfelt & Rönnberg, 2009). A slower binding process would necessarily slow down the implicit, predictive part of the model. Postdiction involves the explicit engagement of WM and its interactions with LTM systems (see more details and examples under the Approach I: Memory System Interactions in Online Communication section).

    The model was originally (Rönnberg, 2003; Rönnberg et al., 2008) postulated to cover different speech communication modes such as tactile–visual speechreading (e.g., Rönnberg, 1993), visual-only speechreading (e.g., Lyxell & Rönnberg, 1987, 1989; Rönnberg, 1990), audiovisual facilitation in gating and speech perception (e.g., Moradi et al., 2014, 2017, 2019), and manipulations of auditory target–auditory background stimuli (Lunner et al., 2009). Irrespective of communication mode, we were struck by the fact that complex WM capacity (WMC, especially as measured by reading span), LTM access speed, and the fidelity of phonological representations in LTM reappeared as good predictor variables in study after study.

    These observations prompted the formulation of a general and comprehensive ELU model, focused on interactions among the three memory systems (Rönnberg, 2003) and the communicative task (see Rönnberg et al., 2019, for thorough discussions of these matters, and Approach I in this review article for a more detailed model description).

    Although most research has focused on speech in one form or another, we have successively noted some constraints when comparing WM for sign and speech. For example, in Rönnberg et al. (2004), WM for sign showed specific activations of parietal areas but shared with audiovisual speech the activation of prefrontal/frontal areas of the cortex typically associated with WM processing (Eriksson et al., 2015). This language modality specificity has also been shown recently in a study of WM for sign compared to moving visual nonsense objects (point-light displays; Cardin et al., 2018). Although frontal activations in an n-back task were very similar for deaf native signers, hearing native signers, and hearing nonsigners, the superior temporal sulcus was specifically active in the deaf native signing group for both kinds of visual stimuli, suggesting early cortical and cognitive plasticity due to lack of sensory input (Cardin et al., 2018; see also Cardin et al., 2013, for cognitive plasticity; Rudner et al., 2007, for sign specificity regarding the episodic buffer).

    Constraints on the ELU model with respect to hearing status have also been addressed in a meta-analytic study by Füllgrabe and Rosen (2016), demonstrating that, for hearing-impaired and older participants, WM accounts for significant portions of variance in explaining speech-in-noise (SPIN) performance, whereas this was not the case for normal-hearing young participants. However, later research has shown that even for subclinical/normal-hearing individuals, small variations in hearing acuity are associated with atrophy of the brain in predicted auditory and cognitive cortical sites (Rudner et al., 2019). The same message emerges from Ayasse et al. (2019), who showed that even minimal hearing loss, within the limits of normal hearing, affects listening to sentences in noise when sentence grammar is more complex and WM-demanding. Thus, there are interesting findings on hearing status as a constraint on the ELU model, such that the model can be argued to apply also to so-called normal or subclinical variability, even though these latter studies did not involve WMC as such (see the reference to WMC and context dependence in normal-hearing participants in Rönnberg et al., 2019).

    WM refers to an individual's ability to hold and manipulate a set of items or linguistic fragments currently in mind, for example, in the form of predictions or guesswork (Baddeley, 2012). Our use of the concept materialized after we read the seminal paper by Daneman and Carpenter (1980), in which they emphasized that the manipulation component of WM launched by Baddeley and Hitch (1974) was particularly important when it came to parsing sentences. In other words, there was a need for both a storage and a processing function when more complex linguistic materials than the individual words used in much of Baddeley's early work were to be manipulated and understood (Baddeley et al., 2019). Apart from assisting in grammatical processing, the manipulation component of WM is engaged in semantic processing, inference-making (Hannon & Daneman, 2001), and keeping steadfast attention on the gist of a conversation, including turn-taking behavior (Rönnberg et al., 2013). Unlike Baddeley and Hitch (1974), our notion of WM is based on a central pool of resources that can be allocated flexibly to storage of sensory/semantic information or to semantic and grammatical processing. If processing of some kind takes most of the resources, less will be available for storage; vice versa, if storage demands increase, they will dampen or inhibit processing activities (Sörqvist et al., 2016). The only component of our WM model that is relatively encapsulated from the task-dependent and dynamic storage and processing functions is RAMBPHO. RAMBPHO must by necessity be fast; it can be primed and contextually framed, but before contact is made with LTM, only implicit, less time-demanding operations can occur. In addition, the ELU model also differs from other models/frameworks of WM with respect to the communicative focus of WM. For detailed comparisons with other models, we refer to the discussion in Rönnberg et al. (2013).

    We have typically used the visual reading span task (RST) introduced by Daneman and Carpenter (1980) as an index of WMC because it avoids confounding cognitive measurement with audibility issues when investigating populations with hearing loss (see also Daneman & Merikle, 1996). The advantage of the RST in the field of Cognitive Hearing Science, compared to other tests of WMC, is that it taps into the key dual-task storage–processing interaction in speech understanding mentioned above. Specifically, the RST taxes both storage of a set of sentences and semantic processing of each sentence in the set. In our use of the RST, we have instructed the participants to verify whether or not a sentence is absurd (e.g., "The train sang a song"). After a set of two to six sentences, the participant is asked to recall, in the correct sentence-wise order, the first or last words. Thus, in our version of the RST, you cannot be strategic in the sense of focusing on, for example, the last word of each sentence, simply because you do not know which words to recall until the whole set has been presented. The storage–processing interaction in the RST presumably reveals a "raw," real-life dynamic WMC and is therefore a better predictor of recall performance of targets in SPIN tasks than traditional simple span or letter updating/monitoring tasks (Rönnberg et al., 2013, 2016; Rudner et al., 2009). Moreover, the RST mimics what we actually do when we listen to speech in noise or at the cocktail party: We try to remember the gist, take turns in a dialogue, and execute semantic verification judgments more or less at the same time. Generally speaking, the dual demands of a WM task seem to be an even more crucial aspect than the actual presentation modality of the task (i.e., a visuospatial vs. a text-based task) in terms of predictive power for SPIN performance, and such tests load equally high on a latent WM factor in a large test battery of 200 participants (see Rönnberg et al., 2016).

    The sensitivity of the storage–processing interaction is revealed in another way when we consider another type of WM-related experiment (Ng et al., 2013, 2015). We developed a SPIN test (i.e., the Sentence-final Word Identification and Recall [SWIR] test), in which each target word is audible (as evidenced by immediate recall of each individual sentence-final word), but where the delayed recall of the final target words of a set of sentences is still facilitated by hearing aid signal processing (i.e., noise reduction). This is true especially against a background of four talkers (4T, i.e., two men and two women reading from four different paragraphs of a newspaper; Micula et al., 2020) and for native-language babble compared to 4T babble in a foreign language (Ng et al., 2015), implying a difference in the engagement of SLTM and, hence, in the amount of distraction caused by the masker. Kilman et al. (2014) showed the same SLTM engagement in bilingual experiments with maskers in the native versus nonnative language.

    The proposal is that hearing aid signal processing of maskers can reduce distraction even for audible targets, hence supporting WM storage. In other words, when listening takes place under challenging conditions, hearing aid signal processing can relieve pressure on SLTM processing, freeing up more storage capacity. Micula et al. (2020) demonstrated that binary masking of the noise is particularly important when recall conditions are less predictable in the SWIR test (cf. Ng et al., 2015). Thus, storage and processing interact all the time (see more under Postdiction) and, again, this is probably the main reason why, for example, the RST in many instances is a better predictor of sentence-based SPIN performance than simple span tests such as digit or word span (Rönnberg et al., 2013). In Rönnberg et al. (2016; Supplementary Materials, n = 200), we observed that the RST, a semantic word pair test, and a visuospatial WM test (all dual storage-and-processing tasks) loaded on the same WM factor (.57 to .68), all three loading higher than a nonword span task (storage only), which loaded .52 on that factor; this validates that the other three tasks tap into the processing aspect of WM as well. These dual tasks were also more effective predictors of, for example, recall of Hagerman matrix sentences in 4T babble. We can also make the inference that it is the dual-task demands of the cognitive task, not the sensory modality, that are critical here.

    ELTM is a memory system of personally experienced events, or episodes, tagged by time, place, space, emotions, and context (Tulving, 1983). As we experience an episode, memory traces of multimodal sensory information, intertwined with semantic associations, related to the objects and context of the episode, are encoded as a personal episodic memory (Rugg et al., 2015). The retrieval process of ELTM is constructive, and reminiscence of a specific event is triggered and supported by episodic and semantic cues (Renoult et al., 2019; Rugg et al., 2015). That is, when an individual is trying to remember a specific episode, the person does this in an active manner, trying to use relevant sensory-perceptual traces and semantic associations to reconstruct the event, rather than simply accessing a stored video clip from LTM.

    Thus, ELTM can be assessed in many ways, with varying contextual support (from, e.g., cued recognition, recognition, cued recall, to free recall), which in turn demand different levels of self-initiated memory search (Craik, 1983). A typical episodic everyday memory question is: “What did you have for breakfast this morning?” ELTM interacts with SLTM in the sense that ELTM always relies on preexisting knowledge structures (e.g., your mental lexicon). However, although ELTM depends on SLTM, neurocognitive evidence suggests partly nonoverlapping systems (Renoult et al., 2019). The notion of what constitutes “long term” varies with experimental paradigms, but most researchers would agree that from 30 min and beyond is acceptable as long term.

    An important interaction between WM and ELTM was observed with a new kind of WM test. In Sörqvist and Rönnberg (2012), we measured WM by a dual task that first had the participants compare the size of objects or animals (e.g., "is a zebra larger than a mouse?"), after which the to-be-remembered (TBR) word appeared (e.g., "elephant") in a list of comparison and TBR words. Serial recall of the TBR items is required, and the crucial aspect of this SIze Comparison span task (SIC span) is that both the TBR and comparison words belong to the same category in the same list. This can cause confusion at recall, that is, about whether the recalled item was a TBR item or a comparison word. WMC measured this way demands the regular storage function but also an inhibition processing component. In the experiment, the participant was instructed to focus on a target speech about a fictitious culture (masked by another spoken fictitious story). The data showed that high scores on the SIC span test predicted higher immediate recall and, crucially, better delayed recall of the story. No similar correlation was found for the RST. This example implies that ELTM is best promoted by a WM function that has the power to focus on the relevant semantic information while disregarding competing semantic information from SLTM.

    SLTM refers to general knowledge, without personal reference, for example, vocabulary tests by means of fluency or by lexical access speed (e.g., Rönnberg et al., 2011, 2016), grammar (e.g., tested by means of comprehension of embedded clauses; Ayasse et al., 2019), phonology (e.g., tested by means of the Cross-Modal Phonological Awareness Test; Holmer et al., 2016b), or world knowledge, for example, in the form of scripts and knowledge about objects and people (Samuelsson & Rönnberg, 1993). An everyday example here would be that most people do not remember when they learned that Paris is the capital of France or, in the case of phonology, knowing that the speech sounds and associated visual cues of a specific word might overlap with sounds of other words in that language. Thus, in those cases, the information belongs to the general knowledge that we carry around in our minds or SLTM, not to personal ELTM traces.

    Approach I: Memory System Interactions in Online Communication

    When we first proposed the ELU model (Rönnberg, 2003; Rönnberg et al., 1998, 2008; Rudner et al., 2008, 2009), we were interested in describing a mechanism that explains why language understanding in some conditions demands extra allocation of cognitive resources, while in other conditions language processing takes place smoothly and effortlessly. To do this, we relied on the three memory systems briefly described above. In this online processing approach, we proposed that the mechanism in question was that of phonological mismatch between the phonological information contained in the input signal (picked up by a RAMBPHO input buffer) and phonological representations in SLTM. The original hypothesis was that the syllable, especially, was an important linguistic unit for unlocking the lexicon (Rönnberg, 2003; Rönnberg et al., 2011). If the syllabic information perceived by the listener was distorted or blurred beyond a hypothetical threshold, then a mismatch would trigger WM to aid explicit reconstruction, so-called postdiction (Rönnberg et al., 2013, 2019). The ELU assumption is that explicit use of WM is involved to some degree in increased effort (Rudner et al., 2012): more missed encodings into, and retrievals from, ELTM and SLTM (Rönnberg et al., 2013) will inevitably increase the perceived effort needed to overcome the obstacles of communication (Pichora-Fuller et al., 2016).

    Postdiction

    Several experimental methods have been developed to trigger the putative mismatch function between RAMBPHO output and existing SLTM representations. For example, experimental acclimatization to a nonhabitual kind of signal processing in the hearing aid (e.g., FAST or SLOW Wide Dynamic Range Compression; Rudner et al., 2008, 2009), with subsequent testing in an acclimatized/familiarized (or nonacclimatized/nonfamiliarized) mode of signal processing, produced strong reliance on WM in mismatched conditions (i.e., FAST–SLOW or SLOW–FAST conditions). For reviews of these and other kinds of data supporting this mismatch mechanism, see Rönnberg et al. (2019), Souza and Sirow (2014), and Souza et al. (2015, 2019).

    Another example is the manipulation of background noise, where the use of speech babble maskers, engaging SLTM, produced the most pronounced distraction (e.g., Kilman et al., 2014; Mattys et al., 2012; Sörqvist & Rönnberg, 2012). It should be noted that the original data on WM dependence (using "speechlike" maskers) had already been observed and discussed (Lunner, 2003; Lunner & Sundewall-Thorén, 2007; see a review by Rönnberg et al., 2010). WMC is also an important predictor of ELTM in such circumstances of initial speech-in-speech maskers (Ng & Rönnberg, 2019; Sörqvist & Rönnberg, 2012). The SIC span emphasizes semantic inhibition rather than semantic verification (as in the RST) and seems to be a better predictor of ELTM than the RST in that case (Sörqvist & Rönnberg, 2012; see the more detailed explanation above).

    As a further example, the ELU model predicts that when individuals become accustomed to the sounds transmitted by their hearing aids, they will automatize speech processing and will be less dependent on WM to understand speech since presumably representations build up over time in SLTM that more closely match the signal perceived (Holmer & Rudner, 2020; Rönnberg et al., 2019). Thus, less reconstructive postdiction processing is needed to disambiguate the speech signal. In line with this reasoning, Ng et al. (2014) demonstrated that after a period of up to 6 months with new hearing aid settings, initial associations with WMC during speech recognition in noise seemed to vanish. However, the original ELU model did not appropriately cover such developmental effects.

    Therefore, in the context of sign language imitation, Holmer et al. (2016a) proposed the Developmental ELU (D-ELU) model to account for the importance of preexisting cognitive representations that influence further development of new representations. The model assumes that mismatch-induced postdictive processes push the system toward appropriate adjustment of SLTM and that the formation of novel representations is supported and, at the same time, constrained by existing representations in the lexicon (Holmer & Rudner, 2020). The finding that WM dependence for hearing aid users becomes weaker over time (Ng et al., 2014) is thus in line with the D-ELU, which proposes that new representations are formed that are adapted to changed hearing conditions.

    However, in a recent paper by Ng and Rönnberg (2019), we have been able to show that especially for speech maskers (4T) and for mild-to-moderate hearing impairment, there can be a much more prolonged (up to 10 years) WM dependence. This may imply that in some conditions, it is impossible to acclimatize to masking of target stimuli when speech distractors are dynamic and hard to form effective representations for. Interestingly, Han et al. (2019) reported worse word learning and weaker influence of existing representations on learning in the context of broadband white noise, suggesting that the mechanism proposed by the D-ELU is disrupted by noise. Particular combinations of speech and speech distractors may amplify mismatch between phonological representations and the semantic content of a person's SLTM. The SLTM component of speech maskers will always interfere with processing of target sentences to some degree (Ng & Rönnberg, 2019). It may actually be the general case that increased WM dependence becomes an everyday rule rather than an exception, even with advanced signal processing in the hearing aid. This, obviously, has clinical implications in the sense that the hearing impairment exacerbates effort and fatigue during the day, as well as in the long term (Rudner et al., 2011). We, of course, note that we are always more or less dependent on WM for language processing in real-life discourse, but what is meant here is the extra load on WM that may still come about in complicated RAMBPHO-LTM interactions.

    Finally, recent studies on children show that WMC and vocabulary (i.e., SLTM) constitute important cognitive predictors when listening to speech in adverse conditions (Walker et al., 2019). This applies to children with normal hearing as well as to children with hearing impairment (McCreery et al., 2019) and other hearing difficulties (Torkildsen et al., 2019). Furthermore, WMC predicts language development in children with hearing impairment, whereas vocabulary predicts reading comprehension in this group (Wass et al., 2019). This shows that the ELU model has a certain amount of generality across the life span, and the importance of vocabulary development lends support to the D-ELU (Holmer et al., 2016a).

    Prediction

    So far, we have discussed the postdictive aspect of WM in the ELU model and its capacity to support reconstruction of misperceived information. However, as we have emphasized in Rönnberg et al. (2013, 2019), WM is also involved in the pretuning of the cognitive system and priming of to-be-understood sentences, albeit in a different functional role that demands less elaborative processing but is purposively related to identification and detection of targets. Examples of pretuning or priming do not necessarily build on explicit and elaborative processes. We have shown correlations with WM in different paradigms that rely on cognitive processes operating prior to actual stimulus presentations, for example, in repetition priming paradigms (Signoret & Rudner, 2019) or in cued sentence perception (semantically matching vs. mismatching cues of upcoming target sentences; Zekveld et al., 2011, 2012, 2013). A further remarkable example is that WM load in a visual letter-based n-back task seems to dampen the postcochlear, olivary complex response (Wave V) to tones in an odd-ball paradigm (Sörqvist et al., 2012; see also Kraus & White-Schwoch, 2015; Lehmann & Skoe, 2015; Molloy et al., 2015). Dampening of the Wave V response was further reinforced by the WMC of the participant, evidently an early, top-down, inhibitory effect on brainstem processing and attention.

    Thus, this last example represents resource allocation due to a clearly explicit involvement of WM, whereas in the two preceding examples, information is kept in mind in a way that could be either explicit or implicit depending on the participants' task strategy. In a more recent experiment, building on Sörqvist et al. (2012), we employed the same visual n-back/auditory odd-ball paradigm and, predictably, inhibition and dampening of cortical activity in the superior temporal lobe was observed, especially under high WM load (Sörqvist et al., 2016; see also Rosemann & Thiel, 2018; Sharma & Glick, 2016).

    Our demonstrations cited above show that while some items will be held explicitly in WM (prestimulus presentation), other paradigms tap an implicit side of WM when it comes to prediction (e.g., Davis et al., 2005). This implies that the explicit/implicit distinction is not as crucial for predictive processes, compared to postdictive processes, as we had previously assumed.

    Approach II: ARHL and Long-Term Interactions Among Memory Systems

    A second approach focuses on the long-term effects of age-related hearing loss (ARHL). The ELU model builds on a memory systems view of the long-term consequences of hearing impairment, unlike a common cause account (e.g., Baltes & Lindenberger, 1997; Humes et al., 2013), which assumes some common neural degeneration that is responsible for a general cognitive decline. The ELU model takes sides with a view that assumes that hearing loss may cause cognitive decline (Rönnberg et al., 2014) and that gray matter volume is proportional to audiometric hearing loss, which in turn is correlated with brain activity during sentence comprehension (Peelle et al., 2011; Peelle & Wingfield, 2016; Rudner et al., 2019). Lin (2011) and Lin et al. (2011, 2014) show that, over time, hearing loss may be driving brain atrophy, which in turn will undermine cognitive integrity and ultimately lead to cognitive decline and dementia (Livingston et al., 2017). Indeed, we showed that even subclinical levels of poorer hearing in a middle-aged population are associated with smaller brain volumes in auditory and cognitive processing regions of the brain (Rudner et al., 2019). An even more recent study by Ayasse et al. (2019) shows that grammatical complexity is enough to tax the resources of participants with very small (within "normal" limits) hearing impairments.

    Although they do not prove causality, independent data from the Betula database (Nilsson et al., 1997; Rönnberg et al., 2011; n = 160 hearing aid wearers), analyzed with structural equation modeling (SEM), yielded satisfactory model fits and significant links between variables only for hearing loss, not for vision loss (Rönnberg et al., 2011). Note also that using hearing-impaired participants who wear hearing aids is a conservative test of the hypothesis. Nevertheless, the hearing loss effect is manifested for two memory systems, ELTM and SLTM, but not for short-term memory or WM. Finally, the latent construct of ELTM comprised three different tasks: oral recall of auditorily presented word lists (hearing aids on), oral recall of textually/auditorily presented sentences, and oral recall of motorically executed imperatives like "comb your hair" or "tie your shoe laces." All ELTM tests (free verbal recall but with different encoding instructions) were affected by hearing loss, and if anything, the highest simple correlation with hearing loss was for the motorically encoded imperatives, also called Subject-Performed Tasks (SPTs). This tells us that representations in ELTM must be multimodal (as is RAMBPHO) and that impaired hearing can drive such counterintuitive results as the SPT data (e.g., Rugg et al., 2015).

    This kind of pattern of results, that is, selectivity for memory systems and sensory modality/generality for encoding instruction, is not easily accounted for by a common cause model. However, since the study was cross-sectional, we still cannot be sure about causality. Nevertheless, it is in line with the ELU prediction that disuse of a multimodal ELTM system is driven by hearing loss. Furthermore, the data suggest that the memory system selectivity we observed is due neither to information degradation (e.g., Schneider et al., 2002) nor to consumption of attention by a degraded auditory stimulus (e.g., Verhaegen et al., 2014), because then the auditory-only encoding would have suffered more relative to the other encoding conditions (Rönnberg et al., 2011). The effect is presumably not dependent on changes in the phonological/lexical structures of SLTM, because the hearing loss memory system selectivity remained after partialling out age in the SEM models (Rönnberg et al., 2011). SLTM structures like phonological neighborhoods would be expected to deteriorate with age (Neighborhood Activation Model; Luce & Pisoni, 1998; Sommers, 1996), but, again, the selective hearing loss effect on memory systems survived in the SEM analysis as a significant predictor variable (Rönnberg et al., 2011).

    Furthermore, it was also the case in Rönnberg et al. (2014), using a very large sample of participants from the UK Biobank, that hearing loss was related to visual ELTM, with an effect size in the moderate range. This further supports the multimodality argument. Finally, a study by Armstrong et al. (2020), using data from the Baltimore Longitudinal Study of Aging, demonstrated that hearing thresholds 2 years prior to testing predicted performance on the California Verbal Learning Test (delayed audio-verbal recall), which is yet another indication of a possible causal relationship between ARHL and ELTM.

    However, that is the overall picture; the more specific mechanism by which hearing loss may lead to dementia is still unclear (Hewitt, 2017; Livingston et al., 2017; Roberts & Allen, 2016; Wayne & Johnsrude, 2015). When it comes to functional ELU mechanisms, brain plasticity, and the behavioral antecedents of the effects of hearing loss, we advocate a relative use/disuse view of memory systems. In this context, the view claims that postdiction/reconstruction of misunderstood or misheard words is heavily dependent on daily WM activity (Rönnberg et al., 2014). WM is involved in the reconstruction and repair of misheard words or sentences, day in and day out. However, even though WM is engaged many times during postdiction, WM reconstruction will not always be successful. Unsuccessful reconstruction reduces the number of times that events, communicated words, and meanings can be encoded into ELTM. As a consequence, ELTM will be used less often; the number of encodings of episodic traces and subsequent retrievals will be blocked or reduced when WM is in error. Therefore, ELTM with ARHL will deteriorate faster than ELTM without ARHL because of less usage and practice.

    While WM is reconstructing and inferring knowledge from the bits and pieces of information decoded and stored for processing, SLTM will be called on to provide, for example, phonological constraints, meanings, and word knowledge that narrow down the intelligent guesswork needed to postdictively reconstruct what was said, for example, in the gating paradigm (Moradi et al., 2014, 2017, 2019). This means that WM will, in addition to retrieving stored representations of phonology, semantic meanings of words, knowledge of grammar, and objects from SLTM, have to hold the perceived items currently in mind and combine them with the SLTM contributions retrieved online. Thus, WM has the inherent dual purpose of combining and inferring, while SLTM provides semantic support, and ELTM encoding is a consequence of the WM-SLTM interaction. So, on a use–disuse dimension, the general prediction is that the degree of deterioration of memory systems due to hearing loss is as follows, starting with the highest degree: ELTM > SLTM > WM.
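    The gating paradigm mentioned above can be sketched as a minimal simulation (the `guess` callback standing in for the listener is hypothetical; this is not the actual Moradi et al. procedure):

```python
def gating_trial(word, guess):
    """Present successively longer onset fragments ("gates") of a word
    until the listener's guess matches the target; the returned gate
    count is the isolation point. `guess` maps a fragment to a word."""
    for gate in range(1, len(word) + 1):
        if guess(word[:gate]) == word:
            return gate
    return None  # word never identified from its onsets alone
```

    A listener with richer SLTM support (sharper phonological and lexical constraints) would correspond to a `guess` function that converges on the target at an earlier gate.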

    One caveat here is about how old the person is and how well developed the cognitive system and its SLTM representations actually are. A mature and richly interconnected semantic representational system will probably interact with WM more and be less disused; a rich ELTM will probably have more SLTM connections in its memory representations, and so the use–disuse rank order of memory systems may be affected. However, in general, we submit that the order of memory systems decline is the one suggested above.

    It is important to note that the basic prediction of the ELU model regarding the effect of hearing loss on memory systems does not depend on the encoding modality per se but rather on the memory system as such. It seems to be the particular memory system that suffers, independent of encoding modality (e.g., motor, visual, or auditory). This has led us to conclude that hearing impairment affects multimodal memory systems, not just single, modality-specific systems such as auditory or visual short-term memory.

    These results are illustrated by the recent studies on the effects of ARHL (see Rönnberg et al., 2011, 2013, 2014, 2019). While ELTM is the most fragile memory system, being susceptible to brain damage of different kinds, it is also the most advanced system and the latest to mature in ontogeny (Tulving, 1983, 1985). Memory systems thus seem to obey the principle of last in, first out. We know that ELTM deficits may be indicative of mild cognitive impairment (e.g., Farias et al., 2017; Fortunato et al., 2016), sooner or later leading to dementia (Gallagher & Koh, 2011). We also know that hearing impairment increases the risk of dementia over a period of around 10 years (Lin et al., 2011, 2014; Livingston et al., 2017).

    In addition, the source of, for example, Alzheimer's disease seems to be connected to encoding problems generally (which of course affect the number of retrievals per day), with negative effects on LTM systems (Stamate et al., 2020). This would be in line with the ELU disuse prediction for LTM systems. It is also possible that at least some of the patients had hearing impairments, although the authors stated that profoundly hearing-impaired patients were excluded; as we have seen, even minor impairments can affect memory and comprehension. This is of course speculative but would fit with the overall argument that hearing loss drives problems with encoding and subsequent retrieval from the ELTM and SLTM systems, hence resulting in disused LTM systems.

    To combine these facts and test the causal prediction more stringently, one must conduct a study that includes hearing impairment variables, cognitive and memory system variables, and outcome variables. To causally model the impairment–cognition–outcome relationships, the study must also be longitudinal, with a sufficient number of years between test occasions and a sufficient number of enrolled participants. This is exactly what we are doing in Linköping right now, in the so-called n200 study (see Rönnberg et al., 2016, for a description of the test battery).

    Some Final Comments About Future Research and Clinical Applications

    1. We know by now that different kinds of signal processing in hearing instruments tax the cognitive system in different ways (e.g., Arehart et al., 2013, 2015; Foo et al., 2007; Lunner et al., 2009; Rudner et al., 2009; Zekveld et al., 2013). Some individuals lose more than they gain from more advanced signal processing algorithms. Typically, individuals with high WMC tolerate and benefit from that kind of signal processing (Lunner et al., 2009; Lunner & Sundewall-Thorén, 2007). What is relatively new is that even at high positive signal-to-noise ratios, WMC modulates short-term retention of spoken materials, operationalized by, for example, the SWIR test (Ng et al., 2013; Souza et al., 2015, 2019). Future research should focus on task difficulty in the SWIR (see Micula et al., 2020) to investigate under what conditions signal processing off-loads WM. It is interesting in this context that even at "within normal" levels of hearing acuity, cognitive and speech understanding performance may be negatively affected by small upward shifts in hearing thresholds; for example, difficulties comprehending syntactically complex sentences are exacerbated by such slight hearing losses (Ayasse et al., 2019; cf. Rudner et al., 2019).

    2. As we have shown, WM in a visual n-back task dampens "early" subcortical "attention" mechanisms as a function of WM load, as early as at the brainstem level (Sörqvist et al., 2012; cf. Kraus & Chandrasekaran, 2010; Kraus et al., 2012). This illustrates the top-down power of cognition on hearing. In the same vein, we have also demonstrated that attention at the cortical level, with the same n-back task, inhibits auditory temporal lobe functions, mirroring the brainstem data (Sörqvist et al., 2016). A further experiment that builds on Sörqvist et al. (2016) would be to test for a double dissociation by having participants engage in an auditory n-back task, while the visuospatial task could, for example, be counting the number of capital (deviant) letters in a stream of letters presented one by one, analogous to counting deviant tones. The prediction would be a dampening of occipito-parietal parts of the brain. In both cases it is easy to imagine, for example, traffic situations that would be dangerous if one of the senses were to be dampened by cross-modal loading from the other sensory modality.
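    The response rules of the two tasks in the proposed double-dissociation experiment can be sketched as follows (hypothetical helper names; this captures only the task logic, not the Sörqvist et al. protocol):

```python
def nback_hits(stream, n):
    """Indices where an item matches the item n steps back: the
    response rule of an n-back working-memory task."""
    return [i for i in range(n, len(stream)) if stream[i] == stream[i - n]]

def count_deviants(letters):
    """Count capital (deviant) letters in a stream, the proposed
    visuospatial analogue of counting deviant tones."""
    return sum(ch.isupper() for ch in letters)
```

    Loading one modality with `nback_hits`-style monitoring while the other modality carries the deviant-counting task is what would allow the predicted cross-modal dampening to be tested in both directions.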

    3. Current attention research is investigating the possibilities of steering signal processing in hearing aids through capture of electroencephalogram (EEG) signals, not least through eye movements (i.e., watching the speaker you are talking to; see Alickovic et al., 2019). The basic problem is how to decode appropriate signals from the brain while attending to a speaker. Future research aims at capturing different listening intentions with EEG signals (e.g., listening to comprehend, listening to remember, or selective listening for a particular cue).

    4. In the original ELU model, no mechanism was assumed that could account for developing new SLTM representations. The D-ELU (Holmer et al., 2016a) is a step in the direction of accounting for developmental effects, but several questions remain. Future experimental research will focus on what can be represented and what cannot in SLTM. Which dynamic auditory events will be hard to develop new representations for (Ng & Rönnberg, 2019), and which are possible to develop? In addition, how do preexisting representations, contextual factors, and WMC interact in the establishment of new representations?

    5. One major goal of our WM research is to develop a test that predicts the capacity to deal with adverse listening conditions online and, at the same time, measures the capacity to optimize transfer to ELTM. We submit that ELTM for the contents of a conversation held in noise is an important clinical marker of being able to focus on the contents of the conversation "here and now."

    Many other topics remain, for example, a longitudinal study of the relationship between hearing parameters, cognitive abilities, and different kinds of outcome variables, as well as WM intervention studies. These and other topics will be presented in papers to come.

    Acknowledgments

    This research was supported by Grant 2017-06092 from the Swedish Research Council as well as by the Linnaeus Centre HEAD, financed by the Swedish Research Council, awarded to Jerker Rönnberg as PI, which also funded the research reported for the Humes Keynote Opening Lecture, at the Aging and Speech Communication 2019 Conference, Tampa, Florida, and by FORTE: Swedish Research Council for Health, Working Life, and Welfare.

    References

    Author Notes

    Disclosure: The authors have declared that no competing interests existed at the time of publication.

    Correspondence to Jerker Rönnberg:

    Editor-in-Chief: Frederick (Erick) Gallun

    Editor: David A. Eddins

    Publisher Note: This article is part of the Forum: Select Papers From the 8th Aging and Speech Communication Conference.