Statistical Learning in Specific Language Impairment: A Meta-Analysis

Natural languages are structured at the level of sound (phonology), word formation (morphology), and sentence (syntax). These structures are reflected by statistical regularities in speakers' verbal output. Children learning their native language unconsciously detect and extract these regularities (Romberg & Saffran, 2010). This process, called statistical learning, is thought to be fundamental for the earliest stages of language acquisition (Evans, Saffran, & Robe-Torres, 2009). Two types of statistical learning are generally distinguished: distributional statistical learning and sequential statistical learning. Distributional statistical learning is about the detection of frequencies with which certain linguistic elements or structures occur. Sequential statistical learning concerns the detection of the sequential ordering and co-occurrence of concrete elements (e.g., syllables) in the auditory input in time (Kerkhoff, Bree, & Wijnen, submitted).
This meta-analysis focuses on sequential statistical learning, and therefore, from here onward, the term statistical learning refers to sequential but not distributional statistical learning.
Individual performance on statistical learning tasks has been shown to predict sentence comprehension (Misyak & Christiansen, 2012), processing of relative clause sentences with long-distance dependencies, and lexical and oral language skills in participants' native language (Evans et al., 2009; Mainela-Arnold & Evans, 2014). Because tracking statistical patterns appears crucial for language acquisition and people differ in their ability to do this, it is not surprising that deficits in the ability to detect statistical patterns and relations in the input have been put forward as an explanation for impairments of language acquisition, notably specific language impairment (SLI; Evans et al., 2009; Hsu & Bishop, 2011; Ullman & Pierpont, 2005). A considerable number of studies looked at the domain specificity of this type of learning deficit in SLI. A recent meta-analysis by Obeid, Brooks, Powers, Gillespie-Lynch, and Lum (2016) summarized these findings and concluded that people with SLI perform worse on statistical learning tasks compared with typically developed people but that this difference in performance did not vary as a function of task modality (visual, visual-motor, and auditory) or age. The current meta-analysis provides a more extensive quantitative investigation of the difference in statistical learning ability between people with and without SLI¹ in the auditory domain. Different from Obeid et al. (2016), our focus is on the auditory linguistic domain. Specifically, we were interested to see whether a difference in statistical learning performance between people with and without SLI varied as a function of linguistic level (word segmentation vs. grammar) or age at which learning took place.

Statistical Learning in the Laboratory
Many experimental studies of statistical learning focus on learning dependencies. These dependencies can be learned at different linguistic levels (word segmentation vs. grammar). We first discuss examples of artificial word segmentation studies followed by examples of artificial grammar learning studies.
In experiments that simulate word segmentation, participants are exposed to a continuous stream of syllables that are organized according to a set of statistical regularities. The stimuli are designed in such a way that transitional probabilities of sequences of certain adjacent syllables are higher than transitional probabilities of other adjacent syllables (continuous relationship), reflecting word boundaries (Saffran, Newport, Aslin, Tunick, & Barrueco, 1997). After exposure, participants perform a lexical decision task (or word recognition via a preferential looking paradigm in the case of infant studies) in which they hear sequences of syllables that had high transitional probabilities in the exposure phase (reflecting words) as well as sequences of syllables that had low to zero transitional probabilities in the exposure phase. Accordingly, adult participants have to indicate whether the words they are presented with are part of the language they were familiarized with or not. In infant studies, the listening times to the sequences of syllables with high transitional probabilities versus sequences of syllables with low transitional probabilities are compared. Results show that adults and infants are able to distinguish such artificial high-probability words from artificial low-probability words on the basis of adjacent transitional probabilities.
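To make the notion of adjacent transitional probability concrete, the following minimal sketch estimates P(next syllable | current syllable) from a syllable stream. The two three-syllable "words" and their presentation order are hypothetical and are not the stimuli of any study cited here; real designs use more words, so that transitions across word boundaries are considerably less predictable than in this toy stream.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Estimate P(next | current) for every adjacent syllable pair in the stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    # The final syllable has no successor, so it is excluded from the denominators.
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


# Hypothetical mini-language: two trisyllabic "words" (assumption, for illustration only).
words = {"A": ["bi", "da", "ku"], "B": ["pa", "do", "ti"]}
order = ["A", "B", "B", "A", "A", "B", "A", "B", "B", "A"]
stream = [syllable for word in order for syllable in words[word]]

tp = transitional_probabilities(stream)
print(tp[("bi", "da")])  # within-word transition -> 1.0
print(tp[("ku", "pa")])  # transition across a word boundary -> 0.75
```

Within-word transitions are fully predictable in this stream, whereas transitions that straddle a word boundary are not; this difference is the cue that listeners are assumed to exploit.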
Contrary to word segmentation studies, the stimuli in artificial grammar learning studies consist of already segmented words that have primary stress and minimal coarticulation and are separated by pauses. Artificial grammar learning studies aim to resemble grammatical phenomena present in natural language. In natural language, for example, grammatical relations are present among functional elements (e.g., is and -ing) across interleaved lexical elements (e.g., Grandma is singing; example taken from Sandoval & Gómez, 2013). In experimental designs that test this type of learning, participants are exposed to strings generated, unknown to the learner, by a miniature artificial grammar. The grammar follows a set of nonadjacent (discontinuous) dependency relations (Gómez, 2002), a set of predictive relations (cf. Saffran, 2002), or a set of finite rules (finite state grammar; Gómez & Gerken, 1999). The procedure of artificial grammar learning designs is similar to the procedure in word segmentation studies: After a period of exposure to the language, participants are tested with strings that either conform to the grammar (grammatical items) or violate the grammar (ungrammatical items), and participants have to indicate whether the string they hear is grammatical or ungrammatical. More important, participants are asked to judge strings with elements that they have heard during the familiarization phase of the experiment as well as strings with novel elements that they have not heard before to test for generalization of the rule (although not all artificial grammar learning studies test for generalization; see Grama, Kerkhoff, & Wijnen, 2016).
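As an illustration of a nonadjacent dependency grammar, the sketch below generates three-element strings in which the first element predicts the last across an intervening element, and classifies test strings as grammatical or ungrammatical. The vocabulary, the function names, and the test items are hypothetical; the sketch is not the stimulus set or procedure of any cited study.

```python
import itertools

# Hypothetical vocabulary: each first element (a) predicts exactly one final element (b).
PAIRS = {"pel": "rud", "vot": "jic"}
MIDDLES = ["wadim", "kicey", "puser"]  # intervening X elements heard during exposure


def grammatical_strings(pairs, middles):
    """Generate every a-X-b string licensed by the miniature grammar."""
    return [(a, x, b) for (a, b), x in itertools.product(pairs.items(), middles)]


def is_grammatical(string, pairs):
    """A string is grammatical when its final element is the one predicted by its first."""
    a, _x, b = string
    return pairs.get(a) == b


exposure = grammatical_strings(PAIRS, MIDDLES)

# Test items: a novel X element probes generalization; a swapped final element is a violation.
generalization_item = ("pel", "fengle", "rud")
violation_item = ("pel", "wadim", "jic")
print(is_grammatical(generalization_item, PAIRS), is_grammatical(violation_item, PAIRS))
```

The generalization item contains an X element that never occurred during exposure, so accepting it requires knowledge of the dependency rather than memory for specific strings.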

Cognitive Processes Involved in Auditory Verbal Statistical Learning
As stated in our operational definition, statistical learning requires sensitivity to regularities in the input (e.g., statistical cues like transitional probabilities in word segmentation and [non]adjacent dependencies in artificial grammar learning). However, there are also other cognitive processes involved in statistical learning in the auditory linguistic domain such as phonological awareness (the ability to analyze and manipulate incoming phonemes and syllables), verbal short-term memory, and verbal working memory. Both word segmentation and artificial grammar learning involve the temporary storage of incoming input, which is necessary to pick up the statistical regularities between elements in the input (verbal short-term memory). In addition, artificial grammar learning, compared with word segmentation, requires processing of long-distance dependencies and generalizing those dependencies to novel items. Long-distance dependencies have been argued to put more demand on working memory than adjacent dependencies (see, e.g., theoretical models on resource limitation of Gibson [1998]), and generalization is more demanding than recognition of items previously introduced (Thompson & Newport, 2007). Therefore, we hypothesize that artificial grammar learning, compared with word segmentation, is more demanding on working memory capacity. In the following section, we discuss how this difference between both levels of learning might disadvantage individuals with SLI in their auditory verbal statistical learning performance.

Statistical Learning in SLI
In natural language, SLI is characterized by problems at the grammatical level (e.g., subject-verb agreement, past-tense marking; Leonard, 2014) as well as at the word segmentation level (e.g., lexical-phonological deficits observed in gating and nonword repetition tasks; see Mainela-Arnold, Evans, & Coady, 2010; Graf Estes, Evans, & Else-Quest, 2007). In artificial language, we see a similar pattern: Most studies investigating auditory verbal statistical learning in SLI show that participants without SLI outperform participants with SLI both in word segmentation and in grammar learning tasks (word segmentation: Evans et al., 2009; grammar: Hsu, Tomblin, & Christiansen, 2014; Lukács & Kemény, 2014; Mainela-Arnold & Evans, 2014; Mayor-Dubois, Zesiger, Van der Linden, & Roulet-Perez, 2014). It is known that people with SLI exhibit deficits in verbal short-term memory and verbal working memory as well (Archibald & Gathercole, 2006; Marton, Eichorn, Campanelli, & Zakariás, 2016; Montgomery, 2003). As these processes are involved in auditory statistical learning, it might well be the case that these deficits influence the auditory verbal statistical learning abilities of people with SLI. Previous research, however, suggests that memory problems cannot solely explain auditory statistical learning problems. For example, individuals with SLI have problems with statistical learning in the nonverbal domain (Lum, Conti-Ramsden, Morgan, & Ullman, 2014; Lum, Conti-Ramsden, Page, & Ullman, 2012; Obeid et al., 2016), which are unlikely to be caused by verbal short-term and working memory problems. In addition, Hsu and Bishop (2014) report poor verbal sequence learning in children with SLI, even after controlling for limitations of verbal short-term memory (Hebb repetition task). Taken together, results of previous studies are congruent with the hypothesis that SLI is associated with a "statistical learning disadvantage." The magnitude and moderators of this disadvantage, however, are unknown. Therefore, the primary purpose of the current meta-analysis was to assess the magnitude of this statistical learning disadvantage in the auditory verbal domain. The second goal was to explore the potential impact of linguistic level and age at which learning takes place.
We wanted to explore whether the statistical learning disadvantage is more severe in artificial grammar learning than word segmentation studies, as the former type of learning is more demanding on verbal working memory capacity, which is generally affected in SLI. With the second meta-regression, we explore whether age moderates the statistical learning disadvantage. Previous studies investigating the influence of age in statistical learning have provided mixed results. Obeid and colleagues (2016) reported no effect of age on statistical learning differences between people with and without SLI across different modalities of learning. Lum and colleagues (2014), however, reported smaller differences in visuospatial statistical learning performance between people with and without SLI for older compared with younger participants. Likewise, studies investigating the developmental trajectory of statistical learning in typically developing people have reported mixed results. Some studies report that there is no evidence for a difference in statistical learning performance between adults and children (visual domain: Kirkham, Slemmer, & Johnson, 2002; auditory domain: Saffran et al., 1997), whereas others do report that statistical learning performance improves with age (visual domain: Arciuli & Simpson, 2011; auditory domain: Lukács & Kemény, 2015; visuospatial: Meulemans, Van der Linden, & Perruchet, 1998).
¹ When we speak of people without SLI, we mean people who are matched in age and/or IQ to participants with SLI (see Table 1), who have no reported (history of) hearing, language, or learning problems and no reported (history of) neurological impairment or illness.
Lammertink et al.: A Meta-Analysis of Statistical Learning in SLI 3475

The Current Study
The current meta-analysis provides an estimate of the magnitude of the statistical learning disadvantage in people with SLI by means of a quantitative overview of both published and unpublished studies that investigate statistical learning in the auditory linguistic domain in people with and without SLI. In a first step, we calculated the standardized averaged mean difference (effect size measure) in performance on statistical learning tasks in people with and without SLI. In a second analysis, we explored whether the effect size measure was moderated by linguistic level (word segmentation vs. grammar) and age.

Method
We used the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement to organize the current meta-analysis (Moher, Liberati, Tetzlaff, Altman, & The PRISMA Group, 2009). Effect size calculations were done in the statistical software R (R Core Team, 2016): Formulas were implemented via the R compute.es package (Del Re, 2013), and statistical analyses on the effect size measures were conducted with the R meta (Schwarzer, 2015) and metafor (Viechtbauer, 2010) packages.

Literature Search
Systematic searches for empirical articles were conducted in February 2016 using a combination of prespecified key word combinations (details of all key words, Boolean operators, and syntax used for each database can be found in Supplemental Material S1). We conducted our searches in five different sources including PubMed, Education Resources Information Center, PsycINFO, Linguistics and Language Behavior Abstracts, and Open Access Theses and Dissertations. In addition, we asked experts in the field to inform us of any published or unpublished studies via two different calls (LINGUIST List and Cogdevsoc list; July 2016). These combined searches yielded 161 articles (PubMed: 26 hits, Education Resources Information Center: 25 hits, PsycINFO: 64 hits, Linguistics and Language Behavior Abstracts: 38 hits, Open Access Theses and Dissertations: five hits, and experts in the field: three hits).

Inclusion Criteria and Study Selection
To be included in the meta-analysis, studies were required to meet the following criteria: (a) A study should report on original empirical research data. Both published and unpublished studies were eligible, including articles in refereed journals, nonrefereed journals, dissertations, and conference presentations; (b) a study should have an experimental design that tests sequential statistical learning in the auditory verbal domain assessed via a word segmentation, grammaticality judgment, or related task; (c) as we aimed to test whether participants implicitly detected the statistical regularity, participants should not receive any explicit instruction or feedback regarding the underlying structure of the artificial language to be learned or on their behavior during the training or test phase; and (d) selected studies include one group of participants with SLI and one group of age-matched controls who do not have language impairments. More important, we only included studies that identified participants with SLI on the basis of inclusionary and exclusionary criteria typical for SLI. Therefore, studies had to report scores on standardized language tests² or use a test battery that differentiates between participants with and without a history of SLI (e.g., Tomblin battery; Tomblin, Freese, & Records, 1992; see Table 1). In addition, a nonverbal IQ measure³ and no history of neurological or emotional delays should be reported for both participant groups. It is important to mention that the inclusion and exclusion criteria for SLI vary across the studies in our sample (see Table 1). We only included studies, however, that based their inclusion criteria on both standardized language tests and IQ scores. If studies failed to report on one of these criteria (or if information on these criteria could not be confirmed via contact with the authors), the study was excluded from the analysis.
In addition, when studies included children with an IQ below 80, the control group and the group with SLI had to be matched on nonverbal IQ to ensure that differences in statistical learning performance are not the result of lower IQ scores. Finally, to be included in the analysis for the current article, studies had to be conducted before September 2016 (but see footnote 4). However, as our database is accessible online and open to update, future studies can be added, which facilitates the accumulation and evaluation of previous and future studies on statistical learning in this domain (Tsuji, Bergmann, & Cristia, 2014). No start date for publications was set to find as many studies as possible. For an overview of the exact inclusionary and exclusionary criteria for the studies in our final sample, see Table 1.
After removing duplicates, 81 studies (78 published articles and three unpublished conference posters) remained. Two reviewers independently conducted the study selection procedure. In a first step, both reviewers performed a full-text inspection of the 19 studies (16 published articles and three nonpublished conference posters) that were selected, based on screening of the title and abstract. The reviewers independently screened these full-text articles and posters according to the inclusion criteria. There was 95% (18/19 studies) agreement on the selection of these full-text studies (eight studies included, 10 studies excluded, one study for discussion). After discussion, the reviewers decided not to include the one study they had disagreed on because participants in this study had received feedback on their behavior during the test phase (Torkildsen, Dailey, Aguilar, Gómez, & Plante, 2013). As a result, the initial final selection consisted of eight studies (five published articles and three nonpublished conference posters).⁴ For a visual representation of the literature search procedure, see Figure 1.
Four of the eight studies reported multiple individual experiments or multiple outcomes per participant group (Evans et al., 2009; Grunow, Spaulding, Gómez, & Plante, 2006; Hsu et al., 2014; Torkildsen, 2010). If the data necessary to compute the individual effect size were available for each experiment separately and the groups of participants tested in the experiments were independent (i.e., different participants), all of the experiments of that study were included in the meta-analysis. Only the study of Hsu et al. (2014) met these criteria. For the other three studies with multiple experiments (Evans et al., 2009; Grunow et al., 2006; Torkildsen, 2010), only one effect size measure was incorporated into the final analysis (for more details on our decisions with respect to this part, see the subsection Effect Size Calculation). This resulted in a final sample of 10 experiments.

Sample Description
The eight studies (10 experiments) we included in our analysis were published (six studies) or presented (two studies) between 2006 and 2017 (see footnote 4). The experiments collectively examined 213 participants with SLI and 363 controls, all between 6 and 19 years old. The dependent variable was slightly different across the 10 experiments. In six experiments, the outcome variable was the overall accuracy scores on a grammaticality judgment task; in three experiments, the outcome variable was the overall accuracy score on a word segmentation task; and in one experiment, the outcome variable was an event-related potential (ERP: P600).

Effect Size Calculation
For each individual experiment, we calculated the effect size (Hedges' g) as the standardized mean difference (SMD)⁵ in performance between the participants with and without SLI. The SMD was chosen over the raw mean difference, because the dependent variables differed across studies (ERP amplitude vs. grammaticality judgments).
All formulas used to calculate the SMD and the approximation of the variance of the SMD for each individual experiment are shown in Figure 2 and were taken from the R compute.es package (Del Re, 2013). The effect size was calculated so that positive values indicated that participants without SLI outperformed participants with SLI. […] (Grunow et al., 2006; Torkildsen, 2010), the SMD was calculated from the reported F statistic on the main effect of group (fes function from the compute.es package), and for one experiment (Mayor-Dubois et al., 2014), the reported t statistic was used to calculate the SMD (tes function in the compute.es package). As mentioned in the Inclusion Criteria and Study Selection section, it was not always possible to calculate multiple effect sizes for studies that ran multiple experiments. In the case of the Grunow et al. (2006) experiments, we calculated one effect size because the statistical information necessary to calculate a separate effect size for each of the different experimental conditions (low vs. high intervening X-category, generalization vs. nongeneralization items) was not available. Likewise, one effect size was obtained from the study by Evans et al. (2009), which reported on two different experiments. The second experiment was conducted 6 months after the first. The participants of Experiment 2, however, had all participated in Experiment 1, rendering the data of the second experiment correlated with a part of the data of the first experiment. A combined effect size, taking the correlation term between Experiments 1 and 2 into account, would have been the ideal solution because it would take into account the increased precision of within-subject measures (Borenstein, Hedges, Higgins, & Rothstein, 2009, pp. 28-30).
However, it was impossible to determine the correlation term between the two experiments, because only parts of the data were correlated. Therefore, we included only the first experiment, which had twice as many participants as the second experiment. Last, Torkildsen (2010) recorded ERPs during both the exposure phase and the test phase. As we have no measures of performance during the exposure phase for the other studies in our sample, only the effect size measure of the ERPs recorded during the test phase is included. Finally, we applied Hedges' g correction for small sample sizes to all 10 effect sizes, because most of the experiments had a sample size of less than 20 (Borenstein et al., 2009, p. 27).
² Participants with SLI scored at least 1.25 standard deviations below age norms.
³ Nonverbal IQ had to fall within the normal range (> 80), or when the lower limit of IQ was < 80, the control group and the group with SLI had to be matched on nonverbal IQ.
Table 1 note. […] (Semel et al., 1995); WISC-R = Wechsler Intelligence Scale for Children-Revised (Wechsler, 1974); WPPSI = Wechsler Preschool and Primary Scale of Intelligence (no version reported); CELF-R = Clinical Evaluation of Language Fundamentals-Revised (Semel et al., 1987); WISC-4 = Wechsler Intelligence Scale for Children-Fourth Edition (Wechsler, 2003); PPVT-4 = Peabody Picture Vocabulary Test-Fourth Edition (Dunn & Dunn, 2007); CELF-4 = Clinical Evaluation of Language Fundamentals-Fourth Edition (Semel et al., 2003); WISC-3 = Wechsler Intelligence Scale for Children-Third Edition (Wechsler, 1991); PPVT-R = Peabody Picture Vocabulary Test-Revised (Dunn & Dunn, 1981); CREVT = Comprehensive Receptive and Expressive Vocabulary Test: Adult (Wallace & Hammill, 1997); QRI-2 = Qualitative Reading Inventory-Second Edition (Leslie & Caldwell, 1995); RAVEN = Raven's Progressive Matrices and Raven's Coloured Matrices (Raven et al., 1987); PPVT = Peabody Picture Vocabulary Test (Csányi, 1974); TROG = Test for Reception of Grammar (Lukács et al., 2012); MAMUT = Magyar Mondatutánmondási Teszt (Hungarian Sentence Repetition Test; Kas & Lukács, 2011); NWR = nonword repetition task (Racsmány et al., 2005). a Brown et al. (1997). b Tomblin et al. (1992).
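The conversions described in the Effect Size Calculation subsection can be sketched in a few lines. The sketch below uses standard textbook formulas (cf. Borenstein et al., 2009) for the SMD from group means, from a t statistic, and from an F statistic with one numerator degree of freedom, followed by the small-sample correction that turns d into Hedges' g. It is an approximation written in Python for illustration, not the compute.es code used in the analysis, and whether it matches Figure 2 term by term is an assumption. All function names and input values are hypothetical.

```python
import math


def d_from_means(m_ctrl, sd_ctrl, n_ctrl, m_sli, sd_sli, n_sli):
    """SMD from group means and SDs; positive when controls score higher."""
    pooled_var = ((n_ctrl - 1) * sd_ctrl**2 + (n_sli - 1) * sd_sli**2) / (n_ctrl + n_sli - 2)
    return (m_ctrl - m_sli) / math.sqrt(pooled_var)


def d_from_t(t, n_ctrl, n_sli):
    """SMD from an independent-samples t statistic."""
    return t * math.sqrt((n_ctrl + n_sli) / (n_ctrl * n_sli))


def d_from_f(f, n_ctrl, n_sli):
    """SMD from a two-group F statistic (F = t**2); the sign must be set from the group means."""
    return math.sqrt(f * (n_ctrl + n_sli) / (n_ctrl * n_sli))


def hedges_g(d, n_ctrl, n_sli):
    """Apply the small-sample correction J and return (g, variance of g)."""
    j = 1 - 3 / (4 * (n_ctrl + n_sli - 2) - 1)
    var_d = (n_ctrl + n_sli) / (n_ctrl * n_sli) + d**2 / (2 * (n_ctrl + n_sli))
    return j * d, j**2 * var_d


# Hypothetical accuracy data (proportion correct), 20 participants per group.
d = d_from_means(0.70, 0.15, 20, 0.60, 0.15, 20)
g, var_g = hedges_g(d, 20, 20)
print(round(d, 3), round(g, 3), round(var_g, 3))
```

Because J is smaller than 1, g is always slightly closer to zero than d, and the shrinkage is largest for the smallest experiments.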

Publication Bias
Meta-analyses are generally sensitive to publication bias. Publication bias reflects the tendency of a higher publication rate for studies with significant results compared with studies with nonsignificant results (Dickersin, 2005). Because it is more likely that published studies end up in a meta-analysis, the overall combined effect size might be overestimated when there is a publication bias in the sample used to compute the combined effect sizes (Borenstein et al., 2009, p. 278).
In the current meta-analysis, we analyzed funnel plot asymmetry as a potential indicator of publication bias (Egger, Smith, Schneider, & Minder, 1997). In our funnel plot (see Figure 3), the effect size of a particular experiment is plotted against the standard error of that particular experiment. The standard error can be interpreted as a measure of experiment size, as generally experiments with fewer participants have higher standard errors. In the absence of publication bias, a funnel plot is symmetric and funnel shaped; large experiments appear toward the top (low standard error) of the plot and generally cluster around the mean effect size, whereas smaller experiments appear toward the bottom (higher standard error) of the graph and tend to be spread across a broader range of values. Visual inspection of our funnel plot (see Figure 3) seems to suggest asymmetry such that smaller experiments tend to have greater effect sizes (i.e., they appear more to the right side of the mean effect size than the left side). The latter could indicate publication bias, as small experiments are more likely to be found (or published) when the effect size is large compared with when the effect size is small. We performed a linear regression on funnel plot asymmetry (Egger et al., 1997). The test on funnel plot asymmetry was performed using the regtest function in the metafor (Viechtbauer, 2010) R package. The regression on funnel plot asymmetry was not significant (z = 1.52, p = .13). Therefore, we have no statistical evidence for a publication bias in the current sample.
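The logic of a regression test for funnel plot asymmetry can be illustrated with a simplified sketch: effect sizes are regressed on their standard errors, each experiment is weighted by its inverse variance, and the slope is tested against zero. This fixed-effect version is a simplification and is not claimed to reproduce the regtest call used in the analysis; the function names and the data are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def funnel_asymmetry_test(effects, std_errors):
    """Inverse-variance weighted regression of effect size on standard error."""
    weights = [1 / se**2 for se in std_errors]
    total_w = sum(weights)
    se_bar = sum(w * se for w, se in zip(weights, std_errors)) / total_w
    g_bar = sum(w * g for w, g in zip(weights, effects)) / total_w
    sxx = sum(w * (se - se_bar) ** 2 for w, se in zip(weights, std_errors))
    sxy = sum(w * (se - se_bar) * (g - g_bar) for w, se, g in zip(weights, std_errors, effects))
    slope = sxy / sxx
    # With sampling variances treated as known, the slope's standard error is 1 / sqrt(sxx).
    z = slope * math.sqrt(sxx)
    return slope, z, two_sided_p(z)


# Hypothetical experiments in which less precise experiments report larger effects.
effects = [0.3, 0.5, 0.7, 0.9]
std_errors = [0.1, 0.2, 0.3, 0.4]
slope, z, p = funnel_asymmetry_test(effects, std_errors)
print(round(slope, 2), round(z, 2), round(p, 3))
```

A positive slope corresponds to the pattern described above, in which smaller experiments sit to the right of the mean effect size. As a check on the reported statistics, two_sided_p(1.52) evaluates to approximately .13.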

Primary Analysis: Effect Size and Heterogeneity
We estimated the average weighted SMD and heterogeneity of the sample with a random-effects model with the restricted maximum-likelihood estimator for the amount of heterogeneity. All 10 observed effect sizes and their weights were included to estimate the mean effect size. No further moderator variables were specified in the model. Sample heterogeneity was assessed via Cochran's Q test for heterogeneity.
The overall weighted mean effect size and the observed effect sizes for the individual experiments are shown in Figure 4. The average observed weighted mean effect size (intercept) under our random-effects model (random effect = study) was 0.54 (SE = 0.09, 95% confidence interval [CI] [0.36, 0.70]). The observed effect size was significantly different from zero (z = 5.98, p < .001) and positive, which indicates that people without SLI, on average, outperform people with SLI on statistical learning tasks in the auditory verbal domain. In other words, the value of 0.54 can be regarded as our estimate for the statistical learning disadvantage in people with SLI. Furthermore, the CI ranges from 0.36 to 0.70, indicating that the true effect size is unlikely to be smaller than 0.36, which means that we can speak of a moderate-to-large statistical learning disadvantage in people with SLI.
Figure 2. Formulae through which effect sizes (standardized mean difference/Hedges' g), variance (var), and weights for each individual study were calculated (Steps 1-4), and formula through which the average weighted standardized mean difference (SMD) was calculated (Step 5). Here, SLI refers to the values for the group with participants with specific language impairment (SLI). Control refers to the values for the control group. N signifies the number of participants in a given experiment. F is the reported F value for the main effect of group, and t is the reported t value for the between-groups effect.
As a measure of heterogeneity, the total amount of variance between the experiments was τ² = 0.0 (SE = 0.036). Cochran's Q test for heterogeneity was not significant (Q(9) = 10.11, p = .34). This means that there is no statistical evidence that the true effect sizes differ between the studies in our sample. It is important to note, however, that, whereas a significant Q test provides evidence that the true effects vary, a nonsignificant Q test alone should not be taken as evidence that the true effect sizes are consistent. The low number of experiments in our design could well explain the finding of nonsignificant heterogeneity (Borenstein et al., 2009, p. 113).
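For readers who want to see how a weighted mean effect size, Cochran's Q, and the between-experiment variance relate to one another, the following sketch pools a set of effect sizes under a random-effects model. Note that it uses the DerSimonian-Laird moment estimator for τ², which has a closed form, rather than the restricted maximum-likelihood estimator used in the analysis, so it would not be expected to reproduce the reported values exactly. The function name and the input data are hypothetical.

```python
import math


def random_effects_pool(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird estimate of tau squared."""
    w = [1 / v for v in variances]
    fixed_mean = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (g - fixed_mean) ** 2 for wi, g in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add tau squared to each sampling variance.
    w_star = [1 / (v + tau2) for v in variances]
    estimate = sum(wi * g for wi, g in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return {
        "estimate": estimate,
        "se": se,
        "ci": (estimate - 1.96 * se, estimate + 1.96 * se),
        "z": estimate / se,
        "q": q,
        "df": df,
        "tau2": tau2,
    }


# Hypothetical Hedges' g values and sampling variances for three experiments.
result = random_effects_pool([0.2, 0.5, 0.8], [0.04, 0.04, 0.04])
print(result)
```

When Q does not exceed its degrees of freedom, this estimator truncates τ² at zero and the random-effects result coincides with the fixed-effect result. With the rounded values reported above, 0.54 / 0.09 gives z ≈ 6.0, in line with the reported z = 5.98.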

Secondary Analysis: Meta-Regressions on Linguistic Level and Age
As mentioned in the introduction, we were interested in seeing whether the linguistic level (word segmentation vs. grammar) and age at which the experiments were performed influence the SMD. We do realize, however, that our sample includes only 10 studies, which renders it unlikely that we will find a significant effect. Nevertheless, we decided to continue our meta-regression, as assessing the impact of the moderator variables linguistic level and age was part of our research question. As our moderator variables are correlated, the impact of both moderators is evaluated by means of two separate meta-regression models.
To assess whether the linguistic level at which the experiments were performed (word segmentation vs. grammatical structure) influences the SMD, we added linguistic level as a between-experiments moderator variable to the random-effects model described above. When we coded experiments at the word segmentation level as −0.5 and experiments at the grammatical level as 0.5, the resulting mixed-effects model detected no significant effect of linguistic level (estimate of the SMD difference = −0.15, SE = 0.18, z = −0.80, p = .43, 95% CI [−0.51, 0.21]).
As can be seen in Figure 4 (and Table 1), the studies in our sample included participants between 6 and 19 years old. To test for age effects, we fit a second meta-regression model with age in years (log-transformed) as the continuous predictor variable. The mixed-effects model detected no significant effect of age (estimate of the SMD difference = −0.10, SE = 0.11, z = −0.91, p = .36, 95% CI [−0.32, 0.12]).
In summary, we found no evidence that linguistic level or age influences the statistical learning disadvantage in people with SLI.⁶ The potential effects of these moderators might be too small to detect with meta-regression due to the relatively small number of studies in our sample.
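The meaning of the moderator estimate under −0.5/0.5 coding can be made explicit with a small sketch. For a two-level moderator whose codes differ by one unit, the weighted regression slope equals the difference between the weighted mean effect sizes of the two subgroups, so the estimate can be read directly as "grammar minus word segmentation." The sketch treats τ² as a known input, which is a simplification of the mixed-effects model; the function name and the data are hypothetical.

```python
import math


def level_contrast(effects, variances, levels, tau2=0.0):
    """Difference in weighted mean effect size: grammar minus word segmentation."""
    totals = {}
    for g, v, level in zip(effects, variances, levels):
        w = 1 / (v + tau2)
        sum_w, sum_wg = totals.get(level, (0.0, 0.0))
        totals[level] = (sum_w + w, sum_wg + w * g)
    w_seg, wg_seg = totals["segmentation"]
    w_gram, wg_gram = totals["grammar"]
    difference = wg_gram / w_gram - wg_seg / w_seg
    # The variance of a difference of independent weighted means is the sum of their variances.
    se = math.sqrt(1 / w_gram + 1 / w_seg)
    return difference, se, difference / se


# Hypothetical effect sizes: two word segmentation and two grammar experiments.
effects = [0.6, 0.6, 0.3, 0.5]
variances = [0.05, 0.05, 0.05, 0.05]
levels = ["segmentation", "segmentation", "grammar", "grammar"]
difference, se, z = level_contrast(effects, variances, levels)
print(round(difference, 2), round(se, 3), round(z, 2))
```

A negative difference means a smaller SMD in grammar experiments than in word segmentation experiments; with only a handful of experiments per level, the standard error of this contrast is large relative to plausible differences.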

Discussion
The primary purpose of our meta-analysis was to provide a quantitative overview of published and unpublished studies on auditory verbal statistical learning in SLI to evaluate the magnitude of the auditory verbal statistical learning disadvantage in people with SLI. We found that, on average, the detection of statistical regularities in the input was not as effective in people with SLI as in people without SLI (statistical learning disadvantage) and that this difference in performance was moderate to large. The results supplement the findings of Obeid et al. (2016) on statistical learning across different modalities in people with SLI. Different from Obeid and colleagues, our focus was on statistical learning in the auditory linguistic domain, which allowed us (a) to add five additional studies on statistical learning in this domain that were not included in the Obeid et al. study and (b) to further explore whether differences in statistical learning ability between people with and without SLI arise as a function of linguistic level. Following on the latter, the second goal of our meta-analysis was to 6 In addition, we conducted an exploratory meta-regression with the moderator variable adjacency type. This regression revealed no significant effects either. As one of our reviewers pointed out, however, a meta-regression with the moderator variable adjacency type is problematic, as adjacency type is highly correlated with linguistic level (i.e., all word segmentation studies feature an adjacent dependency learning paradigm, whereas the artificial grammar learning studies featured a mix of adjacent and nonadjacent dependency types). investigate whether the magnitude of statistical learning disadvantage in people with SLI was moderated by the linguistic level (word segmentation vs. grammar) or age at which learning takes place. 
We did not find evidence that the difference in statistical learning performance between people with and without SLI is moderated by the linguistic level or age at which learning takes place. Although the absence of an effect of linguistic level is a null result and therefore difficult to interpret, it is in line with previous research reporting no association between verbal working memory and sequence repetition learning (Hsu & Bishop, 2014; Lum et al., 2012). Alternatively, the potential influence of both moderators might have been too small to detect with our meta-regressions because of the relatively small number of studies in our sample.
In all, our results extend previous findings on a visual statistical learning disadvantage in SLI (Lum et al., 2012, 2014; Obeid et al., 2016) to the auditory verbal domain and underline the assumption of a general cognitive deficit in the implicit detection of statistical regularities and/or dependencies in people with SLI that contributes to the language problems seen in this population (see also Evans et al., 2009; Hsu & Bishop, 2011, 2014; Ullman & Pierpont, 2005).

Relevance for Clinicians Working With SLI
The current meta-analysis provides strong evidence that people with SLI have more difficulties with statistical learning than people without SLI. These findings support the use of evidence-based interventions that facilitate and stimulate the detection of (statistical) regularities in the input for people with SLI. A concrete example of such a statistical learning-based intervention is the conversation recast treatment for morpheme errors in children with SLI (Plante et al., 2014). Plante and colleagues base their training method on findings from artificial language studies. In such studies, strings have an aXb structure in which the a and b elements always co-occur (Gómez, 2002). It has been found that participants only learn the nonadjacent dependencies when the variability of the intervening X element (i.e., the number of different X elements) is high enough (Gómez, 2002). Likewise, Plante et al. (2014) showed that the use of trained morphemes improved for children who were trained on these morphemes in a high-variability context (24 verbs). They found no evidence of such a treatment effect for children in the low-variability context (12 verbs). It thus seems that people both with and without language impairment benefit from variability, and not only from repetition, in their language input (Plante et al., 2014). High variability facilitates rule learning rather than rote learning: once participants notice that the number of different items exceeds working memory capacity and memorization is not an option, they need to look for regularities and patterns in the input. These results suggest that clinicians working with children with SLI need to provide a great number of examples when explaining new rules.
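The role of variability in an aXb language can be illustrated with a small sketch. It is not the stimulus set of Gómez (2002) or the treatment materials of Plante et al. (2014): the three a-b frames and the X labels are hypothetical placeholders, and the set sizes of 12 and 24 are borrowed from the verb counts mentioned above purely for illustration. The sketch builds every aXb string for a given X set and computes the mean transitional probability between adjacent (a to X) and nonadjacent (a to b) elements.

```python
from collections import Counter
from itertools import product

# Hypothetical a-b frames; the labels are placeholders, not real stimuli.
FRAMES = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]


def build_axb_strings(frames, x_items):
    """Return every a-X-b string for the given (a, b) frames and X items."""
    return [(a, x, b) for (a, b), x in product(frames, x_items)]


def mean_transitional_probability(strings, first_pos, second_pos):
    """Mean P(element at second_pos | element at first_pos) over all strings."""
    pair_counts = Counter((s[first_pos], s[second_pos]) for s in strings)
    first_counts = Counter(s[first_pos] for s in strings)
    probs = [
        pair_counts[(s[first_pos], s[second_pos])] / first_counts[s[first_pos]]
        for s in strings
    ]
    return sum(probs) / len(probs)


def summarize(set_size):
    """Summarize an aXb language whose X set has `set_size` members."""
    x_items = [f"X{i}" for i in range(1, set_size + 1)]
    strings = build_axb_strings(FRAMES, x_items)
    return {
        "n_strings": len(strings),
        "adjacent_a_to_X": mean_transitional_probability(strings, 0, 1),
        "nonadjacent_a_to_b": mean_transitional_probability(strings, 0, 2),
    }


if __name__ == "__main__":
    for size in (12, 24):
        print(size, summarize(size))
```

In this toy language, the nonadjacent a-to-b probability stays at 1 regardless of set size, whereas the adjacent a-to-X probability shrinks as the X set grows. Under these assumptions, a larger X set makes the adjacent transitions less informative and the a-b relation the only stable regularity in the input.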

Publication Bias
We would like to stress that, although the regression on funnel plot asymmetry did not reach significance, one should always be cautious about the possibility of publication bias in the literature on auditory statistical learning in SLI. Such a potential publication bias relates to the validity of the classical statistical learning paradigms to measure statistical learning efficiency. Recently, more and more researchers have stressed the importance of an online measure of statistical learning (e.g., Bogaerts, Franco, Favre, & Rey, 2016; Isbilin, McCauley, Kidd, & Christiansen, 2016; Misyak, Christiansen, & Tomblin, 2010) or a test phase that is more sensitive to individual variation. As mentioned by Siegelman, Bogaerts, and Frost (2016), a large proportion of the participants in a statistical learning study perform at chance level. On the group level, test performance is usually just above chance, and an accuracy score higher than 60% is rarely obtained. For these reasons, we consider it likely that more research groups have unpublished (pilot) data on auditory statistical learning in SLI that did not yield statistical significance. Inclusion of these unpublished data could have made our estimates more precise, and we therefore invite researchers who have such unpublished null results to contribute to our Community-Augmented Meta-Analysis via https://osf.io/4exbz/.
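For readers unfamiliar with regression tests of funnel plot asymmetry, the following is a minimal sketch of one common variant, an Egger-style regression, in which each study's standardized effect (effect size divided by its standard error) is regressed on its precision (one divided by the standard error). An intercept far from zero suggests asymmetry. This is an illustration of the general technique only and not necessarily the exact procedure used in this meta-analysis; the function name and the effect sizes and standard errors in the usage example are hypothetical and are not taken from the included studies.

```python
import math


def egger_regression(effects, std_errors):
    """Egger-style test: regress effect/SE on 1/SE with ordinary least squares.

    Returns (intercept, slope, t_intercept). The intercept indexes funnel plot
    asymmetry; the slope estimates the underlying effect size.
    """
    y = [d / se for d, se in zip(effects, std_errors)]  # standardized effects
    x = [1.0 / se for se in std_errors]                 # precisions
    n = len(x)
    if n < 3:
        raise ValueError("need at least three studies")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual variance with n - 2 degrees of freedom.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r ** 2 for r in residuals) / (n - 2)
    se_intercept = math.sqrt(s2 * (1.0 / n + mean_x ** 2 / sxx))

    # t statistic for the intercept; undefined if the fit is exact.
    t_intercept = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, slope, t_intercept


if __name__ == "__main__":
    # Hypothetical values for illustration only, not data from this study.
    hypothetical_effects = [0.35, 0.60, 0.48, 0.90, 0.20, 0.75]
    hypothetical_ses = [0.15, 0.30, 0.22, 0.40, 0.12, 0.35]
    print(egger_regression(hypothetical_effects, hypothetical_ses))
```

The t statistic would be compared against a t distribution with n - 2 degrees of freedom. With a small number of studies such a test has low power, which is consistent with the caution expressed above about interpreting a nonsignificant result.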

Recommendations for Future Studies
The results of the current meta-analysis show that there is a moderate-to-large statistical learning disadvantage in people with SLI. The moderators of this disadvantage, however, remain unknown. Therefore, we recommend that future studies test the effects of potential moderators such as linguistic level and age within a single study in which the variables are within-subject predictors. Longitudinal designs can be used to test the statistical learning performance of the same participants at different ages. Furthermore, we recommend the use of more sensitive and elaborate (e.g., online) measures of statistical learning at both the individual and group levels. For example, our meta-analysis included only one ERP study (Torkildsen, 2010). Interestingly, the difference between people with and without SLI in this particular study was relatively large (see Figure 4). Potentially, the ERP measure is more sensitive than the accuracy measure in picking up differences in performance between people with and without SLI. We recommend that future studies further investigate this potential difference in a within-subject design that collects both types of measure for each individual.

Conclusion
In conclusion, the result of our meta-analysis shows that there is a moderate-to-large statistical learning deficit in people with SLI. This result is congruent with the hypothesis that people with SLI are less effective in statistical learning in the auditory verbal domain than people without language impairment. These results motivate the development of statistical learning-based interventions for children with SLI. More studies are needed, however, to perform more fine-grained analyses on the determinants of statistical learning deficiencies in the auditory linguistic domain in people with SLI.