Open Access | SIG 1 Language Learning and Education | Tutorial | 26 Feb 2019

Standardized Tests and the Diagnosis of Speech Sound Disorders

    Leah Fabiano-Smith
    Department of Speech, Language, and Hearing Sciences, The University of Arizona, Tucson



    The purpose of this tutorial is to provide speech-language pathologists with the knowledge and tools to (a) evaluate standardized tests of articulation and phonology and (b) utilize criterion-referenced approaches to assessment in the absence of psychometrically strong standardized tests.


    Relevant literature on psychometrics of standardized tests used to diagnose speech sound disorders in children is discussed. Norm-referenced and criterion-referenced approaches to assessment are reviewed, and a step-by-step guide to a criterion-referenced assessment is provided. Published criterion references are provided as a quick and easy resource guide for professionals.


    Few psychometrically strong standardized tests exist for the evaluation of speech sound disorders for monolingual and bilingual populations. The use of criterion-referenced testing is encouraged to avoid diagnostic pitfalls.


    Speech-language pathologists who increase their use of criterion-referenced measures and decrease their use of standardized tests will arrive at more accurate diagnoses of speech sound disorders.

    Over 30 years ago, McCauley and Swisher (1984) published an article reviewing the psychometric merit of standardized tests of articulation. Their conclusion was that speech-language pathologists (SLPs) are in the “uncomfortable position” of having less than perfect standardized tests to assess the speech skills of children suspected of having a speech sound disorder (SSD; p. 41). Fast forward 30 years to Kirk and Vigeland (2014), who published a review of more recent standardized tests of articulation and phonology. Their results were not unlike those of their predecessors: the tests included in their review did not meet many of the psychometric properties required for the adequate identification of typical and disordered skills in children acquiring their phonological system. The authors cautioned clinicians that, although single-word, standardized tests of articulation and phonology have value in the assessment battery of young children, they should not be the only component of the clinical decision-making process. Although Flipsen and Ogiela (2015) found modest gains in recent years in the quality of single-word standardized tests of articulation, most tests continue to lack the full range of psychometric criteria, and many must still be supplemented with additional measures to be diagnostically acceptable. Where did these tests fall short? How do we decide which tests are best? In the absence of psychometrically sound standardized tests, what are our options for speech sound assessment? This article will discuss (a) current practices regarding standardized tests, (b) limitations of standardized tests, and (c) how to avoid overreliance on standardized tests and adopt criterion-referenced approaches.

    SLPs' Use of Standardized Tests: Current Practices

    In order to understand what sort of impact the widespread use of standardized tests has on accuracy of diagnosis, we must first discuss the limitations of standardized tests. It is important to point out that Kirk and Vigeland (2014) placed the onus on the developers of standardized tests to increase the inclusion of various psychometric properties at the time of test creation. Tests of higher quality would be a great service to the clinicians in our profession and would reduce the number of diagnostic errors they make. It is important that the message taken away from this article is that most standardized tests do not provide the sensitivity (identification of disorder when disorder is present) and specificity (identification of typical skills when a child is indeed typical) required to act as the sole component of an evaluation. More to the point, SLPs who are aware of the weaknesses of standardized tests will be more likely to design an evaluation battery that results in an accurate diagnosis.

    How are we assessing SSD as a field? Skahan, Watson, and Lof (2007) published survey data on common assessment practices used by SLPs practicing in the United States. Just over 300 clinician responses were analyzed to observe how assessment protocols were selected for children with suspected SSDs. Clinicians who responded to this questionnaire had an average of 15 years of experience in the field, ranging from 1 to 40 years, and worked in a variety of settings. More than 50% of respondents included the following procedures in their speech sound assessment protocol: (a) estimating intelligibility, (b) administration of a standardized single-word test, (c) a hearing screening, (d) stimulability testing, and (e) an oral motor examination (p. 249). Survey results indicated that the most widely used standardized test by SLPs was the Goldman-Fristoe Test of Articulation (GFTA, multiple editions; Goldman & Fristoe, 2000). The majority of respondents reported that the GFTA is “always” used as part of their phonological assessment (p. 249). In addition, approximately 30% of respondents indicated using the Khan-Lewis Phonological Analysis (Khan & Lewis, 2002) as a supplement to the GFTA (p. 249). Williams and McLeod (2012) examined the phonological assessment practices of Australian SLPs. When considering bilingual and multilingual children on their caseloads, over 40% of SLPs always used a standardized test in English to assess multilingual children, regardless of language background (p. 298). Skahan et al. found a similar result for SLPs practicing in the United States; English-only standardized tests were used by 35% of SLPs when assessing bilingual children (p. 251). In order to understand what sort of impact the widespread use of these tests has on accuracy of diagnosis, we must first discuss psychometric properties of tests and the impact they have on a test's utility.

    What Do Standardized Tests Tell Us?

    According to Peña, Spaulding, and Plante (2006), norm-referenced, standardized tests provide the opportunity to compare the speech of one particular child to a “standard” (p. 247). We can observe if a child is performing like his peers (i.e., in the average range) or how many levels above or below his peers (i.e., standard deviations). If a child is performing below average, we can observe how far below his peers he is performing. The number of standard deviations below average, derived from a child's performance on the test, often designates whether or not a child receives speech therapy services. These interpretations, however, are only accurate if the test is well constructed. The proverbial saying, “Garbage in, garbage out” applies here; if a test is weak, the results of that test will provide an inaccurate representation of a child's speech sound abilities. This is a major challenge in our field because (a) the majority of standardized tests of articulation and phonology are psychometrically weak (Kirk & Vigeland, 2014) and (b) there is an overreliance on standardized tests by SLPs (Skahan et al., 2007). The goal of this article is to provide information on navigating standardized assessment manuals and how to identify tests that are well constructed. If strong standardized tests are unavailable, we will explore alternative approaches to assessment.

    The Psychometrics of Standardized Tests of Articulation and Phonology

    How do we know if a test is well constructed? McCauley and Strand (2008) and Kirk and Vigeland (2014) are both excellent resources for reviewing the parameters of standardized tests of articulation, phonology, and oral motor (speech and nonspeech) abilities in children. Here, we will focus on three major psychometric properties of standardized tests: (a) the normative sample, (b) reliability, and (c) validity. For a more complete explanation of psychometric parameters, Hutchinson (1996) provides a comprehensive list and a thorough explanation of psychometric properties. In addition, detail on the standards for educational and psychological testing is regularly published by the American Educational Research Association (2014). Furthermore, a five-part webinar on test psychometrics for child speech is available for online learning (Pavleko, 2018). Because the aim of this tutorial is to cover the core components a test must possess for accurate diagnosis of SSDs (in line with Kirk & Vigeland, 2014), we will focus on these three main criteria.

    The Normative Sample

    Plante and Vance (1994) indicated two major areas of importance when evaluating the normative sample of a test: the definition of the standardization population (including geographical location, socioeconomic status [SES], and the typical vs. disordered status of the children included) and that the normative sample includes at least 100 children in each subgroup. This information should be easily accessible in the test manual.

    If the goal of standardized testing is to differentiate between typical and disordered speech skills, the normative sample must be made up of only typically developing children (Peña et al., 2006). Including children with disorders in a normative sample shifts mean scores lower, causing underidentification of children with disorders. On the other hand, if you want to judge the severity of a particular disorder, a normative sample of individuals with that disorder is necessary. For example, if a clinician wants to compare a child with an already diagnosed phonological disorder to other children with phonological disorders in order to judge severity, the normative sample should consist of children with phonological disorders. This is not the case when the goal of testing is to diagnose the presence of a disorder; the normative sample must consist of only typically developing children to achieve that result. Many commonly used standardized tests of both speech and language include children with disorders in their normative samples, so this is certainly a psychometric characteristic that SLPs should be investigating first and foremost.

    In addition to the level of phonological ability of the children in the normative sample, there is a requisite number of children that must be included in that sample for comparison. The minimum number of children in each subgroup necessary for comparison is 100 (McCauley & Swisher, 1984). Including fewer than 100 children in a subgroup allows too much variability in the sample, creating a less-than-reliable comparison between the child being tested and the children included in the sample. Therefore, in terms of the normative sample, the speech sound abilities of the children and the number of children in the normative sample are two main psychometric characteristics that should be clearly stated in the test manual.


    Reliability

    The second main area of focus in test psychometrics is reliability. According to McCauley and Swisher (1984), the reliability of a test is derived from how consistently a test measures a behavior. For example, if a child is evaluated on the same test item on more than one occasion (e.g., different times in the same day), the child should receive the same score on that item every time. Test–retest reliability, specifically, is the ability of a test to yield the same score when given more than once over short periods. Interexaminer reliability is the ability of more than one examiner to administer the same test to the same child and get the same result. Both test–retest reliability and interexaminer reliability should report a statistically significant correlation coefficient of .90 or higher to be psychometrically acceptable (McCauley & Strand, 2008). These psychometric attributes instill confidence that a test is consistent, over time, in its ability to measure phonological skills in child speech. Again, this information should be easily accessible in the test manual. Clinicians should be cautious in selecting a test that does not report reliability criteria.
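    As a quick illustration of the .90 criterion, test–retest reliability can be checked by correlating scores from two administrations of the same test. The scores below are hypothetical; the calculation is a plain Pearson correlation, shown step by step.

```python
# Hypothetical test-retest standard scores for 8 children on the same test.
test1 = [78, 85, 92, 64, 88, 73, 95, 81]
test2 = [80, 84, 90, 66, 89, 71, 96, 83]

n = len(test1)
mean1, mean2 = sum(test1) / n, sum(test2) / n

# Pearson correlation between the two administrations
cov = sum((a - mean1) * (b - mean2) for a, b in zip(test1, test2))
var1 = sum((a - mean1) ** 2 for a in test1)
var2 = sum((b - mean2) ** 2 for b in test2)
r = cov / (var1 * var2) ** 0.5

print(f"test-retest r = {r:.2f}")  # psychometrically acceptable if r >= .90
```

    A test manual should report this coefficient (and the comparable interexaminer coefficient) directly; the sketch simply shows what the reported number represents.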


    Validity

    Our third area of psychometric focus is validity. Closely related to reliability, test validity refers to the ability of a test to measure what it says it measures. For example, if an SLP chooses the GFTA to assess articulation abilities in a preschooler, the SLP's question should be “Does the GFTA actually capture the articulation abilities of a child?” McCauley and Swisher (1984) break validity down into many specific categories, but here, we will focus on only those categories that are easy to measure. Psychometric properties that are easy to measure will be less prone to error or subjectivity.

    If a test has strong content validity, that test provides the SLP with accurate information on the ability being tested (p. 35). Criteria supporting the content validity of a test should include (a) a justification of the methods used to choose content, (b) expert evaluation of the test's content, and (c) an item analysis (McCauley & Strand, 2008). Content validity criteria are determined by the structure (e.g., number of members allowed in an onset cluster) and function (e.g., a sound's morphological status) of a language's phonology (Eisenberg & Hitchcock, 2010). These criteria are applied to ensure that enough content, or opportunities, for each aspect of speech sound production is being tested (Eisenberg & Hitchcock, 2010; Kirk & Vigeland, 2015). For example, all vowels must be targeted enough times to obtain an accurate representation of a child's abilities. Likewise, a test must include specific words that trigger all of the phonological error patterns common in a language. For a sound to be included in a child's phonetic inventory, at least two productions of that sound are required; therefore, a standardized test of articulation must provide at least two opportunities for production (Dinnsen, Chin, Elbert, & Powell, 1990). Weston, Shriberg, and Miller (1989) suggested that 90 different words or 250 total words are needed for sufficient evaluation of a child's phonological abilities (although we know that most single-word tests of articulation contain far fewer than the suggested amount). In more recent work, item response theory has been applied to single-word tests of articulation to distinguish, more effectively and efficiently, the performance of children with and without SSDs (Brackenbury, Zickar, Munson, & Storkel, 2017). 
More specifically, reducing single-word samples to only those words that children with SSDs consistently produce in error (whereas children with typical skills consistently produce them correctly) allows SLPs to “zero in” on performance in the disordered range, reducing the number of words needed to obtain an adequate sample. This approach is new, however, and most available tests do not implement this analysis in test construction. In terms of phonological patterns, Kirk and Vigeland (2015) suggested that a test should provide at least four opportunities for the 11 English phonological error pattern types and that there should be ample opportunity for errors across word position and in clusters.

    On the other hand, face validity is a surface measure. Does the test appear to test what it says it does? Are the items appropriate, at face value, to test articulation and phonological skills? Face validity is mentioned here because it should not be confused with content validity, which is a much deeper measure. Face validity indicates whether the test looks like it tests what it says it tests; content validity measures whether or not it actually accomplishes that aim. The content validity of a test is essential knowledge for an SLP choosing an appropriate standardized test, so SLPs should be cautious about accepting reports of only face validity in the absence of content validity.

    Standardized tests of articulation and phonology that are designed to differentiate typical from disordered speech must clearly report data on these main psychometric properties. If the data are (a) not reported or accessible in the test manual or (b) do not meet the minimum criterion requirements for these parameters, the test should not be used as an assessment tool. Omitting standardized tests from an assessment protocol is difficult for many SLPs. For one, standardized tests are widely used, efficient, and familiar. What is important to recognize, however, is that they might not represent the true abilities of the children on our caseloads. This problem brings us to the topic of alternative methods of assessment: How do we fill the gap?

    Criterion-Referenced Tests of Articulation and Phonology

    As covered thus far, evaluation of SSDs in children very often includes the use of standardized tests (Skahan et al., 2007); however, criterion-referenced measures are a less utilized, but highly informative, way to evaluate child speech. What do we mean by norm-referenced versus criterion-referenced testing? Norm-referenced tests compare the child you are testing to a sample of children who were given the same test. Criterion-referenced testing uses published criteria from studies on child phonological development for comparison. The ideal normative sample in norm-referenced tests, when the aim is to make an accurate diagnosis of disorder, is made up of typical children. In criterion-referenced studies, children with both typical and disordered speech sound abilities are included in the study sample to develop cutoff scores, or boundaries between what is considered typical and disordered, based on a child's age. Knowledge of cutoff scores allows an SLP to compare an individual child's performance to the cutoff score, or criterion, derived from a study (see Table 1).

    Table 1. Published criterion references for phonological abilities in children.

    Measure: Percent Occurrence of Phonological Error Patterns (Processes)
    • Monolingual English: Typically developing children, ages 2;10–5;2: most common phonological patterns include gliding, weak syllable deletion, glottal replacement, cluster reduction, labial assimilation, vocalization, final consonant deletion, stopping, and fronting; use of phonological patterns is reduced by 50% between 3;0 and 4;0 (Haelsig & Madison, 1986). Children with disorders, ages 3;7–13;0: number of processes used ranged from 3 to 10; final consonant deletion, gliding, unstressed syllable deletion, stopping of fricatives, and vowelization were the most common patterns (McReynolds & Elbert, 1981). Typically developing monolingual English speakers living in a Spanish–English environment can, at age 4;0, demonstrate final consonant devoicing as high as 20%–30% (Fabiano-Smith & Hoffman, 2018).
    • Bilingual English: At age 3;0, manner classes with low accuracy include stops and fricatives (Fabiano-Smith & Goldstein, 2010b). Typically developing children, at age 4;0, exhibit gliding, stopping, final consonant devoicing, and cluster reduction at rates of 10% or lower, with the exception of gliding (53%); 4-year-olds with disorders exhibit cluster reduction, gliding, final consonant devoicing, and stopping, all at rates higher than 10% (Fabiano-Smith & Hoffman, 2018). Consonant devoicing occurred at approximately 30% in both groups but is linked to dialect rather than disorder.
    • Bilingual Spanish: At age 3;0, manner classes with low accuracy include flap and trill (Fabiano-Smith & Goldstein, 2010b). Typically developing children, ages 3;0–4;0, Puerto Rican dialect: phonological patterns with a frequency of occurrence greater than 10% are of diagnostic significance; at age 3;0, cluster reduction occurs at 15% and is the most common pattern; at age 4;0, no patterns occurred at a rate greater than 10%. Other commonly occurring patterns included final consonant deletion, liquid simplification, weak syllable deletion, assimilation, fronting, and stopping (Goldstein & Iglesias, 1996).

    Measure: Percent Consonants Correct (PCC; Shriberg & Kwiatkowski, 1982)
    • Monolingual English: Age 4;0: > 90% = mild, 65%–85% = mild–moderate, 50%–65% = moderate–severe, and < 50% = severe (Shriberg & Kwiatkowski, 1994)
    • Bilingual English: Age 3;0: PCC = 72% in bilingual children (Fabiano-Smith & Goldstein, 2010b); Age 5;0: PCC-Rᵃ = 94% (Goldstein et al., 2005), PCC = 88% (Fabiano-Smith & Hoffman, 2018)
    • Bilingual Spanish: Age 3;0: PCC = 66% in typically developing children (Fabiano-Smith & Goldstein, 2010b); Age 5;0: PCC-R = 91% (Goldstein et al., 2005)

    Measure: Phonetic Inventory Complexity
    • Monolingual English: Phonetic inventories complete by age 3;0 in typically developing children; in children with SSD, ages 3;0–6;0, the following characteristics were observed across children: incomplete phonetic inventories, variation in the number and types of sounds, many sounds but no phonological distinctions, or many phonological distinctions and few sounds (Dinnsen et al., 1990)
    • Bilingual English: Phonetic inventories complete by age 3;0 in typically developing bilingual children (Fabiano-Smith & Barlow, 2010)
    • Bilingual Spanish: Phonetic inventories complete, or nearly complete, by age 3;0 in typically developing bilingual children (Fabiano-Smith & Barlow, 2010)

    Measure: Percent Intelligibility
    • Monolingual English: Range of percent intelligibility for preschoolers: 68%–100%; mean percent intelligibility: 85%. Cutoff: for a child 4;0 or older, intelligibility in connected speech that falls below 66% may be a potential indicator of speech sound disorder (Gordon-Brannan & Hodson, 2000)
    • Bilingual English: Unavailable
    • Bilingual Spanish: Unavailable

    Measure: Scaffolding Scale of Stimulability (SSS; Glaspey & Stoel-Gammon, 2005)
    • Monolingual English: Using stimulability testing as dynamic assessment, higher scores on the SSS indicate higher likelihood of SSD
    • Bilingual English: Unavailable
    • Bilingual Spanish: Unavailable

    ᵃPCC-R refers to Percent Consonants Correct–Revised (PCC-R), which is calculated differently from the original PCC: in PCC-R, distortions are counted as correct, but omissions and substitutions are counted as errors.

    As in the evaluation of standardized tests, the validity of these cutoff scores is not to be overlooked; there should be published evidence that one or more studies validated scores on a given criterion-referenced measure and that the measure is supported by a strong underlying theoretical or developmental construct (McCauley & Strand, 2008). For example, Shriberg and Kwiatkowski (1982) provide extensive detail on how severity cutoff scores were derived for the consonant accuracy measure, Percent Consonants Correct (PCC). Correlations were derived among the listening judgments of unskilled listeners and SLPs. Those judgments were then correlated with children's performance on PCC. These analyses resulted in cutoff scores that indicated mild, moderate, and severe categories for children with SSDs. Supporting published data such as these instill confidence that a criterion-referenced measure is truly measuring what it is intended to measure.

    McCauley (1996) reported the three most commonly used criterion-referenced measures in the evaluation of SSD: percent occurrence of phonological patterns (Preston, Hull, & Edwards, 2013), PCC (Percent Consonants Correct–Revised [PCC-R]; Shriberg & Kwiatkowski, 1994), and percent intelligibility (Gordon-Brannan & Hodson, 2000). Published criteria for these measures can be found in Table 1. To illustrate how criterion-referenced measures are used, take, for example, PCC-R (Shriberg & Kwiatkowski, 1994). PCC-R is a measure of overall consonant accuracy in children. Shriberg and Kwiatkowski (1994) measured PCC-R in hundreds of preschoolers and provided cutoff scores that separate typically developing children from children with SSDs. One of their findings was that 5-year-old monolingual English-speaking children should exhibit greater than 90% accuracy on consonant production. Presence of disorder and severity of disorder can be derived from how far below that criterion a child is performing. A clinician can ask whether the 5-year-old being evaluated reached the 90% criterion or is exhibiting only 70% or 50% accuracy. More recently, Fabiano-Smith and Hoffman (2018) performed a similar study examining PCC-R in the English productions of bilingual Spanish- and English-speaking 5-year-olds. From a group of 44 children, with and without SSDs, it was observed that the cutoff scores that differentiated typical from disordered speech production did not differ significantly between monolinguals and bilinguals. This is clinically important because, for bilingual children, standardized, norm-referenced tests are few. Therefore, using a criterion-referenced approach, rather than a norm-referenced approach, is essential in making an accurate diagnosis. It is important to point out, however, that deriving criterion-referenced data from standardized tests of articulation and phonology can lead to clinician error.
Eisenberg and Hitchcock (2010) examined standardized tests that are often used to derive phonetic inventories for child speech. They found that standardized tests did not provide enough opportunities for children to produce all the sounds of their language, with enough frequency, to develop a representative phonetic inventory. They suggested the use of a single-word test plus additional words that supplement the missing sounds. Fabiano-Smith and Crouse-Matlock (2012) found that connected speech samples, specifically play samples in preschoolers, provided ample opportunities for phonetic inventory development in both monolingual English-speaking and bilingual Spanish- and English-speaking children.

    It is essential that speech sound evaluations take into consideration a number of measures, at different levels of the sound system, to arrive at an accurate diagnosis (number of substitution errors, phonetic inventory complexity, overall consonant accuracy, etc.); however, because survey data indicated that over half of SLPs, both in the United States and in other English-speaking countries, are choosing standardized tests as part of their evaluation protocol, the utility of criterion-referenced measures is emphasized and encouraged. The relative strengths or weaknesses of a standardized test will depend on how it was developed, and each test should be thoroughly assessed for its psychometric merit before being selected for use in assessment. Unfortunately, because the majority of currently available standardized tests of articulation and phonology do not meet basic psychometric requirements (Kirk & Vigeland, 2014), it is essential that they not act as the cornerstone of speech sound assessment.

    Evidence-Based Evaluation of SSDs

    What is included in an evidence-based protocol for the assessment of SSDs? What are some alternative approaches to the use of standardized tests for the assessment of SSD? Here, we provide a step-by-step approach to the assessment of speech production for either monolingual or bilingual children (adapted from work by Brian A. Goldstein and Leah Fabiano-Smith). The criterion references that can be used to interpret child data collected in this manner can be found in Table 1.

    Step 1: Perform a Detailed Case History

    According to a systematic review of the literature on pediatric screening for speech and language disorders in children 5 years old or younger (Wallace et al., 2015), male sex, family history of speech and language disorders, and low SES were the most significant predictors of disorder. Importantly, African American children and children whose primary home language is not English are less likely to receive speech-language therapy services than their White peers, even when minority group status and SES are correlated (Morgan et al., 2016). What these findings tell us is that we need to (a) ask whether there is a family history of speech and language impairment in the child's biological family and (b) use culturally sensitive assessment practices to be sure we are identifying all children who present with speech-language impairment. In addition to the standard background history that we gather from caretakers, we also want to ask what languages are spoken in the home, by whom, and how often (language input). We also want to ask what language(s) the child uses in the home and at school, how often, and with whom (language output). This information gives the SLP an idea of the level of language proficiency, in each language, a child exhibits. If a child uses English only 10% of the time, we cannot expect him to perform at a typical level on a standardized test in English, even if he has typical phonological skills. Ask parents how many hours a day the child is awake, ask what a typical day is like for that child (home and school, if applicable), and ask about weekends separately (weekends can often present a very different linguistic environment for children). Ask these questions each time you assess a child, as linguistic environments can change over time, impacting language proficiency levels.

    Step 2: Routine Assessments

    As with any evaluation, we want to include an oral peripheral examination and a hearing evaluation. Oral peripheral examinations test cranial nerve function and screen for motor speech impairments. Any child who is referred for a speech-language evaluation should receive a full audiologic evaluation, not simply a hearing screening.

    Step 3: Obtain Speech Samples

    A single-word sample should be collected in all languages of the child. We can use a standardized test in a nonstandardized way, but we need to acknowledge that it might not provide the number or type of opportunities needed to truly represent a child's skill set. We do not score the test—we simply use the word list. We supplement this list with a connected speech sample, in all languages, to (a) judge intelligibility subjectively or calculate a quantitative intelligibility measure and (b) supplement the word list for the development of a phonetic inventory. Phonetic transcription is required of the single-word speech sample only.

    Step 4: Phonetic Inventory (Independent Analysis)

    Determine the phonetic inventory of the child, in all languages, using the child's speech samples. At least two productions of each sound, whether produced in the correct context or not, are needed for a phone to be included in the inventory (Dinnsen et al., 1990). Organize the inventory by place and manner of articulation for ease of evaluation. Be sure to supplement a single-word list with a connected speech sample so that the child has enough opportunities to produce each sound at least two times.
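    The two-production criterion can be sketched in a few lines of Python. The phones below are a hypothetical sample pooled from a single-word list and a connected speech sample; the criterion itself is from Dinnsen et al. (1990), as described above.

```python
from collections import Counter

# Hypothetical phones produced by a child, pooled from the single-word
# list and the connected speech sample (correct in context or not).
produced = ["p", "b", "t", "d", "k", "t", "p", "m", "m", "n",
            "f", "w", "s", "b", "d", "k", "w", "s", "n", "g"]

counts = Counter(produced)

# A phone enters the inventory only with at least two productions.
inventory = sorted(p for p, n in counts.items() if n >= 2)
print(inventory)  # "f" and "g" occurred only once each, so they are excluded
```

    Once the inventory is derived, it can be organized by place and manner of articulation for comparison against the criterion references in Table 1.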

    Step 5: Consonant Accuracy (Relational Analysis)

    Again, here, we can use the single-word test in a nonstandardized way: Count the number of opportunities for consonant production in the sample (you will only need to do this once if you use the same test repeatedly). This can also be done for vowels in younger children and children who exhibit vowel errors. Then, count the total number of consonants the child produced correctly. Derive a percent accuracy with total number of opportunities for consonant production as a denominator and total consonants produced correctly as the numerator. Calculating overall consonant accuracy from a single-word sample is not the same approach used in the development of PCC (Shriberg, Austin, Lewis, McSweeny, & Wilson, 1997), but for clinical ease and efficiency, this method has been used in research-based studies (e.g., Fabiano-Smith & Goldstein, 2010a, 2010b). Using the connected speech sample to supplement this analysis is also a useful approach.
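    As a sketch of this calculation, assume a hypothetical scored word list in which consonant opportunities and correct productions have already been tallied per word (the words and tallies here are illustrative only):

```python
# Hypothetical scored single-word sample: each entry is
# (target word, consonant opportunities, consonants produced correctly).
sample = [
    ("house",  2, 2),
    ("spoon",  3, 2),
    ("rabbit", 3, 2),
    ("frog",   3, 1),
]

# Denominator: total opportunities; numerator: total correct consonants.
opportunities = sum(opps for _, opps, _ in sample)
correct = sum(hits for _, _, hits in sample)

percent_accuracy = 100 * correct / opportunities
print(f"consonant accuracy = {percent_accuracy:.1f}%")  # 7 of 11 correct
```

    Because the opportunity count depends only on the word list, it needs to be tallied once per test and can be reused across children.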

    Step 6: Error Analysis (Substitutions, Omissions, Distortions)

    Using your phonetic transcriptions, examine the targets that a child is avoiding and the sounds she is selecting to use as substitutes for those targets. For children who speak more than one language, examine your samples for instances of cross-linguistic effects or sounds specific to one language being used as substitutes in the other language (e.g., the Spanish trill /r/ being used as a substitute in English). These productions should not be counted as errors.

    Step 7: Phonological Error Pattern Analysis

    The type and frequency of phonological error patterns should be calculated from your single-word sample, separately for each language. Shriberg and Kwiatkowski (1994) found that the average frequency of occurrence of phonological error patterns in monolingual English-speaking children with SSDs differs significantly from children with typically developing skills. Diagnostic criteria for phonological error patterns in monolingual English speakers are well established (McReynolds & Elbert, 1981). Type and frequency parameters for bilingual Spanish–English-speaking children are still in development (Fabiano-Smith, Privette, & An, in preparation), but preliminary parameters derived from smaller data sets are readily available (e.g., Fabiano-Smith & Goldstein, 2010b; Fabiano-Smith & Hoffman, 2018; Goldstein, Fabiano, & Washington, 2005; Goldstein & Washington, 2001).
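    The frequency side of this analysis is a percent-occurrence calculation, computed per pattern and per language. A minimal Python sketch; the pattern names and counts are invented for illustration:

```python
# Hypothetical sketch: percent occurrence of a phonological error
# pattern, computed separately for each language as described above.
def pattern_frequency(occurrences, opportunities):
    """Percent of opportunities on which the pattern occurred."""
    if opportunities == 0:
        return 0.0
    return 100.0 * occurrences / opportunities

# e.g., final consonant deletion observed on 6 of 40 word-final
# consonants in English and 2 of 25 in Spanish
by_language = {
    "English": pattern_frequency(6, 40),   # 15.0%
    "Spanish": pattern_frequency(2, 25),   # 8.0%
}
```

    The resulting percentages would then be compared against published criteria (e.g., McReynolds & Elbert, 1981, for monolingual English).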

    Step 8: Measures of Whole-Word Proximity (Phonological Mean Length of Utterance and Proportion of Whole-Word Proximity)

    Measures of whole-word proximity are more nuanced analyses of word-level errors (Ingram, 2002). By examining the relationship between a child's production and the adult target form, a word that would otherwise be scored as incorrect based on a single error receives credit for how closely its error(s) approximate the adult target form. Phonological mean length of utterance (Pmlu) indicates how complex a child's production is in comparison to an adult target, and proportion of whole-word proximity indicates how closely a child's productions approximate the adult target. These measures can be used to identify a child's stage of acquisition, to assess proximity to target words, and to evaluate the complexity of words beyond the level of the segment (Ingram, 2002). To calculate Pmlu, 1 point is given for each correct vowel and consonant and an extra point is given for every correct consonant. Phonological whole-word proximity is calculated by dividing the Pmlu of the child by the Pmlu of the target word. Babatsouli, Ingram, and Sotiropoulos (2014) found that the combination of these two measures might help to discriminate typical from disordered speech skills in the English and the Spanish of bilingual children. More specifically, examining only one of the two measures did not discriminate between typical children and children with disorders, but when these two measures were placed in a regression model for comparison, evidence of disorder was observed (see also Bunta, Fabiano-Smith, Goldstein, & Ingram, 2009). These analyses must be performed by hand, but the time invested might provide more in-depth insight into a child's phonological abilities.
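    The Pmlu and proximity arithmetic can be worked through with a small example. This is a hypothetical Python sketch that follows the scoring described above (one point per correct segment, an extra point per correct consonant); the segment-correctness judgments themselves are assumed as input, and the example word is invented:

```python
# Hypothetical sketch of Pmlu and proportion of whole-word proximity,
# per the scoring described in the text: 1 point per correct vowel or
# consonant, plus 1 extra point per correct consonant.
def pmlu(correct_vowels, correct_consonants):
    return correct_vowels + 2 * correct_consonants

def whole_word_proximity(child_pmlu, target_pmlu):
    """Child Pmlu divided by the Pmlu of the adult target."""
    return child_pmlu / target_pmlu

# Target "cat" /kæt/: 1 vowel + 2 consonants -> Pmlu of 5.
# Child says [tæt]: 1 correct vowel, 1 correct consonant -> Pmlu of 3.
target = pmlu(correct_vowels=1, correct_consonants=2)
child = pmlu(correct_vowels=1, correct_consonants=1)
pwp = whole_word_proximity(child, target)  # 0.6
```

    In practice, Pmlu and proximity would be averaged across all words in the sample before interpretation.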

    Step 9: Stimulability

    Stimulability testing can serve as a diagnostic indicator at the phonemic, manner class, and syllabic levels. Using the Scaffolding Scale of Stimulability (SSS; Glaspey & Stoel-Gammon, 2005), a dynamic assessment can be used to indicate whether or not a child presents with an SSD. Instead of the traditional "stimulable/not stimulable" dichotomy, the SSS provides hierarchical scaffolding support for children as they demonstrate their knowledge, or partial knowledge, of the target sound or structure. This 21-point scale assesses phonemes, clusters, syllables, and word shapes across seven different phonetic environments, providing scaffolding options at each step. Children who accumulate more points on the SSS are more likely to need support in production (i.e., more likely to present with a disorder) versus those children who earn fewer points, indicating less support required for accurate production (i.e., less likely to present with a disorder).

    Step 10: Intelligibility

    Depending on the severity of a child's SSD, deriving a subjective estimate of intelligibility might not be sufficient for characterizing the nature of their disorder. There are multiple approaches to quantifying intelligibility of speech, and depending on the needs of a particular child during assessment, implementation of more than one approach might be useful. For an overview of multiple approaches, see Kent, Miolo, and Bloedel (1994).

    Incorporating criterion-referenced measures into an evaluation protocol will provide the SLP with depth, as well as breadth, of the phonological skills a child possesses. Whereas a standardized test provides one static score (that might not be representative), criterion-referenced approaches provide detailed information, in some cases over time (e.g., the SSS), that allow the SLP to observe a child's true skill set and inform evidence-based treatment.

    Case Studies

    We can apply this assessment protocol to the three case studies woven throughout this issue. To start, both monolingual and bilingual children by the age of 3;0 (years;months), both with typical development and with SSDs, have nearly complete phonetic inventories (Dinnsen et al., 1990; Fabiano-Smith & Barlow, 2010). If, at this age or older, a child presents with a sparse phonetic inventory, then it would be a cause for clinical concern. In the three case studies described, all children have nearly complete phonetic inventories. The first child is lacking late-acquired sounds phonemically but not necessarily phonetically. He might be able to produce the late sounds as substitutes for other sounds but cannot do so accurately in the target context. The child in Case Study 2 may or may not have the approximant /ɹ/ in his inventory, but regardless, it is the only sound lacking. This places him at nearly the most complex level (see Fabiano-Smith & Barlow, 2010, for typological levels of complexity in both English and Spanish). The third child exhibits the error of lateralizing fricatives, so he might be lacking a stridency distinction between fricatives, but the rest of his skill set places him at the next to highest level of phonetic inventory complexity.

    The measure on which we might see more of a distinction among children is the relational measure, PCC. Low accuracy on late-acquired sounds (as a category) can pull down overall consonant accuracy (e.g., Fabiano-Smith & Goldstein, 2010a, 2010b); therefore, PCC will likely reveal the impairment for the child in Case Study 1 and not for the other two children who have a few sound-specific errors.

    A substitution error analysis will yield qualitative information for the first child, as we can observe the substitutes he is using in place of the late-acquired targets he is avoiding. For Child 2, this analysis will not yield much more information, as only one sound is in error. For Child 3, however, a substitution error analysis will aid in determining whether a specific set of fricatives is subject to lateralization or all fricatives are at risk.

    When we move to phonological error pattern analysis, impairment should be revealed for all three children, as the first child will likely exhibit a high level of gliding and stopping of fricatives, the child in Case Study 2 will also exhibit a high level of gliding, and the child in Case Study 3 will exhibit a high level of lateralization errors on fricatives. Measures of whole-word proximity will likely be more useful for Child 1 than for Children 2 and 3, as there are more sounds in error; therefore, obtaining information on how close to the target sound a child is may tell us if he or she is moving toward more accurate production (i.e., errors sharing some features with their targets). For Children 2 and 3, the targets that trigger their errors are already clear.

    Stimulability would be a helpful measure for all three children in terms of making predictions about time frame for future mastery of the sound(s) in question. Finally, intelligibility ratings will likely be unnecessary for Child 2 because only one predictable sound is in error. For Children 1 and 3, however, a measure of intelligibility will aid in addressing how their particular errors are affecting their overall communication abilities.


    Including a number of different measures, at different levels of the sound system, is necessary for accurate diagnosis of SSD. Psychometrically sound standardized tests can act as a piece of an assessment battery but should never act as the sole indicator of typical versus disordered development. Furthermore, a standardized test that is not psychometrically sound should never be scored and submitted as a diagnostic indicator. Criterion-referenced procedures and the utilization of standardized tests in nonstandardized ways will provide information that is reflective of a child's speech skills, supporting the SLP in accurate diagnosis of SSD.


    This work was funded by National Institute on Child Health and Human Development Grant 1-R21-HD081382-01A1 (2015–2018) and National Institute on Deafness and Other Communication Disorders Grant 1-R01-DC016624-01A1 (2018–2023) received by the author. Open access fees were paid for by an award to Holly Storkel from the Friends of the Life Span Institute at the University of Kansas.


    • American Educational Research Association. (2014). Standards for educational and psychological testing (2014 ed.). Washington, DC: Author.
    • Babatsouli, E., Ingram, D., & Sotiropoulos, D. (2014). Phonological word proximity in child speech development. Chaotic Modeling and Simulation, 4(3), 295–313.
    • Brackenbury, T., Zickar, M. J., Munson, B., & Storkel, H. L. (2017). Applying item response theory to the development of a screening adaptation of the Goldman-Fristoe Test of Articulation–Second Edition. Journal of Speech, Language, and Hearing Research, 60(9), 2672–2679.
    • Bunta, F., Fabiano-Smith, L., Goldstein, B., & Ingram, D. (2009). Phonological whole-word measures in 3-year-old bilingual children and their age-matched monolingual peers. Clinical Linguistics & Phonetics, 23(2), 156–175.
    • Dinnsen, D. A., Chin, S. B., Elbert, M., & Powell, T. W. (1990). Some constraints on functionally disordered phonologies: Phonetic inventories and phonotactics. Journal of Speech and Hearing Research, 33(1), 28–37.
    • Eisenberg, S. L., & Hitchcock, E. R. (2010). Using standardized tests to inventory consonant and vowel production: A comparison of 11 tests of articulation and phonology. Language, Speech, and Hearing Services in Schools, 41(4), 488–503.
    • Fabiano-Smith, L., & Barlow, J. A. (2010). Interaction in bilingual phonological acquisition: Evidence from phonetic inventories. International Journal of Bilingual Education and Bilingualism, 13(1), 81–97.
    • Fabiano-Smith, L., & Crouse-Matlock, S. (2012). Phonetic inventory elicitation in bilingual Spanish–English speaking children: What toys are best? Poster presented at the annual convention of the American Speech-Language-Hearing Association, Atlanta, GA.
    • Fabiano-Smith, L., & Goldstein, B. A. (2010a). Early-, middle-, and late-developing sounds in monolingual and bilingual children: An exploratory investigation. American Journal of Speech-Language Pathology, 19(1), 66–77.
    • Fabiano-Smith, L., & Goldstein, B. A. (2010b). Phonological acquisition in bilingual Spanish–English speaking children. Journal of Speech, Language, and Hearing Research, 53(1), 160–178.
    • Fabiano-Smith, L., & Hoffman, K. (2018). Diagnostic accuracy of traditional measures of phonological ability for bilingual preschoolers and kindergarteners. Language, Speech, and Hearing Services in Schools, 49(1), 121–134.
    • Fabiano-Smith, L., Privette, C., & An, L. (in preparation). Phonological abilities in bilingual Spanish–English speaking children: The language combination effect. Manuscript in preparation.
    • Flipsen, P., Jr., & Ogiela, D. A. (2015). Psychometric characteristics of single-word tests of children's speech sound production. Language, Speech, and Hearing Services in Schools, 46(2), 166–178.
    • Glaspey, A. M., & Stoel-Gammon, C. (2005). Dynamic assessment in phonological disorders: The Scaffolding Scale of Stimulability (SSS). Topics in Language Disorders, 25(3), 220–230.
    • Goldman, R., & Fristoe, M. (2000). Goldman-Fristoe Test of Articulation–Second Edition (GFTA-2). San Antonio, TX: Pearson Clinical.
    • Goldstein, B. A., Fabiano, L., & Washington, P. S. (2005). Phonological skills in predominantly English-speaking, predominantly Spanish-speaking, and Spanish–English bilingual children. Language, Speech, and Hearing Services in Schools, 36(3), 201–218.
    • Goldstein, B. A., & Iglesias, A. (1996). Phonological patterns in normally developing Spanish-speaking 3- and 4-year-olds of Puerto Rican descent. Language, Speech, and Hearing Services in Schools, 27(1), 82–90.
    • Goldstein, B. A., & Washington, P. S. (2001). An initial investigation of phonological patterns in typically developing 4-year-old Spanish–English bilingual children. Language, Speech, and Hearing Services in Schools, 32(3), 153–164.
    • Gordon-Brannan, M., & Hodson, B. W. (2000). Intelligibility/severity measurements of prekindergarten children's speech. American Journal of Speech-Language Pathology, 9(2), 141–150.
    • Haelsig, P. C., & Madison, C. L. (1986). A study of phonological processes exhibited by 3-, 4-, and 5-year-old children. Language, Speech, and Hearing Services in Schools, 17(2), 107–114.
    • Hutchinson, T. A. (1996). What to look for in the technical manual: Twenty questions for users. Language, Speech, and Hearing Services in Schools, 27(2), 109–121.
    • Ingram, D. (2002). The measurement of whole-word productions. Journal of Child Language, 29(4), 713–733.
    • Kent, R. D., Miolo, G., & Bloedel, S. (1994). The intelligibility of children's speech: A review of evaluation procedures. American Journal of Speech-Language Pathology, 3(2), 81–95.
    • Khan, L., & Lewis, N. (2002). Khan-Lewis Phonological Analysis–Second Edition. San Antonio, TX: Pearson Clinical.
    • Kirk, C., & Vigeland, L. (2014). A psychometric review of norm-referenced tests used to assess phonological error patterns. Language, Speech, and Hearing Services in Schools, 45(4), 365–377.
    • Kirk, C., & Vigeland, L. (2015). Content coverage of single-word tests used to assess common phonological error patterns. Language, Speech, and Hearing Services in Schools, 46(1), 14–29.
    • McCauley, R. J. (1996). Familiar strangers: Criterion-referenced measures in communication disorders. Language, Speech, and Hearing Services in Schools, 27(2), 122–131.
    • McCauley, R. J., & Strand, E. A. (2008). A review of standardized tests of nonverbal oral and speech motor performance in children. American Journal of Speech-Language Pathology, 17(1), 81–91.
    • McCauley, R. J., & Swisher, L. (1984). Psychometric review of language and articulation tests for preschool children. Journal of Speech and Hearing Disorders, 49(1), 34–42.
    • McReynolds, L. V., & Elbert, M. (1981). Criteria for phonological process analysis. Journal of Speech and Hearing Disorders, 46(2), 197–204.
    • Morgan, P. L., Hammer, C. S., Farkas, G., Hillemeier, M. M., Maczuga, S., Cook, M., & Morano, S. (2016). Who receives speech/language services by 5 years of age in the United States. American Journal of Speech-Language Pathology, 25(2), 183–199.
    • Pavelko, S. L. (2018). Evidence-based practices for selecting and using standardized tests: Considerations for speech testing. Retrieved from
    • Peña, E. D., Spaulding, T. J., & Plante, E. (2006). The composition of normative groups and diagnostic decision making: Shooting ourselves in the foot. American Journal of Speech-Language Pathology, 15(3), 247–254.
    • Plante, E., & Vance, R. (1994). Selection of preschool language tests: A data-based approach. Language, Speech, and Hearing Services in Schools, 25(1), 15–24.
    • Preston, J. L., Hull, M., & Edwards, M. L. (2013). Preschool speech error patterns predict articulation and phonological awareness outcomes in children with histories of speech sound disorders. American Journal of Speech-Language Pathology, 22(2), 173–184.
    • Shriberg, L. D., Austin, D., Lewis, B. A., McSweeny, J. L., & Wilson, D. L. (1997). The percentage of consonants correct (PCC) metric: Extensions and reliability data. Journal of Speech, Language, and Hearing Research, 40(4), 708–722.
    • Shriberg, L. D., & Kwiatkowski, J. (1982). Phonological disorders III: A procedure for assessing severity of involvement. Journal of Speech and Hearing Disorders, 47(3), 256–270.
    • Shriberg, L. D., & Kwiatkowski, J. (1994). Developmental phonological disorders I: A clinical profile. Journal of Speech and Hearing Research, 37(5), 1100–1126.
    • Skahan, S. M., Watson, M., & Lof, G. L. (2007). Speech-language pathologists' assessment practices for children with suspected speech sound disorders: Results of a national survey. American Journal of Speech-Language Pathology, 16(3), 246–259.
    • Wallace, I. F., Berkman, N. D., Watson, L. R., Coyne-Beasley, T., Wood, C. T., Cullen, K., & Lohr, K. N. (2015). Screening for speech and language delay in children 5 years old and younger: A systematic review. Pediatrics, 136(2), e448–e462.
    • Weston, A. D., Shriberg, L. D., & Miller, J. F. (1989). Analysis of language-speech samples with SALT and PEPPER. Journal of Speech and Hearing Research, 32(4), 755–766.
    • Williams, C. J., & McLeod, S. (2012). Speech-language pathologists' assessment and intervention practices with multilingual children. International Journal of Speech-Language Pathology, 14(3), 292–305.

    Author Notes


    Financial: Leah Fabiano-Smith has no relevant financial interests to disclose.

    Nonfinancial: Leah Fabiano-Smith has no relevant nonfinancial interests to disclose.

    Correspondence to Leah Fabiano-Smith:

    Editor: Brenda Beverly

    Publisher Note: This article is part of the Forum: Speech Sound Disorders in Schools.

    Additional Resources