Open Access | American Journal of Speech-Language Pathology | Research Article | 3 Jan 2024

Vocabulary Instruction Embedded in Narrative Intervention: A Repeated Acquisition Design Study With First Graders at Risk of Language-Based Reading Difficulty



    Purpose: The purpose of the current study was to investigate the effect of vocabulary instruction embedded in narrative intervention on the immediate and retained definitional knowledge of taught words for first graders at risk for language-related reading difficulties.


    Method: We employed a repeated acquisition design with innovative quality features and supplemental statistics with 11 treatment students and three control students. In the context of the school's multitiered system of supports, treatment students received 30-min small-group intervention sessions 4 days a week for 12 weeks. Intervention involved story retelling and personal story generation lessons, both of which emphasized the learning and practicing of target vocabulary words in each story. Pre- and postprobes of the taught definitions were conducted every week.


    Results: According to visual analysis conventions of single-case research, there was a consistent pattern of improvement from pre- to postprobes for all treatment participants, but for none of the control participants. Retention was also consistently observed when measured at Week 13. Supplemental statistics confirmed that large effects were associated with the intervention.


    Conclusions: Vocabulary instruction embedded in narrative intervention led to meaningful acquisition and retention of taught vocabulary for students at risk of language-based reading difficulty.

    Speech-language pathologists (SLPs) who work in schools encounter demanding work conditions (Farquharson et al., 2022). SLPs are called upon regularly to help address the needs of students at risk for reading difficulty due to limited language abilities, in addition to students with disabilities. Prevention activities have long been a part of SLPs' roles and responsibilities alongside intervention and assessment (American Speech-Language-Hearing Association, 2010). However, current educational systems do not always support SLPs working beyond the scope of their own specific caseload of students with disabilities (Farquharson et al., 2022), making it difficult for them to support students who do not have a disability but who are at risk for language-related reading difficulties. This “do more with less” mandate on SLPs can lead to frustration, reduced job satisfaction, and burnout (Brito-Marcelino et al., 2020; Edgar & Rosa-Lugo, 2007; Ewen et al., 2021; Khan et al., 2022).

    The gravity of language-related reading problems can be seen in data from the National Assessment of Educational Progress. Approximately 66% of all fourth-grade students do not comprehend what they read at a proficient level (U.S. Department of Education et al., 2022, Reading Assessment). Yet only 12% of fourth-grade students had word reading accuracy below 92% (U.S. Department of Education et al., 2018, Oral Reading Fluency study). SLPs who are aware that reading comprehension is the product of word identification and language comprehension (Hoover & Gough, 1990) will recognize that the gap between the relatively low prevalence of word recognition problems (< 20%) and the very high prevalence of reading comprehension difficulty (> 65%) is largely caused by limited oral academic language repertoires (Catts et al., 2006; Cervetti et al., 2020). Research confirms that the English reading comprehension performance of economically disadvantaged students and those learning English is primarily the result of inadequate attainment of the school language, and not the result of word reading problems (Kieffer & Vukovic, 2012; Nakamoto et al., 2007).

    One way to address the vast and diverse oral language needs of elementary students is to supply teachers with proper curriculum to promote students' oral language in general education classrooms or through targeted interventions (Cain et al., 2004; Eadie et al., 2022; Grosche & Volpe, 2013; Lo & Xu, 2022; Perfetti, 2007). Academic language is a specialized form of language needed to acquire and express knowledge and comprises word-, sentence-, and discourse-level patterns (Cervetti et al., 2020; Phillips Galloway et al., 2020). In schools, multitiered system of supports (MTSS) models could be used as a framework for promoting oral academic language (Petersen et al., 2022). In such a model, oral language skills can be routinely monitored and addressed in supplemental (e.g., Tier 2) and intensive interventions (e.g., Tier 3; Wackerle-Hollman et al., 2021). Whether as leaders, consultants, or interventionists, it is highly likely that SLPs will play a critical role in furthering the movement toward tiered oral language interventions (McKenna et al., 2021; Sylvan, 2020). Certainly, SLPs and educators must rely on evidence-based oral language programs that can simultaneously accommodate the diverse language needs of students with disabilities and students at risk for language-based reading difficulties.

    Narrative Intervention Research

    To effectively address the diverse needs in schools, there is a need for SLPs and teachers to combine their efforts to maximize their impact on students' academic outcomes. Narrative intervention is an approach that both sets of professionals use. Because multiple dimensions of academic language (e.g., discourse structures, sentence structures, and words; Phillips Galloway et al., 2020) can be targeted in an integrated manner, it is an efficient intervention approach that benefits a range of students, including students with disabilities (Hessling & Schuele, 2020), economically disadvantaged students (Adlof et al., 2014), students learning English (Spencer et al., 2020), and typically developing students (Petersen et al., 2020). Narrative intervention is a contextualized language intervention approach that employs oral storytelling (retelling or story generation) as the primary teaching procedure with the intentional promotion of specific language features relevant to academic performance (Petersen, 2011; Spencer & Petersen, 2020; Zamani et al., 2016). Interventions can use children's storybooks or specially constructed stories. Often, visual materials such as illustrations, icons, and graphic organizers are used to support teaching the targeted features. Within the context of storytelling, interventionists can target the story grammar, which makes up the discourse structure of narratives, or the language used to tell the story (i.e., words and sentences). Narrative intervention has been touted as a flexible and fun way of motivating children to practice the targeted language skills in a large group, a small group, or an individual arrangement (see Lenhart et al., 2020).

    Narrative intervention applied in the treatment of children with language disabilities has been the focus of multiple reviews, which reveal its use among SLPs has increased in recent years (Favot et al., 2021; Petersen, 2011; Zamani et al., 2016). Despite the recent explosion of narrative interventions applied to children with language disabilities, narrative-related interventions to enhance comprehension have been reported in the general education literature for quite some time (see Dimino et al., 1995). To capture the overall impact of narrative-based interventions for children with and without disabilities, Pico et al. (2021) conducted a meta-analysis of 40 intervention studies that targeted narrative production or comprehension as outcomes. Sixteen of these studies included participants who were at risk due to income-related risk factors or who were learning English. Given the moderate to large effects and the rigorous research methods employed in this set of studies, the researchers concluded that narrative interventions were beneficial for a wide range of preschool and school-aged children and could be delivered by SLPs and other educators in various arrangements (e.g., large and small groups).

    In the corpus of published narrative intervention studies, the most common outcomes addressed were related to the production and comprehension of discourse-level (i.e., story grammar) and sentence-level complexity (Pico et al., 2021). A smaller number of studies included a focus on teaching vocabulary in addition to teaching narrative structures and narrative language (i.e., sentence structures used to tell stories). However, teaching word meanings and word learning are a critical part of academic language promotion, especially if enhanced reading comprehension is the goal (Beck et al., 2013; Wright & Cervetti, 2017). For SLPs, intervention approaches that integrate several dimensions of academic language may offer the greatest efficiency and broadest impact.

    Vocabulary Instruction Embedded in Narrative Intervention

    Vocabulary knowledge is one of the most critical academic language skills with direct links to reading comprehension (Dickinson et al., 2010). Both general academic words and domain-specific words should be taught in schools for the greatest impact on academic outcomes (Hirsch, 2003; Scarborough, 2001). General academic words align with Beck et al.'s (2013) concept of Tier 2 words, which are applicable to many topics in school. Although they are uncommon in students' language, general academic words are extremely useful in a variety of academic contexts. On the other hand, domain-specific words are considered Tier 3 words, which are only useful in a few, narrow academic contexts such as science or social studies lessons. There are well-established recommendations for teaching both types of new words (e.g., Baumann & Kame'enui, 2004). Vocabulary researchers recommend teaching specific word meanings and word-solving strategies (Wright & Cervetti, 2017) through repeated exposure and practice across multiple contexts (Beck et al., 2013). As a broad, multifaceted construct, vocabulary knowledge can be measured in terms of the form (i.e., phonological or grammatical aspects), meaning (i.e., semantic aspects), or use (i.e., putting words in action) of words. In addition, an individual's breadth (i.e., number of known words) and depth (i.e., how well words are known) of word knowledge can be measured (Hadley & Dickinson, 2020).

    The vast body of vocabulary instruction research has been reviewed often (e.g., Baker et al., 2014; Elleman et al., 2009; Wright & Cervetti, 2017). When it is divorced from the promotion of other academic language, vocabulary instruction is unlikely to impact reading comprehension (Cervetti et al., 2020; Phillips Galloway et al., 2020; Wright & Cervetti, 2017). Hence, it behooves the field to develop and investigate the effect of vocabulary interventions paired with interventions to promote other oral academic language skills. Contextualizing language interventions (Ukrainetz, 2006)—in this case with a focus on vocabulary—in meaningful academic activities such as oral storytelling is a recommended practice for SLPs, and it is necessary for teachers. The integration of multiple dimensions of oral academic language targets can also make the intervention more efficient by allowing teachers and SLPs to address multiple targets at once. Although there has never been a review of research specifically investigating vocabulary instruction embedded in a narrative intervention, a handful of researchers have examined this combination of oral academic language interventions.

    For example, Gillam et al. (2014) embedded vocabulary instruction in a whole-class narrative intervention in a quasi-experimental study. Participants included 43 students (ages 6;6–7;4 [years;months]) in two elementary school classrooms; only one class received the treatment. During the intervention, an SLP used experimenter-created wordless picture books designed to highlight story grammar, sentence complexity, and vocabulary. While discussing the wordless storybooks, the SLP introduced and defined the target words, which were related to story grammar (e.g., setting, initiating event), book concepts (e.g., author, illustrator), and general academic vocabulary or Tier 2 words (e.g., frantically, discover, sneaky). Before and after 6 weeks of 30-min whole-class instructional sessions delivered 3 times each week, a vocabulary probe was conducted. Results indicated that vocabulary instruction embedded in narrative intervention had a large effect (d = 1.02) on students' definitions, with a differential impact on students identified as high risk (d = 0.66) or low risk (d = 2.28).

    In a similar early-stage, feasibility study, Adlof et al. (2014) examined the effects of a small group narrative intervention with embedded vocabulary instruction on story grammar, sentence complexity, and receptive picture vocabulary with nine African American children ranging from 3 to 6 years of age. Participants were randomly assigned to a narrative treatment group or comparison group that focused on decoding. To deliver the narrative intervention, the research clinicians read commercially available children's books and, while reading, drew attention to the story grammar elements and encouraged predictions based on pictures. As general academic words came up in the storybook, clinicians explained them, and students acted out definitions. On the Peabody Picture Vocabulary Test–Fourth Edition (Dunn & Dunn, 2007), students who received the intervention made a median gain of 7.5 (4–14) raw score points compared to the comparison students' median gain of 3.0 (−6 to 19).

    In another quasi-experimental study, Armon-Lotem et al. (2021) examined the effects of an English and Hebrew small group narrative intervention on vocabulary outcomes for 16 typically developing, bilingual preschool children. The narrative intervention, delivered by teachers, featured the use of specially crafted stories that emphasized complete narrative episodes (i.e., problem, attempt, consequence/ending), emphasized complex sentences (e.g., subordinate clauses, adverbs), and included two general academic words. Illustrations and icons were used to support the retelling of the model story, and teachers modeled and prompted children to use the target vocabulary words when it was their turn to retell the story. Children defined 12 target words in each language before and after each of the two intervention blocks, one in English and one in Hebrew. The English intervention block resulted in large effects on definitional vocabulary in English (d = 1.610) and Hebrew (d = 1.008). Hebrew definitional vocabulary continued to grow (d = 1.065) when the Hebrew intervention block was in place, and children maintained their English vocabulary.

    Using similarly constructed stories and the same intervention procedures and materials as those in the Armon-Lotem study, Spencer et al. (2019, 2020) examined the effects of a dual language, Spanish–English, small group narrative intervention on a variety of narrative-related outcomes, including receptive picture vocabulary. In a multiple baseline single-case design study (Spencer et al., 2019), they found large effects for English receptive vocabulary (d = 0.98) but only small effects for Spanish (d = 0.34). In a follow-up randomized group study examining the same intervention (Spencer et al., 2020), Head Start teachers delivered three 12-week cycles of the same dual language intervention in large group and small group arrangements (N = 81 bilingual preschoolers in 23 classrooms). Compared to the control group, children who received the dual language narrative intervention showed significant improvement in receptive understanding of the target vocabulary words in English (g = 0.46–0.63) and Spanish (g = 0.31–0.63).

    Summary of Research and Gaps

    There are only five published studies reporting on vocabulary outcomes of narrative interventions. In each study, interventionists used recommended vocabulary instructional strategies (e.g., defining and discussing words, practicing words in context; Beck et al., 2013) alongside oral storytelling activities. Only one study was conducted with school-aged children (i.e., Gillam et al., 2014), and three included dual language preschoolers as participants; none of the studies reported the inclusion of participants with identified disabilities. Two of the interventions were delivered to whole classes of students (Gillam et al., 2014; Spencer et al., 2020); the rest were delivered to small groups of students. Except for the Spanish vocabulary outcome in the work of Spencer et al. (2019), the vocabulary instruction embedded in the narrative intervention improved participants' receptive picture vocabulary or definitional vocabulary performance on assessments of targeted or taught words. Altogether, this evidence is promising, as it suggests that embedding vocabulary learning and practice in storytelling is potentially effective. That said, none of the studies investigated the retention of vocabulary knowledge past the immediate posttest assessment. Interventions that lead to greater retention of taught words are more socially valid than those that do not, so this seems like an important gap to address.

    Another critical gap is the need to increase the methodological rigor of the research designs used to investigate the effects of vocabulary instruction embedded in a narrative intervention. All but one of these studies (Spencer et al., 2020) utilized research designs with inherent limitations, such as an underpowered, small-scale randomized design (N = 9; Adlof et al., 2014), a quasi-experimental design (N = 16; Armon-Lotem et al., 2021), a quasi-experimental study conducted in only two classrooms (Gillam et al., 2014), or a single-case design that was unable to minimize threats to internal validity for the vocabulary outcome (Spencer et al., 2019). When group studies are not possible, single-case designs are often a feasible experimental option. However, popular single-case experimental designs (e.g., multiple baseline) are not appropriate for an examination of vocabulary because words and definitions are discrete and learned rapidly. This issue was apparent in the multiple baseline design study by Spencer et al. (2019). Confidence in the narrative outcomes was strong, because participants' narrative retell progress in the treatment condition could be compared to their baseline performance, with at least three staggered baseline-to-intervention changes at three points in time. However, the researchers were not able to establish internal validity in the same manner for the vocabulary outcomes; vocabulary does not lend itself to parallel repeated measurement in baseline and treatment conditions, like continuous variables with a large range. As a result, Spencer et al. only examined a change on the vocabulary measure before and after the treatment condition, without a control group. Thus, more research using stronger research designs is needed to contribute to emerging evidence that the promotion of vocabulary can be integrated effectively with oral storytelling activities.

    The Current Study

    In this article, we report on the outcomes of a narrative intervention (i.e., Story Champs; Spencer & Petersen, 2016) that was designed for teachers and/or SLPs to use as part of schools' MTSS efforts to address the diverse oral language needs of students with and without disabilities efficiently. Story Champs maximizes differentiated instruction and prepares diverse students for success in the general education curriculum (Kelley & Spencer, 2021; Petersen et al., 2022) by providing students repeated opportunities to retell carefully constructed stories that are densely packed with the language expected in school (Spencer & Petersen, 2018; Spencer & Pierce, 2023). When stories for use within a narrative intervention are crafted with challenging vocabulary (Lee et al., 2017), complex sentence structures (Petersen et al., 2014), and essential discourse elements (Miller et al., 2018), a vast array of academic language targets can be addressed, thereby ensuring the intervention benefits all students (Weddle et al., 2016).

    Of the available, evidence-based narrative intervention approaches (see Pico et al., 2021, for a review), the Story Champs model (Spencer & Petersen, 2016) is unique in that it makes strategic use of large numbers of specially constructed stories. Although vocabulary is often included in Story Champs interventions (Petersen et al., 2020), it has only been examined as an outcome measure with preschoolers, and the maintenance of vocabulary improvements has never been investigated (Armon-Lotem et al., 2021; Spencer et al., 2019, 2020). Therefore, the purpose of the current study was to investigate the effect of vocabulary instruction embedded in a narrative intervention on the immediate and retained definitional knowledge of taught words for first graders at risk for language-related reading difficulties. Another way the current study addressed the gaps in the literature on vocabulary instruction embedded in a narrative intervention is the rigor with which we designed this study. We chose a less common, but suitable, research design (i.e., a repeated acquisition single-case design) to examine a causal link between the independent and dependent variables.

    We employed a valuable, yet underused, single-case research method called repeated acquisition design (RAD) to isolate the causal effects of the intervention on taught vocabulary. RAD has extensive practical utility because it can be integrated into schools' assessment routines, does not require lengthy baseline phases, and can be used with discrete skills that are acquired rapidly (Ferron et al., 2023; Kirby et al., 2021). Based on recommendations for increasing the rigor of RAD studies (Kirby et al., 2021; Ledford & Gast, 2018), we added numerous innovative features to the current study, such as control participants and retention probes. Moreover, we employed supplementary statistical methods compatible with single-case experimental designs to determine the significance and magnitude of any observed effects (e.g., Edgington & Onghena, 2007; J. Ferron et al., 2020; Patrona et al., 2022; Van den Noortgate & Onghena, 2007).

    The following research questions were addressed in this study.

    1. To what extent does vocabulary instruction embedded in narrative intervention improve at-risk first graders' definitional knowledge of taught words?

    2. To what extent do at-risk first graders retain definitional knowledge of learned vocabulary words?


    Participant Selection Process

    Through a research–practice partnership with a Title I elementary school in a Southwest state, the first author collaborated with the school's principal, first-grade teachers, and SLP to design this study. The Northern Arizona University institutional review board approved the research. Parents provided informed consent and a demographic survey for each participant.

    In alignment with the school's MTSS beginning of the year universal screening, all first-grade students whose parents granted permission to participate were screened for inclusion in the study using the Narrative Language Measures (NLM) Listening subtest of the CUBED Assessment (Petersen & Spencer, 2016). The NLM Listening, a criterion-referenced assessment with adequate validity and alternate form reliability (see Petersen & Spencer, 2012, 2016), was designed for universal screening and progress monitoring. It contains a set of academically complex stories for use at each grade (PreK to third grade) through which a story retell language sample is elicited. Following the retell, students are asked questions regarding the definition of three challenging words embedded in the story. This is an inferential task because the words in the NLM Listening stories are meant to be unfamiliar and untaught. Nine first-grade NLM Listening forms are designated for universal screening (beginning, middle, and end of year), and the three allocated for fall benchmarking were administered by trained research assistants (RAs).

    To administer the NLM Listening, RAs brought students to a desk and chairs in the hall near the first-grade classrooms. They used the standardized administration procedures to elicit the three retell language samples and asked three vocabulary questions per story (about 5–7 min). For example, the RA said, “I'm going to tell you a story. Please listen carefully. When I'm done, you are going to tell me the same story. Are you ready?” After reading the story with normal inflection and moderate pace, the RA said, “Thanks for listening. Now you tell me that story.” When necessary, the RA used standardized but neutral prompts to encourage students to continue retelling (i.e., “It's OK. Just do your best,” and/or “I can't help, but you can just tell the parts you remember”). When the student appeared to be done retelling, the RA said, “Are you finished?” Although the assessment session was audio-recorded for examining interrater reliability, the RA scored the student's sample in real time, as the student retold the story. Scoring involved the evaluation of the inclusion and clarity of story grammar elements (0–2 points) and the frequency of words that signal a complex sentence (e.g., because, when, after). Stories that included the key story grammar elements of problem, attempt, and consequence/ending earned bonus points because these are the most critical plot elements. Because the retell is a language sample, there is no maximum score, but a score of 15 is considered appropriate for first graders (Petersen & Spencer, 2016).

    After the student retold the story, the RA asked them to define three words that were embedded in the story with contextual clues. The RA repeated the clue in the question (e.g., “Kya was allergic to the plants. She couldn't be around them. What does allergic mean?”). The standardized prompt of “What else does ____ mean?” was used if a student defined the word using the words in the clue. When students provided a clear and correct definition of the word, the item earned a score of 3. When students provided an unclear but generally correct definition or an example of the word, the response earned a score of 2. If scores of 2 or 3 were not awarded for an item, the RA asked a choice-of-two question such as, “Does allergic mean to get sick from something or to be scared of something?” A correct response to this question earned 1 point. This section had a maximum of 9 points.

    Because three NLM Listening forms were administered in a single session, students' highest retell scores were compared to the fall benchmark score of 10. Students who earned 0–9 points on their highest retell were considered at risk for language-related reading difficulties and were selected as research participants. Because the NLM Listening does not have benchmark scores for the vocabulary questions section, it could not be used for identification. However, research participants had an average vocabulary score of 4.27 out of 9.

    Twenty-two first graders scored below the fall cut score and were designated to receive a Tier 2 supplemental language intervention. Half of the students were randomly assigned to receive treatment (n = 11) in the first 12 weeks of the study, while the other half were assigned to a waitlist. Of the 11 students assigned to the waitlist condition, three were randomly selected to serve as counterfactual comparisons in the single-case research design. Based on the parent report on the demographic survey, two students in the treatment group had Individualized Education Programs for speech-language disabilities, and two students spoke a language in addition to English at home (one Arabic and one Navajo). Regardless of disability status or language proficiency, students who perform below the expected benchmark on the NLM Listening are considered at risk for language-related reading difficulty (Petersen & Spencer, 2016). Because this study took place in the context of schools' MTSS efforts, we characterized all students who qualified for the Tier 2 oral language intervention as “students at risk for language-related reading difficulties.” Demographics are displayed in Table 1.

    Table 1. Participant demographics.

    Characteristic            Treatment       Control
                              n       %       n       %
     Female                   3      27       0       0
     Male                     8      73       3     100
    Race and ethnicity
     Native American          6      55       2      67
     White, non-Hispanic      4      36       0       0
     White, Hispanic          1       9       1      33
    Home language(s)
     Arabic/English           1       9       0       0
     English                  9      82       2      67
     Navajo/English           1       9       1      33
    Disability                2      18       0       0

    Note. Average participant age was 6;5 (SD = 2.41), and participant age did not significantly differ by condition.

    Experimental Design and Procedure

    To address the immediate impact of intervention on multiple academic word sets (Research Question 1), a concurrent RAD across 14 participants was used. Although multiple baseline and withdrawal designs are more commonly found in single-case research, RAD is most appropriate for measuring the rapid acquisition of nonreversible discrete skills (Kirby et al., 2021). In addition, the setting and schedule of the school's routines made RAD appealing because of its “limited pre-instruction testing and relative speed” (Ledford & Gast, 2018, p. 350). In the 12-week intervention phase, intervention sessions were delivered daily, Mondays to Thursdays, and pre- and postprobes were conducted every Friday. Except for the first and last week, every Friday, a postprobe of the word set targeted during intervention in the preceding week was conducted, followed by a preprobe of the word set to be targeted in the following week. This RAD arrangement allowed for many within-subject replications of the immediate acquisition effect (12 pre-to-post changes per participant × 11 participants = 132 replication opportunities), while closely matching authentic classroom activities in which a finite set of vocabulary words is targeted for instruction each week. In summary, the RAD allowed the researchers to measure acquisition of discrete, nonreversible behaviors (i.e., definitions of words) and monitor retention of words, while considering feasibility of its use in practice with the intended population (Greenwood et al., 2016; Kirby et al., 2021).

    Research Team

    University students served as RAs for this project. One RA was an elementary education major in her final year before becoming a teacher, and five were completing clinical master's degrees in education. Although it would have been better to have in-service teachers involved in the research activities, as part of the ongoing research–practice partnership, university students were placed at the local schools to help support their MTSS efforts. Therefore, the RAs completed the screening of participants and served as interventionists for the students assigned to receive the treatment. Two of the RAs were male, and four were female. All RAs were White, and four identified as non-Hispanic. The first author provided rigorous training on the NLM Listening and on the intervention prior to approving the RAs to work on this project. For example, following a 2-hr didactic workshop, RAs practiced the intervention and assessment procedures with first-grade students who were not part of this study. Each RA demonstrated all the procedures with 100% fidelity while the first author observed prior to working with the research participants. Once the intervention phase began, one of the RAs was solely responsible for collecting the pre- and postprobes on Fridays.

    Data Collection

    For the weekly pre- and postprobes in the RAD, researchers created sets of definitional vocabulary questions that were specific to the words taught each week in the academic language intervention. Because the intervention lasted 12 weeks and four words were targeted each week, 12 sets of definitional questions were needed to measure students' immediate response to intervention. The format of these questions was modeled after the vocabulary section of the NLM Listening subtest (Petersen & Spencer, 2016). In the NLM Listening, the featured words are intended to be unfamiliar and untaught, whereas for this study, we needed to measure students' definitional knowledge of the words explicitly taught during intervention. The vocabulary probes started with a definitional question while supplying a small clue (e.g., “Maya was tidying her bedroom. It was a mess. What does tidy mean?”) and continued to a choice of two as needed (e.g., “Does tidy mean to clean or to play?”). The words were taught during the previous 4 days of intervention, but students had no prior exposure to clues and contexts of the questions. Each probe, which involved the assessment of four words, took approximately 4 min to administer and score.

    Scoring for this measure was also modeled after that used in the NLM Listening vocabulary assessment. Three points were awarded when a student provided a correct definition or synonym. Only 2 points were awarded when the student's definition was unclear or in the form of an example, such as saying, “You do it before you can play” when asked, “What does sanitize mean?” When a student was unable to earn 2 or 3 points for their definition, the examiner offered a choice of two possible answers (e.g., “Does radiant mean chilly or bright?”). One point was given for a correct choice, and 0 points were given for an incorrect choice. With each word assigned a score of 0–3, the total points possible for each pre- and postprobe was 12. All vocabulary probe sessions were audio-recorded to examine interrater reliability. In 10 of the 12 weeks of probes, an independent RA randomly selected four students' recorded probe responses to rescore (about 24% of all probes). Mean agreement between the two scorers was 100%.
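As a minimal sketch of the agreement calculation described above (written in Python for illustration; the probe data below are hypothetical and not from the study), exact-match interrater agreement is simply the percentage of items on which the two scorers assigned the same 0–3 score, averaged across the rescored probes:

```python
def percent_agreement(scores_a, scores_b):
    """Exact-match agreement (%) between two scorers' item-level scores (0-3)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("Scorers must rate the same items")
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return 100 * matches / len(scores_a)

# Hypothetical rescored probes: each probe contains four word scores (0-3),
# rated independently by the original scorer and the reliability scorer.
probes = [
    ([3, 2, 0, 1], [3, 2, 0, 1]),  # full agreement on this probe
    ([2, 3, 1, 0], [2, 3, 1, 0]),
]
mean_agreement = sum(percent_agreement(a, b) for a, b in probes) / len(probes)
print(mean_agreement)  # 100.0
```

More lenient agreement definitions (e.g., within 1 point) are possible; the study's reported 100% figure implies exact-match agreement was used.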

    Students' performance on weekly pre- and postprobes was graphed according to RAD conventions (see Figure 1). The same researcher-made assessments were used to measure retention 1 week following the completion of the 12-week treatment phase to address Research Question 2. Students' ability to define the 48 targeted words was assessed in four sessions across 2 days so that students were only asked to define 12 words in each session.


    Figure 1. Repeated acquisition design results with 11 treatment and three control students.


    To examine the extent to which vocabulary instruction embedded in a narrative intervention was an effective approach for increasing vocabulary knowledge, we had to create 24 specially constructed stories. Story construction followed the formulas used to develop earlier versions of the Story Champs program (Spencer & Petersen, 2016), but aligned with Lee et al.'s (2017) suggestions for embedding target words in stories purposefully and with context clues. Stories contained 100 words, all the main story grammar elements appropriate for first-grade students (i.e., character, setting, problem, feeling, action, ending, and end feeling), two relative clauses, and four subordinate clauses. Each story was based on a child-relevant theme such as getting car sick, dropping heavy books, and feeling lost, and included two academic vocabulary words (e.g., inspect, frigid, lug, hazardous) and at least one clue or synonym for each of the target words (e.g., “She bundled up because it was cold outside”). Target words were drawn from lists of general academic words related to grade-level academic expectations (e.g., Marzano & Simms, 2013).

    Since we crafted 24 new stories for this early efficacy study, we had to create new visual supports and precise lessons for teaching vocabulary. Therefore, the materials used in this study include (a) a teacher book containing semiscripted lessons; (b) a set of illustrations (five panels representing the main parts of the story) for each of the new stories; (c) a picture book containing photos that depicted the target words; (d) colorful circle-shaped icons from Story Champs that represent character, setting, problem, feeling, action, ending, and end feeling; (e) story games from Story Champs including story bingo, story cubes, and story sticks; and (f) miscellaneous materials for game-like extension activities (e.g., vocabulary bingo cards, balls for tossing, sticks).


    During the intervention phase, all the research activities (intervention and weekly probes) took place in a multipurpose room within the school. The 11 treatment participants were divided into four groups of two to three students and remained with the same group for the entire 12-week intervention phase. Two RAs were responsible for delivering each group's intervention on alternating days. Together, they conducted a total of 48 sessions (four per week) covering 24 lessons (two per week), with each lesson delivered in two strands (both strands of the same lesson featured the same story), and taught a total of 48 general academic vocabulary words. Lessons were semiscripted in that some of the activities included what the interventionist should say and what response was expected from the students. The scripts also included images of the materials to be used in each activity and examples of high-quality corrections to be used when needed. Some of the activities could not be fully scripted because they featured the students doing most of the talking (e.g., during team or individual retells or personal story generations). For those activities, only general steps and prompting procedures were included in the lesson plans.

    Lessons were divided into two strands, designed to be delivered on consecutive days. In Strand 1, the focus was on the introduction of two target words and retelling the model story. The purpose of Strand 2 was to extend the practice of the target words to additional contexts and personal story generation activities. The construction of these lessons was based on a vast literature about how to teach vocabulary (Baumann & Kame'enui, 2004; Beck et al., 2013; Wright & Cervetti, 2017). For example, recommended vocabulary teaching procedures include defining words, using synonyms, practicing words in multiple contexts, using sentence generation tasks, and employing games to engage students to use the words. In addition, increasing students' awareness and monitoring of their understanding of words may increase students' overall ability to use context for inferring the meaning of a word (Spencer et al., 2022).

    Strand 1 lessons, which took place on Mondays and Wednesdays, began with the interventionist reading the model story to the group as the illustrations were displayed on the table in front of the students. While reading the story, the interventionist placed story grammar icons on or near the corresponding illustrations. The interventionist named the story parts and had the students repeat the names. In the next step, the interventionist reread the story while students listened for words they did not know. When an unfamiliar word was read, they raised their hands. All the words except the target words were easy and familiar to young children, so this activity was designed to practice comprehension monitoring. If students did not raise their hands when a target word was read, the interventionist raised their own hand. When an unfamiliar word (i.e., a target word) was identified, the interventionist restated the word and had the students repeat it. Then, the interventionist said, “Let's see if we can figure out what the new words mean by listening for clues.” For each of the unfamiliar words, the interventionist read the sentence with the target word and the sentence with the clue. According to the script, the interventionist said, “Think about the clues I just read and see if you can figure out what (word) means.” If a student generated a correct definition, the interventionist restated the definition and had the group repeat it. If students were unable to figure out the meaning of the target word, the interventionist demonstrated how to use the clues using a talk-aloud procedure (e.g., “Hmmm. Here it says it is cold outside, and here it says that it was a frigid day. I wonder if frigid could mean cold. Let's see if that works. It was a cold day”). At the end of the talk-aloud, the interventionist asked the students to say the target word and the definition following their model, then asked, “Everyone, what does (word) mean?” A standardized correction procedure was provided as needed: “(Definition). Listen. (Word) means (definition). Say it with me, (word) means (definition). Great! What does (word) mean?” This clue-finding procedure was completed with both words before continuing to the next steps of the lesson.

    In the subsequent steps of Strand 1 lessons, students were encouraged and prompted to use the target words whenever there was an opportunity to do so. Before students took turns retelling the entire story individually, the interventionist led them through a team retell step. The interventionist distributed one or two story grammar icons randomly to each student. Beginning with the student holding the character icon, each student retold the part of the story their icon represented. Once the individual had retold their part, the interventionist had the whole group repeat the sentence. This individual-then-group responding sequence repeated until all of the parts of the story had been retold by the “team.”

    The visual material was systematically faded during the individual retell steps. For example, illustrations and icons were available when the first student retold the whole story individually, but only icons were available when the second student retold the story, and no illustrations or icons were available when the last student retold the story. During this step, students played active responding games (e.g., story bingo, story cubes, and story sticks) as they listened to the storyteller retell the story. In the final step of Strand 1 lessons, the interventionist showed the photos in the picture book. For each word, students took turns describing a photo using a complete sentence that included one of the target words. Because sample sentences were included in the teacher book, if students struggled to generate a sentence, the interventionist could model a sentence and have the student repeat it. For each sentence generated by an individual student, the entire group repeated the sentence so that they also benefitted from the practice using the new words.

    On the subsequent days (Tuesdays and Thursdays), Strand 2 lessons were delivered. The lesson began with a short review of the words they learned on the previous day. Interventionists modeled the words and the definitions, having the students repeat the words and definitions. Next, the team retell step (see above) was completed to give additional practice with the target words and story retelling before transferring the practice to personal story generations. For this activity, each student took turns telling the group about a personal experience while including the target words in their stories. Interventionists prompted as needed (see below). In the final Strand 2 activity, students played a brief game that required them to interact with the meaning of the target words such as Go Fish, matching games, word bingo, and motor games like tag and trash can basketball.

    During word and storytelling activities in which students were talking (both Strands 1 and 2), interventionists followed a set of principles for standardizing their prompts. First, they were to deliver prompts and corrections immediately rather than waiting until the student's story was finished. Second, they were not supposed to criticize or mention what the student did wrong but focus on the things the student did well. Third, they were trained to use a standardized two-step prompt procedure. This involved first asking a question that directed the student to what they should have said (e.g., for story grammar = “What was Vera's problem?”; for complex sentence = “Try again, but this time use the connection word when”; for target word = “Say that sentence again, but this time use the new word frigid”). If the student was unable to generate the desired word/sentence, the interventionist followed the first prompt with a model sentence and a request to repeat it (e.g., for story grammar: “Vera built a snow fort, but it was too small. Now you say that”; for complex sentence: “Listen. When she had enough snow, Vera was able to build a bigger fort. Your turn to say that”; for target word: “Say it like me. One frigid morning, Vera went outside to play in the snow”). Using this two-step prompting procedure, interventionists ensured each student practiced the words 8–10 times per session.

    The first author conducted fidelity observations with each interventionist 6–9 times across the 12-week intervention phase. She used a 41-item fidelity checklist to evaluate adherence, quality, and student engagement during Strand 1 lessons and a 43-item checklist for Strand 2 lessons. Mean fidelity was high (98%; range: 83%–100%), so results were combined across strands and interventionists. One interventionist had consistently lower fidelity scores (83%–94%), but even these were sufficiently high to conclude the intervention was delivered as intended.

    Data Analysis

    Data from the RAD were analyzed using visual analysis of individual participant graphs. Visual analysis was supplemented by non-overlap effect indices, randomization tests, and two-level hierarchical linear modeling. These are described below.

    Visual analysis. Visual inspection of graphed data included within-participant examination of mean shifts, variability, and changes in trend. The graphs of the treatment group participants were also compared to those in the control group, to rule out the null hypothesis of no treatment effect (Kirby et al., 2021). Graphed weekly pre- and postprobe scores allowed for 12 demonstrations of immediate treatment effect per participant. By comparing scores obtained at Week 13 to their corresponding weekly preprobe and postprobe scores of the same words, we documented retention of learned vocabulary words.

    Percent of goal obtained. We used percent of goal obtained (PoGO; Ferron et al., 2020; Parker et al., 2014), a single-case effect size estimate that puts all raw score effects onto a common scale, so that researchers can quantify the effect of treatment for each participant in terms of the percent of progress made toward each case goal, γ. Based on the theory of change and the vocabulary scoring measure, γ was set at 12, the highest possible score for pre-, post-, and retention probe sessions. Obtained statistics were based on each participant's levels for weekly preprobes (α), postprobes (β1), and retention probes (β2). PoGO was calculated as (β − α) / (γ − α) × 100, once for the weekly lesson probe level change (α to β1) and once for the retention probe level change (α to β2). To interpret PoGO results, an estimate of 0 indicates no treatment effect (i.e., no improvement in targeted outcomes), whereas an estimate of 100 indicates a maximally effective treatment for a given participant.
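In code form, the PoGO computation is a one-line formula (this is a sketch with hypothetical probe levels, not the study's data):

```python
def pogo(pre_level, post_level, goal):
    """Percent of goal obtained: (post - pre) / (goal - pre) * 100."""
    return (post_level - pre_level) / (goal - pre_level) * 100

# Hypothetical participant: mean preprobe level 3, mean postprobe level 9,
# goal set at the maximum probe score of 12.
print(round(pogo(3, 9, 12), 1))  # 66.7
```

A participant whose postprobe level equals the goal yields a PoGO of 100; one whose level does not change yields 0, and declines yield negative values, as seen for the control participants in Table 2.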

    Randomization test. We used nonexhaustive upper-tailed randomization tests (α = .05) for both immediate probe gains and retention test differences to rule out the null hypothesis of no treatment effect (Edgington & Onghena, 2007). The obtained test statistic was based on participant mean gain differences and compared at the group level.
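A nonexhaustive (Monte Carlo) randomization test of this kind can be sketched as follows. The procedure reshuffles group labels many times to estimate how often a mean gain difference at least as large as the observed one arises by chance; the gain scores below are illustrative values in the range reported in Table 2, not the study's raw data:

```python
import random

def randomization_test(treatment_gains, control_gains, n_perm=5000, seed=1):
    """Upper-tailed Monte Carlo randomization test on the mean gain difference."""
    n_t = len(treatment_gains)
    observed = (sum(treatment_gains) / n_t
                - sum(control_gains) / len(control_gains))
    pooled = treatment_gains + control_gains
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of group labels
        stat = (sum(pooled[:n_t]) / n_t
                - sum(pooled[n_t:]) / (len(pooled) - n_t))
        if stat >= observed:
            extreme += 1
    # Add-one correction so the estimated p value is never exactly zero.
    return (extreme + 1) / (n_perm + 1)

# Illustrative mean gains: 11 treatment and 3 control participants.
p = randomization_test([7.9, 7.7, 7.0, 7.9, 5.4, 7.3, 6.3, 8.5, 7.6, 9.3, 8.3],
                       [-0.3, 0.2, 0.1])
print(p < .05)  # True
```

With 11 treatment and 3 control participants there are only C(14, 3) = 364 distinct label assignments, so an exhaustive test is also feasible; the Monte Carlo version shown here generalizes to larger samples.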

    Hierarchical linear modeling. A two-level hierarchical linear model was developed using design and analysis recommendations for single-case research (Rindskopf & Ferron, 2014; Van den Noortgate & Onghena, 2007) and a conceptual model of mean gain scores on vocabulary probe outcomes nested within participants. The academic language intervention was hypothesized to result in large, immediate, and retained gains in vocabulary outcomes. Assuming autocorrelation, we used restricted maximum likelihood estimation and the Kenward–Roger method of inference (Kenward & Roger, 1997). Results were reported using empirical Bayes (EB) estimates. To examine the impact of the academic language intervention on vocabulary skills while controlling for student group assignment, we specified the Level 1 and Level 2 models as follows:

    Level 1: Y_ti = π_0i + π_1i(Time_ti) + e_ti
    Level 2: π_0i = β_00 + β_01(Treatment_i) + r_0i
             π_1i = β_11(Treatment_i) (1)
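To make the two-level structure concrete, the following sketch uses a crude two-stage approximation on simulated data: Level 1 is reduced to each participant's mean weekly gain (π_0i), and the Level 2 treatment effect (β_01) is the difference between treatment and control means of those participant-level estimates. This is an illustration only; it omits the REML estimation, autocorrelation structure, and Kenward–Roger inference of the authors' actual analysis, and the data are simulated, not the study's.

```python
import random

def two_stage_estimate(gains_by_participant, treatment_flags):
    """Two-stage stand-in for the Level 1 / Level 2 model:
    Level 1: summarize each participant's weekly gains by their mean (pi_0i);
    Level 2: estimate the treatment effect (beta_01) as the difference
    between treatment and control means of those participant-level values."""
    levels = [sum(g) / len(g) for g in gains_by_participant]   # pi_0i estimates
    treat = [lvl for lvl, t in zip(levels, treatment_flags) if t == 1]
    ctrl = [lvl for lvl, t in zip(levels, treatment_flags) if t == 0]
    return sum(treat) / len(treat) - sum(ctrl) / len(ctrl)     # beta_01

# Simulated data: 11 treatment and 3 control participants, 12 weekly gains
# each, with a true treatment effect of 7.5 points plus noise.
rng = random.Random(0)
flags = [1] * 11 + [0] * 3
data = [[(7.5 if t else 0.0) + rng.gauss(0, 1.5) for _ in range(12)]
        for t in flags]
print(round(two_stage_estimate(data, flags), 1))  # close to the simulated 7.5
```

A full multilevel fit would additionally partition within- and between-participant variance and model the treatment-by-time slope (β_11); the two-stage version captures only the average level shift.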


    Immediate Effects of Embedded Vocabulary Instruction

    Graphed results of weekly pre- and postprobe scores for 11 treatment and three control students are presented in Figure 1. Visual inspection of graphs revealed immediate positive and large level changes for treatment participants, averaging a gain of more than 6 points each week. There was no visible difference between weekly probe scores for control participants. Immediate within-subject treatment effects were seen at 12 points in time for most participants. Only one treatment participant, Asher, was absent for Week 11's postprobe assessment session, resulting in only 11 opportunities for replication. Layla and Calvin were absent for at least three sessions during the weeks indicated on the graphs but were present for all pre- and postprobe assessment sessions.

    Consistent with visual inspection of the participants' graphs, supplemental statistics also provided evidence of a causal relation between intervention and explicitly taught vocabulary outcomes, reinforcing the positive effects noted in the graphs for students who received the 12-week academic language intervention. Means and standard deviations of outcome variables are reported for each participant in Table 2. The presence of a treatment effect was confirmed with a randomization test that revealed a statistically significant difference between treatment and control participants on weekly probe gains (p = .0027). The size of the effect was estimated using two methods: the percentage of the goal obtained (PoGO) and raw score vocabulary gains using a multilevel model. As shown in Table 2, the average PoGO value for treatment participants was 76.0, considered a moderately large treatment effect (Ferron et al., 2020). Gio had the largest PoGO value (94.2), indicating that he progressed 94.2% of the way toward the maximum score allowed. In comparison, negative PoGO values were found for control participants.

    Table 2. Vocabulary means, standard deviations, and individual effect estimates.

    Group Participant Pre α (SD) Post β1 (SD) Retention β2 (SD) Goal γ PoGO pre–post PoGO pre–retention EB pre–post EB pre–retention
    Treatment Charlotte 3.6 (1.3) 10.8 (1.2) 8.6 (1.8) 12 86.1 59.4 7.92 6.97
    Amari^a 2.8 (1.8) 9.8 (2.1) 6.9 (2.2) 12 76.3 44.6 7.73 6.34
    Samuel 2.4 (1.0) 8.4 (2.8) 5.0 (2.7) 12 62.2 27.4 6.96 4.99
    Asher 1.8 (0.6) 8.9 (2.3) 6.3 (2.4) 12 70.0 43.9 7.86 6.63
    Calvin 3.2 (1.3) 7.3 (3.2) 4.7 (2.8) 12 46.2 17.0 5.36 4.55
    Layla^a 2.7 (0.8) 9.1 (3.0) 6.1 (2.9) 12 68.7 36.6 7.30 5.88
    Jordynn 3.9 (2.4) 9.1 (2.4) 7.3 (2.7) 12 63.9 42.2 6.29 5.88
    David 3.2 (2.0) 11.1 (1.0) 7.4 (2.0) 12 89.6 48.1 8.51 6.45
    Gio 4.8 (2.0) 11.6 (0.8) 9.1 (2.1) 12 94.2 59.7 7.60 6.51
    Pedro 2.5 (1.4) 11.4 (0.8) 8.5 (2.7) 12 93.9 63.2 9.26 7.66
    Tyler 3.0 (1.0) 10.7 (1.6) 8.8 (2.1) 12 85.2 63.9 8.27 7.49
    Control Mateo 2.6 (0.9) 2.2 (1.3) 2.3 (1.1) 12 −4.4 −3.5 −0.25 0.10
    Dylan 2.5 (0.8) 1.6 (1.4) 2.2 (0.8) 12 −9.4 −3.5 0.17 0.10
    Santiago 3.8 (1.1) 1.9 (1.6) 3.1 (1.2) 12 −23.4 −9.2 0.09 −0.19

    Note. Individual fixed-effect estimates did not differ significantly from group averages. PoGO = percent of goal obtained; EB = empirical Bayes.

    ^a Student with a language disability.

    Randomization tests also answered our first research question, finding the vocabulary instruction embedded within the narrative intervention effective for improving definitional knowledge of a new set of words each week (p = .0027). A two-level hierarchical linear model provided further individual and group effect size estimates. As shown in Table 2, the EB estimates of weekly probe effects revealed mean level shifts for treatment participants ranging from 5.36 (Calvin) to 9.26 (Pedro), compared to near-zero mean level gains for control participants. The magnitude of the treatment effect was large and educationally significant (β = 7.56, p < .0001), as students who received the vocabulary instruction embedded within the narrative intervention improved upon their overall vocabulary preprobe score by 5.36–9.26 points at postprobe each week. Control group students demonstrated neither a consistent nor a significant pattern of change. Rather, treatment was a significant predictor of gains, as the immediate average level change between weekly pre- and postprobes was 7.56 (p < .0001), with a 95% confidence interval (CI) [5.79, 9.34] that allows us to confidently conclude that the treatment group outperformed the control group. Average group effects for taught vocabulary are presented in Table 3.

    Table 3. Two-level linear model effect estimates.

    Fixed effects (weekly vocabulary probes): Estimate, SE, 95% CI [LL, UL], p
    Intercept: −0.13, SE = .70, [−1.66, 1.39], p = .85
    Treatment effect^a: 7.56, SE = .83, [5.79, 9.34], p < .0001
    Treatment slope^b: 0.12, SE = .05, [0.03, 0.22], p = .01

    Fixed effects (retention probes): Estimate, SE, 95% CI [LL, UL], p
    Intercept: −0.47, SE = .73, [−2.05, 1.11], p = .53
    Treatment effect^a: 6.43, SE = .89, [4.55, 8.31], p < .0001
    Treatment slope^b: 0.35, SE = .06, [0.23, 0.47], p < .0001

    Within-case random effects (weekly vocabulary probes): Estimate, SE, z value, p
    Intercept: 1.19, SE = .60, z = 1.97, p = .02
    Residual: 4.91, SE = .57, z = 8.60, p < .0001

    Within-case random effects (retention probes): Estimate, SE, z value, p
    Intercept: 1.10, SE = .66, z = 1.66, p = .05
    Residual: 5.94, SE = .69, z = 8.64, p < .0001

    Note. Time centered at end of study; weekly probes AR(1) = −0.20, SE = .08, p = .02 and retention probes AR(1) = −0.005, SE = .095, p = .96. SE = standard error; CI = confidence interval; LL = lower limit; UL = upper limit.

    ^a Difference between dummy-coded control (0) and treatment (1) groups.

    ^b Slope of treatment group.

    Retained Definitional Knowledge of Learned Vocabulary

    Using the RAD participant graphs in Figure 1, we examined the extent to which students maintained acquired vocabulary knowledge over time by comparing the vocabulary retention probe scores obtained in Week 13 to their corresponding weekly preprobes and postprobes. Within-subject comparisons of preprobes to postprobes, and of postprobes to retention probes, were made through visual inspection. With few exceptions related to absences, treatment participants showed consistently higher retention probe scores than preprobe scores.

    In addition to retention scores for individual participants displayed by week to correspond to word sets (see Figure 1, closed circles), multiple demonstrations of effect can be seen through comparison of mean scores for treatment and control groups across the 12 weeks (see Figure 2). Supporting the conclusion that the intervention had a positive effect on overall skill maintenance, graphed posttest scores remained above the preprobe scores on average, even with greater variability of gains for some students in the treatment group (e.g., Asher; see Figure 1). Data from students in the control group revealed negligible, if any, changes in level. Unlike the treatment group, there was considerable overlap of mean preprobe, postprobe, and retention probe scores for the control group, suggesting limited growth in vocabulary knowledge.


    Figure 2. Means of preprobe, postprobe, and retention probe vocabulary scores by weeks.

    Supplementing results of visual inspection, a randomization test ruled out retention probe results occurring by chance (p = .0032). As shown in Table 2, with the maximum value of 12 set as the goal, individual participant PoGO values were examined to quantify the size of the treatment effect at follow-up. The treatment group had an average PoGO value of 46.0, a medium effect (Ferron et al., 2020). Their PoGO values ranged from 17.0 to 63.9, consistent with visual analyses that showed greater variation in retention effects among treatment group participants. Interestingly, the two students identified with language disabilities (Amari and Layla) who were assigned to the treatment group also appeared to maintain most of their learned skills over time, even though they did not receive instruction in nor exposure to the taught words after they were introduced in a particular week. In fact, on average, both Layla and Amari scored a little less than 3 points below their mean immediate pre–postprobe gains at retention. Control group students had negative PoGO effects, meaning that, on average, their retention probe scores were lower than their preprobe scores.

    Using multilevel modeling, we examined individual and group differences on retention of vocabulary knowledge. At Week 13, all treatment participants demonstrated partial maintenance of definitional knowledge with individual EB estimates ranging from 4.55 to 7.66, reported in Table 2. Parameter estimates shown in Table 3 provide information about average effects across cases for retention outcomes. The average effect of the academic language intervention on retention scores was statistically significant and large, β = 6.43, p < .0001, 95% CI [4.55, 8.31]. There was also an increasing trend over time for the treatment group participants only (see Figure 2). Included in the Level 2 model following the visual inspection of participant graphs, the average change in slope was positive and small at .35, p < .0001, 95% CI [0.23, 0.47]. There was no significant difference between control retention probe and preprobe outcomes noted at the individual or group level. Individual EB estimates obtained by the multilevel model are shown in Table 2.


    The purpose of this study was to investigate the potential of teaching definitions of words during narrative intervention for students at risk of developing language-based reading difficulties, including students with and without identified disabilities. We employed an innovative single-case research design and statistical strategies to document the significance of the immediate and long-term impact of the intervention. Based on visual analysis, statistical significance, and effect size calculations, we conclude that the vocabulary instruction embedded in the Story Champs intervention produced dramatic improvements in students' knowledge of the targeted words. Moreover, treatment participants retained a substantial amount of definitional knowledge several weeks later.

    This study contributes to a small but growing number of narrative intervention studies that target the improvement of academic vocabulary in addition to discourse- and sentence-level complexity (Adlof et al., 2014; Armon-Lotem et al., 2021; Clarke et al., 2010; Gillam et al., 2014; Spencer et al., 2019, 2020). On average, these interventions lead to positive effects on vocabulary outcomes when implemented at least twice weekly for a period of 6 weeks, with each lesson or session lasting an average of 30 min. However, little was known about the extent to which students retained the specific vocabulary taught in any given week. In this study, words targeted earlier in the intervention were not reviewed, which is very similar to how vocabulary is often taught (new words related to units each week that may not build upon one another). By assessing definitional knowledge of all trained words at Week 13, we were able to demonstrate that students were able to maintain skills above baseline levels. Additionally, the RAD allowed us to show, for possibly the first time, that retention of words may not have been solely influenced by the time since the words were originally taught. More research is needed to examine the extent to which specific words are more resistant to learning loss. The documentation of retention of discrete academic language skills may inform educators and SLPs as to how and how often previously taught content should be reviewed. Retention can also be used to rule out a hypothesis that students will lose skills not practiced over a certain period of time (e.g., school break).

    As teachers and SLPs alike are asked to do more with less time, multitiered interventions that address more than one oral academic language skill at a time, delivered by a variety of education professionals, will be necessary. Narrative intervention is a versatile approach to contextualized language intervention that allows SLPs and teachers alike to differentiate and tailor instructional arrangements for diverse students (Pico et al., 2021; Spencer & Petersen, 2020). For example, both Amari and Layla, the two students identified with language disabilities, appeared to greatly benefit from the embedded vocabulary instruction within the narrative intervention. Their pre-intervention probes were routinely low, indicating little to no definitional knowledge of the vocabulary targeted during intervention each week, but the vocabulary instruction embedded in the narrative intervention led to definitional scores at or near the goal level. This is important, as it suggests that the intervention can be delivered in a small group by a classroom teacher or special educator, rather than individually by an SLP. Contextualized multitiered oral academic language interventions may reduce time pressures from high caseload numbers, allowing SLPs to coach school team members in the delivery of interventions like Story Champs. This approach promotes the collaborative efforts of SLPs and teachers to prevent reading difficulties for more students.

    Implications of RAD, Other Enhancement Strategies, and Future Research

    Given natural confounds of time, history, and maturation in young student populations for research conducted in the classroom and similar educational settings, the use of the RAD in this study allowed researchers to investigate treatment effects via multiple analyses without placing any additional burden on participants or anyone else involved (e.g., SLPs, school administrators, teachers). Furthermore, the design more closely matches the real-world instruction and assessment contexts in which Story Champs is delivered, making it “attractive for practitioners, who want to get a first insight into the effect of a treatment” (Van den Noortgate & Onghena, 2007, p. 196). With 11 treatment and three control group participants, the RAD allowed the researchers to capitalize on the strengths of visual inspection of within-subject and between-subjects effects as well as statistical analyses because of the sheer number of observations available (pre-, post-, and retention measures for each of the 48 vocabulary words across 12 weeks). In addition, confidence in the RAD results was strengthened by the inclusion of three comparison participants in the control group. To date, no known RAD studies investigating immediate and longitudinal effects have included control participants.

    Although this study was well controlled and, as a result, we can be confident in the causal link, there are additional considerations when interpreting the results of RAD studies and planning for future studies using this design. As with all single-case research designs, the goal is not typically external validity. This is not necessarily a weakness of an individual study, but it is important to address all types of validity issues within a sequenced research agenda. In other words, future studies would employ a design that could establish external validity. For example, in the current study, we did not match the control participants but instead randomly assigned them to the condition. With such small samples, random assignment is unlikely to be sufficient to suggest that these findings are generalizable beyond the sample population. A different study design with similar controls for threats to internal validity, such as a large-scale randomized controlled trial, would potentially lead to the replication of the research findings from this study and provide stronger evidence that the intervention can impact vocabulary across a more diverse sample of children. The current data support the efficacy of the academic language intervention for these students, but replication studies are necessary to further identify for whom and to what extent the intervention works.

    Although pretest data were collected prior to the onset of any instruction, inclusion of a baseline condition (measuring vocabulary knowledge at the beginning and end of the week in the absence of the intervention; see Kirby et al., 2021) is another strategy for strengthening RAD studies. An extended baseline condition with repeated acquisition paths would allow for another comparison of within-group effects for the treatment students, similar to the staggered baselines in a multiple baseline design. It is important to note that there are many ways in which RAD studies can be enhanced and that not all of them are required. Researchers must use informed judgment about the relevant threats to internal validity and choose the enhancement strategies needed to minimize the plausible threats (Kirby et al., 2021).

    Future research should also include more distal measures of vocabulary learning. In the current study, we assessed students' definitional knowledge only in contrived contexts. However, it would be useful to also examine the extent to which students used the explicitly taught vocabulary words in their spontaneous language. In addition, we did not examine students' depth of knowledge of the learned words. There are different degrees to which students can understand and use new vocabulary (Henriksen, 1999), and outcomes that capture the continuum of vocabulary acquisition, from partial to precise knowledge and from receptive to expressive use, as well as depth of knowledge, would be important to consider in future research.

    Clinical Implications and Contributions

    Educators have become particularly concerned with students' weak oral academic language, a problem that has been exacerbated by the residual effects of a global pandemic (Bowyer-Crane et al., 2021). The awareness that oral language is foundational to reading comprehension (Hoover & Gough, 1990) and the fact that oral language has not received sufficient attention in an MTSS context (Goldfeld et al., 2021; Pittman et al., 2020) have generated momentum for the development of multitiered interventions that focus on academic oral language development. A singular focus on systematic, explicit phonics instruction cannot possibly address the consistently poor reading comprehension performance that most students experience across the United States (U.S. Department of Education et al., 2022, Reading Assessment). Educators, including SLPs, need evidence-based interventions that can be easily incorporated across tiers of intervention that will improve oral academic language, including the acquisition of advanced vocabulary, which is particularly important for reading comprehension. The results of this study indicate that narrative-based contextualized language interventions, such as Story Champs, can have a significant impact on vocabulary acquisition for students who are at risk for reading comprehension difficulty.

    Author Contributions

    Trina D. Spencer: Conceptualization (Lead), Data curation (Equal), Investigation (Lead), Methodology (Equal), Project administration (Lead), Resources (Lead), Supervision (Lead), Visualization (Lead), Writing – original draft (Equal). Megan S. Kirby: Data curation (Equal), Formal analysis (Lead), Methodology (Equal), Visualization (Supporting), Writing – original draft (Equal). Douglas B. Petersen: Conceptualization (Supporting), Data curation (Supporting), Formal analysis (Supporting), Writing – original draft (Supporting).

    Data Availability Statement

    The data featured in this article can be accessed through the following link:

    Acknowledgments

    The researchers would like to express their gratitude to Kinsey Elementary School and the Flagstaff Unified School District for their partnership in this research.


    Author Notes

    Disclosure: Trina D. Spencer and Douglas B. Petersen co-developed the oral narrative language program implemented in this study. Although the version investigated in this study has not been commercialized, when it is, they will receive financial benefits from its sale. Megan S. Kirby has declared that no competing financial or nonfinancial interests existed at the time of publication.

    Correspondence to Trina D. Spencer, who is now at The University of Kansas, Lawrence:

    Editor-in-Chief: Erinn H. Finke

    Editor: Jillian H. McCarthy
