Participant Selection Process
Through a research–practice partnership with a Title I elementary school in a Southwest state, the first author collaborated with the school's principal, first-grade teachers, and SLP to design this study. The Northern Arizona University institutional review board approved the research. Parents provided informed consent and completed a demographic survey for each participant.
In alignment with the school's MTSS beginning-of-the-year universal screening, all first-grade students whose parents granted permission to participate were screened for inclusion in the study using the Narrative Language Measures (NLM) Listening subtest of the CUBED Assessment (Petersen & Spencer, 2016). The NLM Listening, a criterion-referenced assessment with adequate validity and alternate-form reliability (see Petersen & Spencer, 2012, 2016), was designed for universal screening and progress monitoring. It contains a set of academically complex stories for use at each grade (PreK to third grade), through which a story retell language sample is elicited. Following the retell, students are asked to define three challenging words embedded in the story. This is an inferential task because the words in the NLM Listening stories are meant to be unfamiliar and untaught. Nine first-grade NLM Listening forms are designated for universal screening (beginning, middle, and end of year), and the three allocated for fall benchmarking were administered by trained research assistants (RAs).
To administer the NLM Listening, RAs brought students to a desk and chairs in the hall near the first-grade classrooms. They used the standardized administration procedures to elicit the three retell language samples and asked three vocabulary questions per story (about 5–7 min). For example, the RA said, “I'm going to tell you a story. Please listen carefully. When I'm done, you are going to tell me the same story. Are you ready?” After reading the story with normal inflection and a moderate pace, the RA said, “Thanks for listening. Now you tell me that story.” When necessary, the RA used standardized but neutral prompts to encourage students to continue retelling (i.e., “It's OK. Just do your best,” and/or “I can't help, but you can just tell the parts you remember”). When the student appeared to be done retelling, the RA said, “Are you finished?” Although each assessment session was audio-recorded for examining interrater reliability, the RA scored the student's sample in real time, as the student retold the story. Scoring involved evaluating the inclusion and clarity of story grammar elements (0–2 points each) and counting the frequency of words that signal a complex sentence (i.e., because, when, after). Retells that include the key story grammar elements of problem, attempt, and consequence/ending earn bonus points because these are the most critical plot elements. Because the measure is a language sample, there is no fixed maximum score, but a score of 15 is considered appropriate for first graders (Petersen & Spencer, 2016).
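To make the tally concrete, the following minimal Python sketch mirrors the scoring logic just described; the element labels, the 1-point bonus value, and the simple whole-word matching are illustrative assumptions rather than the published CUBED rubric.

```python
# Illustrative sketch of the retell scoring tally described above; element
# labels, the 1-point bonus, and whole-word matching are assumptions, not
# the published CUBED rubric.

GRAMMAR_ELEMENTS = ["character", "setting", "problem", "feeling", "action",
                    "ending", "end_feeling"]
KEY_PLOT_ELEMENTS = {"problem", "action", "ending"}   # attempt/consequence analogues
COMPLEXITY_MARKERS = {"because", "when", "after"}

def score_retell(element_scores: dict, transcript: str) -> int:
    """Sum 0-2 points per story grammar element, 1 point per complex-sentence
    marker in the retell, plus a bonus point for each key plot element present."""
    total = sum(element_scores.get(e, 0) for e in GRAMMAR_ELEMENTS)
    total += sum(1 for w in transcript.lower().split() if w in COMPLEXITY_MARKERS)
    total += sum(1 for e in KEY_PLOT_ELEMENTS if element_scores.get(e, 0) > 0)
    return total

# Example: a retell with clear problem/action/ending and one "because" clause.
scores = {"character": 2, "setting": 1, "problem": 2, "action": 2, "ending": 2}
print(score_retell(scores, "the boy was sad because his fort broke"))  # 9 + 1 + 3 = 13
```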
After the student retold the story, the RA asked them to define three words that were embedded in the story with contextual clues. The RA repeated the clue in the question (e.g., “Kya was allergic to the plants. She couldn't be around them. What does allergic mean?”). The standardized prompt of “What else does ____ mean?” was used if a student defined the word using the words in the clue. When students provided a clear and correct definition of the word, the item earned a score of 3. When students provided an unclear but generally correct definition or an example of the word, the response earned a score of 2. If scores of 2 or 3 were not awarded for an item, the RA asked a choice-of-two question such as, “Does allergic mean to get sick from something or to be scared of something?” A correct response to this question earned 1 point. This section had a total of 9 points.
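As an illustration of how these item scores combine, consider the following Python sketch; the response-category labels are hypothetical stand-ins for the examiner's real-time judgment of each spoken response.

```python
# Sketch of the 0-3 scoring ladder for one embedded-word question; the
# category labels are hypothetical stand-ins for examiner judgment.

def score_vocabulary_item(definition: str, choice_correct: bool = False) -> int:
    if definition == "clear_correct":        # clear, correct definition
        return 3
    if definition == "unclear_or_example":   # generally correct but unclear, or an example
        return 2
    # Otherwise the choice-of-two question decides between 1 and 0.
    return 1 if choice_correct else 0

# Three items per story, so section totals range from 0 to 9.
items = [("clear_correct", False), ("none", True), ("none", False)]
print(sum(score_vocabulary_item(d, c) for d, c in items))  # 3 + 1 + 0 = 4
```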
Because three NLM Listening forms were administered in a single session, students' highest retell scores were compared to the fall benchmark score of 10. Students who earned 0–9 points on their highest retell were considered at risk for language-related reading difficulties and were selected as research participants. Because the NLM Listening does not have benchmark scores for the vocabulary questions section, it could not be used for identification. However, research participants had an average vocabulary score of 4.27 out of 9.
Twenty-two first graders scored below the fall cut score and were designated to receive a Tier 2 supplemental language intervention. Half of the students (n = 11) were randomly assigned to receive treatment in the first 12 weeks of the study, while the other half were assigned to a waitlist. Of the 11 students assigned to the waitlist condition, three were randomly selected to serve as counterfactual comparisons in the single-case research design. Based on parent report on the demographic survey, two students in the treatment group had Individualized Education Programs for speech-language disabilities, and two spoke a language in addition to English at home (one Arabic and one Navajo). Regardless of disability status or language proficiency, students who perform below the expected benchmark on the NLM Listening are considered at risk for language-related reading difficulty (Petersen & Spencer, 2016). Because this study took place in the context of the school's MTSS efforts, we characterized all students who qualified for the Tier 2 oral language intervention as “students at risk for language-related reading difficulties.” Demographics are displayed in Table 1.
Data Collection
For the weekly pre- and postprobes in the RAD, researchers created sets of definitional vocabulary questions specific to the words taught each week in the academic language intervention. Because the intervention lasted 12 weeks and four words were targeted each week, 12 sets of definitional questions were needed to measure students' immediate response to intervention. The format of these questions was modeled after the vocabulary section of the NLM Listening subtest (Petersen & Spencer, 2016). In the NLM Listening, the featured words are intended to be unfamiliar and untaught, whereas for this study, we needed to measure students' definitional knowledge of the words explicitly taught during intervention. The vocabulary probes started with a definitional question that supplied a small clue (e.g., “Maya was tidying her bedroom. It was a mess. What does tidy mean?”) and continued to a choice of two as needed (e.g., “Does tidy mean to clean or to play?”). The words had been taught during the previous 4 days of intervention, but students had no prior exposure to the clues and contexts of the questions. Each probe, which involved the assessment of four words, took approximately 4 min to administer and score.
Scoring for this measure was also modeled after that used in the NLM Listening vocabulary assessment. Three points were awarded when a student provided a correct definition or synonym. Only 2 points were awarded when the student's definition was unclear or took the form of an example, such as saying, “You do it before you can play” when asked, “What does sanitize mean?” When a student was unable to earn 2 or 3 points for their definition, the examiner offered a choice of two possible answers (e.g., “Does radiant mean chilly or bright?”). One point was given for selecting the correct answer, and 0 points were given for an incorrect selection. With each word assigned a score of 0–3, the total points possible for each pre- and postprobe was 12. All vocabulary probe sessions were audio-recorded to examine interrater reliability. In 10 of the 12 weeks of probes, an independent RA randomly selected four students' recorded probe responses to rescore (about 24% of all probes). Mean agreement between the two scorers was 100%.
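The exact-agreement check can be sketched in a few lines of Python, under the assumption that agreement was computed item by item (the study reports only the mean figure):

```python
# Hedged sketch of the interrater agreement computation; the item-level
# comparison is an assumption, as only the mean agreement is reported.

def percent_agreement(scorer_a, scorer_b):
    """Percentage of items (each scored 0-3) on which both scorers agree exactly."""
    matches = sum(a == b for a, b in zip(scorer_a, scorer_b))
    return 100 * matches / len(scorer_a)

# Example: perfect agreement on one four-word probe.
print(percent_agreement([3, 2, 0, 1], [3, 2, 0, 1]))  # 100.0
```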
Students' performance on the weekly pre- and postprobes was graphed according to RAD conventions (see Figure 1). The same researcher-made assessments were used to measure retention 1 week following the completion of the 12-week treatment phase to address Research Question 2. Students' ability to define the 48 targeted words was assessed in four sessions across 2 days so that students were only asked to define 12 words in each session.
Materials
To examine the extent to which vocabulary instruction embedded in a narrative intervention was an effective approach for increasing vocabulary knowledge, we created 24 specially constructed stories. Story construction followed the formulas used to develop earlier versions of the Story Champs program (Spencer & Petersen, 2016) but aligned with Lee et al.'s (2017) suggestions for embedding target words in stories purposefully and with context clues. Each story contained 100 words; all the main story grammar elements appropriate for first-grade students (i.e., character, setting, problem, feeling, action, ending, and end feeling); two relative clauses; and four subordinate clauses. Each story was based on a child-relevant theme, such as getting carsick, dropping heavy books, or feeling lost, and included two academic vocabulary words (e.g., inspect, frigid, lug, hazardous) and at least one clue or synonym for each target word (e.g., “She bundled up because it was cold outside”). Target words were drawn from lists of general academic words related to grade-level academic expectations (e.g., Marzano & Simms, 2013).
Because we crafted 24 new stories for this early efficacy study, we also created new visual supports and lessons for teaching the target vocabulary. Therefore, the materials used in this study included (a) a teacher book containing semiscripted lessons; (b) a set of illustrations (five panels representing the main parts of the story) for each of the new stories; (c) a picture book containing photos that depicted the target words; (d) colorful circle-shaped icons from Story Champs that represent character, setting, problem, feeling, action, ending, and end feeling; (e) story games from Story Champs, including story bingo, story cubes, and story sticks; and (f) miscellaneous materials for game-like extension activities (e.g., vocabulary bingo cards, balls for tossing, sticks).
Intervention
During the intervention phase, all the research activities (intervention and weekly probes) took place in a multipurpose room within the school. The 11 treatment participants were divided into four groups of two to three students and remained with the same group for the entire 12-week intervention phase. Two RAs were responsible for delivering each group's intervention on alternating days. Together, they conducted a total of 48 sessions (four per week) comprising 24 lessons (two per week), each lesson consisting of two strands that featured the same story, and taught a total of 48 general academic vocabulary words. Lessons were semiscripted in that some of the activities specified what the interventionist should say and what response was expected from the students. The scripts also included images of the materials to be used in each activity and examples of high-quality corrections to be used when needed. Some of the activities could not be fully scripted because they featured the students doing most of the talking (e.g., during team or individual retells or personal story generations). For those activities, only general steps and prompting procedures were included in the lesson plans.
Lessons were divided into two strands designed to be delivered on consecutive days. Strand 1 focused on introducing the two target words and retelling the model story. Strand 2 extended practice of the target words to additional contexts and personal story generation activities. The construction of these lessons was based on a vast literature on how to teach vocabulary (Baumann & Kame'enui, 2004; Beck et al., 2013; Wright & Cervetti, 2017). For example, recommended vocabulary teaching procedures include defining words, using synonyms, practicing words in multiple contexts, using sentence generation tasks, and employing games that engage students in using the words. In addition, increasing students' awareness and monitoring of their understanding of words may increase their overall ability to use context to infer the meaning of a word (Spencer et al., 2022).
Strand 1 lessons, which took place on Mondays and Wednesdays, began with the interventionist reading the model story to the group as the illustrations were displayed on the table in front of the students. While reading the story, the interventionist placed story grammar icons on or near the corresponding illustrations. The interventionist named the story parts and had the students repeat the names. In the next step, the interventionist reread the story while students listened for words they did not know, raising their hands when an unfamiliar word was read. All the words except the target words were easy and familiar to young children, so this activity was designed to practice comprehension monitoring. If students did not raise their hands when a target word was read, the interventionist raised their own hand. When an unfamiliar word (i.e., a target word) was identified, the interventionist restated the word and had the students repeat it. Then the interventionist said, “Let's see if we can figure out what the new words mean by listening for clues.” For each of the unfamiliar words, the interventionist read the sentence with the target word and the sentence with the clue. According to the script, the interventionist said, “Think about the clues I just read and see if you can figure out what (word) means.” If a student generated a correct definition, the interventionist restated the definition and had the group repeat it. If students were unable to figure out the meaning of the target word, the interventionist demonstrated how to use the clues with a talk-aloud procedure (e.g., “Hmmm. Here it says it is cold outside, and here it says that it was a frigid day. I wonder if frigid could mean cold. Let's see if that works. It was a cold day”). At the end of the talk-aloud, the interventionist asked the students to say the target word and the definition following their model and then asked, “Everyone, what does (word) mean?” A standardized correction procedure was provided as needed: “(Definition). Listen. (Word) means (definition). Say it with me, (word) means (definition). Great! What does (word) mean?” This clue-finding procedure was completed with both words before continuing to the next steps of the lesson.
In the subsequent steps of Strand 1 lessons, students were encouraged and prompted to use the target words whenever there was an opportunity to do so. Before students took turns retelling the entire story individually, the interventionist led them through a team retell step. The interventionist randomly distributed one or two story grammar icons to each student. Beginning with the student holding the character icon, each student retold the part of the story their icon represented. After each individual retold a part, the interventionist had the whole group repeat the sentence. This individual-then-group responding sequence repeated until all of the parts of the story had been retold by the “team.”
The visual material was systematically faded during the individual retell steps. For example, illustrations and icons were available when the first student retold the whole story individually, but only icons were available when the second student retold the story, and no illustrations or icons were available when the last student retold the story. During this step, students played active responding games (e.g., story bingo, story cubes, and story sticks) as they listened to the storyteller retell the story. In the final step of Strand 1 lessons, the interventionist showed the photos in the picture book. For each word, students took turns describing a photo using a complete sentence that included one of the target words. Because sample sentences were included in the teacher book, if students struggled to generate a sentence, the interventionist could model a sentence and have the student repeat it. For each sentence generated by an individual student, the entire group repeated the sentence so that they also benefitted from the practice using the new words.
On the subsequent days (Tuesdays and Thursdays), Strand 2 lessons were delivered. Each lesson began with a short review of the words learned the previous day: interventionists modeled the words and their definitions, having the students repeat them. Next, the team retell step (see above) was completed to give additional practice with the target words and story retelling before transferring the practice to personal story generations. For this activity, each student took turns telling the group about a personal experience while including the target words in their story. Interventionists prompted as needed (see below). In the final Strand 2 activity, students played a brief game that required them to interact with the meaning of the target words, such as Go Fish, a matching game, word bingo, or a motor game like tag or trash can basketball.
During word and storytelling activities in which students were talking (both Strands 1 and 2), interventionists followed a set of principles for standardizing their prompts. First, they were to deliver prompts and corrections immediately rather than waiting until the student's story was finished. Second, they were not supposed to criticize or mention what the student did wrong but focus on the things the student did well. Third, they were trained to use a standardized two-step prompt procedure. This involved first asking a question that directed the student to what they should have said (e.g., for story grammar = “What was Vera's problem?”; for complex sentence = “Try again, but this time use the connection word when”; for target word = “Say that sentence again, but this time use the new word frigid”). If the student was unable to generate the desired word/sentence, the interventionist followed the first prompt with a model sentence and a request to repeat it (e.g., for story grammar: “Vera built a snow fort, but it was too small. Now you say that”; for complex sentence: “Listen. When she had enough snow, Vera was able to build a bigger fort. Your turn to say that”; for target word: “Say it like me. One frigid morning, Vera went outside to play in the snow”). Using this two-step prompting procedure, interventionists ensured each student practiced the words 8–10 times per session.
The first author conducted fidelity observations with each interventionist 6–9 times across the 12-week intervention phase. She used a 41-item fidelity checklist to evaluate adherence, quality, and student engagement during Strand 1 lessons and a 43-item checklist for Strand 2 lessons. Because mean fidelity was generally high (98%; range: 83%–100%), results were combined across strands and interventionists. One interventionist had consistently lower fidelity scores (83%–94%), but these were sufficiently high to conclude that the intervention was delivered as intended.
Data Analysis
Data from the RAD were analyzed using visual analysis of individual participant graphs. Visual analysis was supplemented by non-overlap effect indices, randomization tests, and two-level hierarchical linear modeling. These are described below.
Visual analysis. Visual inspection of graphed data included within-participant examination of mean shifts, variability, and changes in trend. The graphs of the treatment group participants were also compared to those of the control group to rule out the null hypothesis of no treatment effect (Kirby et al., 2021). Graphed weekly pre- and postprobe scores allowed for 12 demonstrations of immediate treatment effect per participant. By comparing scores obtained at Week 13 to the corresponding weekly preprobe and postprobe scores for the same words, we documented retention of learned vocabulary words.
Percent of goal obtained. We used percent of goal obtained (PoGO; Ferron et al., 2020; Parker et al., 2014), a single-case effect size estimate that places all raw score effects onto a common scale so that researchers can quantify the effect of treatment for each participant in terms of the percent of progress made toward each case goal, γ. Based on the theory of change and the vocabulary scoring measure, γ was set at 12, the highest possible score for pre-, post-, and retention probe sessions. Obtained statistics were based on each participant's levels for weekly preprobes (α), postprobes (β1), and retention probes (β2). PoGO was calculated as

PoGO = ((β − α) / (γ − α)) × 100,

with β = β1 for the weekly lesson probe level and β = β2 for the retention probe level. To interpret PoGO results, an estimate of 0 would indicate no treatment effect (i.e., no improvement in targeted outcomes), whereas an estimate of 100 would indicate a maximally effective treatment for a given participant.
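A worked example of this computation in Python, using hypothetical probe levels rather than participant data:

```python
# Worked PoGO example using the formula above; the alpha and beta levels are
# hypothetical, not participant data. The goal gamma is 12, the probe maximum.

def pogo(alpha: float, beta: float, gamma: float = 12.0) -> float:
    """Percent of progress from baseline level alpha toward goal gamma."""
    return (beta - alpha) / (gamma - alpha) * 100

print(pogo(alpha=3.0, beta=9.0))    # ~66.7: two-thirds of the way to the goal
print(pogo(alpha=3.0, beta=12.0))   # 100.0: maximally effective treatment
```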
Randomization test. We used nonexhaustive upper-tailed randomization tests (α = .05) for both immediate probe gains and retention test differences to rule out the null hypothesis of no treatment effect (Edgington & Onghena, 2007). The obtained test statistic was based on the difference in participants' mean gains and was compared at the group level.
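A nonexhaustive test samples random reassignments rather than enumerating all possible ones; the following Python sketch illustrates the idea with hypothetical gain scores:

```python
# Monte Carlo (nonexhaustive) upper-tailed randomization test on the
# treatment-minus-comparison difference in mean gains. The gain values
# below are hypothetical placeholders, not study data.
import random

def mean(xs):
    return sum(xs) / len(xs)

def randomization_test(treat, control, n_perm=5000, seed=1):
    """Approximate p value for the observed group mean gain difference."""
    rng = random.Random(seed)
    observed = mean(treat) - mean(control)
    pooled = treat + control
    k = len(treat)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                     # random reassignment to groups
        if mean(pooled[:k]) - mean(pooled[k:]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)            # count the observed assignment itself

p = randomization_test([5.2, 6.1, 4.8, 7.0], [1.0, 0.5, 1.8])
print(p, p < .05)  # a small p rejects the null of no treatment effect
```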
Hierarchical linear modeling. A two-level hierarchical linear model was developed using design and analysis recommendations for single-case research (Rindskopf & Ferron, 2014; Van den Noortgate & Onghena, 2007) and a conceptual model of mean gain scores on vocabulary probe outcomes nested within participants. The academic language intervention was hypothesized to result in large, immediate, and retained gains in vocabulary outcomes. Assuming autocorrelation, we used restricted maximum likelihood estimation and the Kenward–Roger method of inference (Kenward & Roger, 1997). Results were reported using empirical Bayes (EB) estimates. To examine the impact of the academic language intervention on vocabulary skills while controlling for student group assignment, we specified the Level 1 and Level 2 models as follows: