Discussion
In this review, we set out to update a previous systematic review by
Yorkston et al. (2003) describing behavioral interventions for respiratory/phonatory dysfunction in adults with neurodegenerative disease. In the 20 years since
Yorkston et al.'s (2003) systematic review, an additional 53 published works on this topic have been added. Historically, the evidence for respiratory/phonatory rehabilitation could be categorized as biofeedback, device-driven, LSVT LOUD, or “miscellaneous” (mainly group therapy), with no RCTs reported. Today, a large range of approaches have been added to the evidence base, such as EMST, singing, and computer-driven programs, as well as a variety of treatment modalities, including teletherapy. As one example, evidence for computer- and device-driven therapies (e.g., DAF, masking noise, amplification, EMST, MAST, SpeechVive) has increased since 2003, shifting from mainly case studies/series and one single-subject report to 11 well-designed quasi-experimental studies and one single-subject–design study. In addition, evidence for treatment in several different population groups (including cerebellar ataxia, myotonic dystrophy, ARSACS, Huntington's disease, MSA, Lewy body dementia, and spastic paraplegia) was added to the current review.
Our goal in completing this review was to examine the quality of evidence for behavioral treatments for respiratory/phonatory dysfunction, as opposed to assessing the effectiveness of such treatments. Regarding evidence quality, there was strong evidence in support of only one behavioral intervention: LSVT LOUD in people with PD (see
Table 1). It is noteworthy that no other treatment approach or population included in this review demonstrated more than limited evidence. It is important to recognize that a treatment with strong evidence does not necessarily reflect the most effective or most appropriate treatment option; rather, it reflects the approach that has undergone the most scientific examination. Evidence for a treatment in a particular group does not imply that it will always be effective for individuals in that group or that it will be effective for people with dysarthria associated with a different diagnosis than studied. Likewise, treatments with insufficient or limited evidence are not necessarily ineffective; rather, further scientific enquiry is necessary before robust conclusions about treatment effectiveness can be drawn. We hope that, by highlighting the absence of strong evidence for respiratory/phonatory interventions within neurodegenerative populations, future research efforts to examine their efficacy will be stimulated.
Where does this leave clinicians? Decisions about treatment selection, including evidence- versus theory-based practice, are outside of the scope of this review; however, readers are encouraged to utilize the principles of EBP (
Dollaghan, 2007;
Tonelli et al., 2012)—including clinical expertise, pathophysiologic reasoning, consideration of environment and system constraints, and an informed patient's preferences—to guide their efforts in the absence of strong published evidence. Readers are directed to
Tonelli et al. (2012) for a framework to aid the understanding of the role of research in clinical practice, as well as practical tools for clinical decision making.
This review has highlighted several areas where research in this field of dysarthria management could be significantly strengthened. A lack of robust, high-quality studies informing treatment of respiratory/phonatory deficits in neurodegenerative dysarthria is evident. Of the 88 studies included in the current review, only five were RCTs (four of these were related to LSVT LOUD, and one was related to verbal cueing). The benefit of RCTs lies in the reduction of bias and, through rigorous experimental control, the ability to examine cause–effect relationships between treatments and outcomes. In this respect, they differ from other experimental designs and are considered to be the gold standard for driving meaningful change in practice (
Hariton & Locascio, 2018;
Tarnow-Mordi et al., 2017). It is acknowledged that RCTs require considerable resources and, in cases of rare disorders, are not always feasible to conduct. Furthermore, we understand the need to begin studying the effects of treatments using quasi-experimental designs to determine the potential impact of an expensive RCT. In response, we suggest that researchers consider employing alternative designs such as the (multiple) single-subject design (
Tate et al., 2014,
2016). Single-subject research offers a very high level of experimental control and is particularly suited to rare/unique patient populations.
Also related to study quality was a lack of standardization in outcome measurement (due, in part, to a lack of standardized tools available), resulting in an overreliance on subjective measures of intelligibility and speech naturalness. When studies use unique/unvalidated tools or operational definitions for outcome measurement, it is difficult to compare findings across studies and adequately measure treatment effectiveness. We support the suggestion made 20 years ago by
Yorkston et al. (2003) that a comprehensive set of measures with demonstrated psychometric properties be developed, with the goal of improving the ability to measure treatment effectiveness. We further recommend that researchers comply with the minimal data set recommendations and measures provided for disorders at the National Institutes of Health (like those for PD and amyotrophic lateral sclerosis [ALS] advocated for by the National Institute of Neurological Disorders and Stroke).
Furthermore, some of the outcomes reported in the studies reviewed here are not reliable or valid measures of speech production. For example, the validity of MPT as an index of respiratory or laryngeal function has not been established, despite its wide use. MPT is also strongly affected by effort and cueing, and most studies do not detail how the task was completed. Additionally, jitter and shimmer have been shown to be less reliable than CPP because of the difficulty of identifying vibratory periods in dysphonic voices (
Patel et al., 2018). We encourage researchers and clinicians to choose valid and reliable measures that have been shown to relate to respiratory and laryngeal function when indexing changes to function during speech production. There is also substantial work to be done in the development/validation of outcome measures in non-English speakers. As an example, a recent literature search by one of the authors (D.B.) revealed no validated tools for measuring intelligibility in Spanish speakers.
Measures of treatment effectiveness, such as effect sizes, were noticeably missing from most included studies. Of the 88 studies included in this review, only 10 included effect size estimates. Including effect size estimates helps to determine whether a statistically significant effect is clinically meaningful, informs sample size calculations for future studies, and facilitates comparison between different studies (
Aarts et al., 2014).
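As a brief illustration of the kind of estimate discussed here, the following minimal sketch computes a standardized mean difference (Cohen's d, with the small-sample Hedges' g correction) for two independent groups. The function names and the data (gains in vocal intensity, in dB) are hypothetical and are not drawn from any study in this review; other effect size indices may be more appropriate for paired or single-subject designs.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treated, control):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(treated), len(control)
    s1, s2 = stdev(treated), stdev(control)
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(treated) - mean(control)) / pooled_sd


def hedges_g(treated, control):
    """Cohen's d multiplied by the usual small-sample correction factor."""
    n_total = len(treated) + len(control)
    correction = 1 - 3 / (4 * n_total - 9)
    return cohens_d(treated, control) * correction


# Hypothetical change scores (dB), invented for illustration only.
treated_gain = [4.0, 6.0, 5.0, 7.0, 3.0]
control_gain = [1.0, 2.0, 0.0, 3.0, -1.0]

print(f"Cohen's d = {cohens_d(treated_gain, control_gain):.2f}")
print(f"Hedges' g = {hedges_g(treated_gain, control_gain):.2f}")
```

Reporting an estimate of this kind alongside the p value allows readers to judge the magnitude of a change, not only whether it was statistically detectable.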
Effect size estimates are one aspect of best practice guidelines for scientific research that were not consistently followed in the studies included in this review. Other concerns about study reporting were noted with regard to recruitment methods, participant characteristics (e.g., dysarthria subtype, medical diagnosis), inclusion/exclusion criteria, and attrition (including number of patients screened to obtain sample). This may be due, in part, to overly restrictive word limits for scientific journals or oversights on behalf of the authors. Importantly, these reporting details are critical elements of scientific communication for treatment studies. They provide data needed to understand the application of research findings to clinical practice (e.g., Is a given treatment effective for people with a given dysarthria subtype?) and to predict treatment success/failure (e.g., Is attrition higher in those with dyskinesias, reduced cognition, or more severe disease?), to name but a few. One way to ensure such details are included is for authors to use the CONSORT flow diagram (
Butcher et al., 2022;
Schultz, 2010)—or, for non-RCTs, the TREND statement (
Des Jarlais, 2014)—to describe participant flow through studies.
Because participant flow through the study was often not clearly reported, it was difficult to determine in some cases whether intent-to-treat principles were followed, and in others, it was clear that they were not. This is a substantial weakness in the treatment efficacy literature reviewed here. Intent-to-treat principles ensure that every participant who was allocated to a treatment is included in the analysis, using the last data point obtained from participants who leave the study at any point. This is critical since there may be systematic differences between participants who complete the study and those who do not. Including only people who complete the treatment paradigm can result in an overestimate of the treatment effects.
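The difference between a completers-only analysis and an intent-to-treat analysis can be sketched with a small example. The sketch below assumes the last-observation-carried-forward approach described above; the function names and intelligibility scores are hypothetical and invented for illustration, and other approaches to handling missing data exist.

```python
from statistics import mean


def last_observation(visits):
    """Return the last recorded score; None marks a missed visit."""
    observed = [score for score in visits if score is not None]
    return observed[-1]


def mean_change(participants, completers_only):
    """Mean change from baseline across participants.

    completers_only=True drops anyone without a final-visit score;
    completers_only=False keeps everyone and carries the last
    observation forward (an intent-to-treat style analysis).
    """
    changes = []
    for visits in participants:
        dropped_out = visits[-1] is None
        if completers_only and dropped_out:
            continue
        changes.append(last_observation(visits) - visits[0])
    return mean(changes)


# Hypothetical scores at [baseline, midpoint, post-treatment].
participants = [
    [60, 66, 70],      # completed treatment
    [62, 68, 72],      # completed treatment
    [61, 61, None],    # left the study after the midpoint visit
    [59, None, None],  # left the study after baseline
]

print("Completers only:", mean_change(participants, completers_only=True))
print("Intent-to-treat:", mean_change(participants, completers_only=False))
```

In this invented data set, the participants who left the study showed no improvement, so excluding them yields a larger apparent treatment effect than the analysis that retains everyone who was allocated to treatment.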
Fewer than half of the studies included in this review described PROs. PROs provide information that is distinct from standard clinical measures and reflect the value of an intervention to patients. While some tools, such as the VHI (
Jacobson et al., 1997), specifically focus on voice-related outcomes, there are also tools to measure the global communicative impact of interventions with a focus beyond
impairment as per the International Classification of Functioning, Disability and Health, such as the Communicative Participation Item Bank (
Baylor et al., 2013). Global tools such as this may make a valuable addition to respiratory/phonatory intervention trials, to ensure that the impact of interventions is comprehensively examined, including from the patients' perspectives.
There is a lack of diversity in the patient populations represented in the current literature. This review reflects research completed in predominantly native English speakers, with a disproportionate number of studies completed in people with PD and/or hypokinetic dysarthria, and dysarthria severity skewed toward the mild end of the spectrum. Part of the reason for this imbalance, as mentioned above, is the current lack of validated tools for measuring dysarthria outcomes in non-English speakers. This may also reflect a lack of diversity within the field, where native language listeners are needed to make ratings of native language speakers.
Another issue is participant recruitment, particularly for disorders that are rare, or for which increased dysarthria severity is related to a decrease in other areas—such as cognition and mobility—that become a barrier to research participation. Patient populations notably missing from this review include various types of motor neuron diseases (including ALS and primary lateral sclerosis), myasthenia gravis, various types of muscular dystrophy, and postpolio syndrome. Some of these, such as ALS, are represented in the literature with approaches such as AAC. However, recent evidence across both the limb and bulbar literature suggests that mild–moderate intensity exercise, undertaken in the early stages of the disease, may have beneficial effects (
Park et al., 2020;
Plowman et al., 2016,
2019). It is therefore worthwhile to include populations such as these in future research efforts.
An additional issue was that participant characteristics, such as sex, age, or dysarthria type, were infrequently reported, despite growing evidence that neurological diseases are experienced differently between the biological sexes (e.g.,
Cerri et al., 2019). This was recently highlighted by the Parkinson's Foundation in the United States, which created a national agenda to identify research and management practices that better support the needs of women. We hope that, by highlighting the lack of heterogeneity in the research supporting the field of neurogenic dysarthria, future researchers might be encouraged to make efforts to contribute to a more diverse (including culturally and ethnically diverse) evidence base and develop tailored interventions that meet the distinct requirements of the patient populations we serve.
Based on the findings of this review, we have several recommendations for future treatment efficacy research in this field. First, it is important that replication of studies takes place by research groups unaffiliated with the design of the treatment paradigm. This not only strengthens the validity of the evidence for a treatment but also determines whether treatment effects are achievable in the hands of other clinicians. This has started to happen with the LSVT LOUD treatment, but many other published treatments in the field are supported by only a single scientific study. As noted by
Yorkston et al. (2003), partial replication studies are also immensely valuable, particularly when they seek to better define treatment parameters such as optimal timing, treatment dosage, termination, and the usefulness of prophylactic therapy. Furthermore, implementation studies are almost nonexistent in the dysarthria literature, so knowledge about the translation, feasibility, and impact of treatments in clinical practice is limited.
Second, there is an urgent need for research documenting the economics of speech rehabilitation. In the context of continuing funding cuts, this information will be crucial for guiding resource allocation decisions. Some examples of economic analyses that are needed are device-driven versus clinician-led therapy, individual versus group therapy, low- versus high-intensity therapy, in-person therapy versus telehealth, and comparing different treatment doses. Related to this is the potential refinement of existing treatment parameters. For example, are all elements of LSVT LOUD required for the treatment to be successful, or is there potential to systematically evaluate each component of the treatment based on principles of motor learning (
Kleim & Jones, 2008)? As noted by
Yorkston et al. (2003), do these principles apply to other dysarthrias/populations? These questions warrant further, systematic investigation.
Third, little is known about changes to respiration as a result of dysarthria treatment. Only a few studies have been conducted using gold-standard measures of respiration including respiratory kinematics, pausing characteristics, spirometry, and respiratory strength. The respiratory system is critical to the development of pressure for speech, and inefficiency in the respiratory system can impact effort, vocal intensity, and naturalness. More studies need to directly examine the function of the respiratory subsystem to ensure that our treatments do not exacerbate fatigue.
This review is not without limitations. We acknowledge that use of the PEDro scale in the context of very few RCTs is not ideal, as it was impossible for some well-designed studies to satisfy all scale items. It is also worth acknowledging that high PEDro scores do not necessarily mean that a given treatment is clinically useful or cost-effective. However, to our knowledge, a well-established alternative scale for evaluating treatment evidence has not yet been developed. Due to sparse and inconsistent reporting, study designs and methodologies were not always clear, leading to a relatively large number of initial discrepancies between our raters regarding inclusion/exclusion criteria. Furthermore, most of the studies included in this review used null hypothesis significance testing (i.e., p values) as the only measure of treatment effectiveness. Without information regarding statistical power or effect size estimates, we acknowledge that our analysis of these studies is flawed. In addition, owing to the large number of articles, this review excluded those focused on nondegenerative populations and populations requiring tracheostomy and/or any form of mechanical ventilation. There is a need for future work to investigate the state of the evidence base regarding these groups. Finally, as this review was limited to studies published in English, we cannot rule out the possibility that treatment evidence published in other languages was missed by this review.
In summary, this literature review reflects the expanding literature on the effects of treatment on respiratory/phonatory function in neurodegenerative diseases. The largest number of studies examined LSVT LOUD, potentially due to its long-standing presence in the field. It is important to remember that a large literature does not mean a treatment is effective for all people with a particular disorder or that a treatment will work with another disorder. Several reporting and methodological weaknesses were identified in the literature including lack of reporting of participant characteristics and flow through the study, lack of consistent outcomes, little to no research on disorders other than PD, a lack of RCTs and strong multiple single-subject designs, and a lack of reporting of effect size estimates. It is recommended that clinicians consider the research evidence and the physiologic impact of the disorder, along with clinical experience and patient preferences, following evidence-based practice guidelines in making treatment decisions.
Author Contributions
Sarah E. Perry: Formal analysis (Lead), Investigation (Lead), Validation (Lead), Visualization, Writing – original draft (Lead), Writing – reviewing & editing (Lead). Michelle Troche: Conceptualization (Lead), Data curation (Lead), Formal analysis (Lead), Investigation (Lead), Methodology (Lead), Project administration (Lead), Resources (Lead), Software (Lead), Supervision (Lead), Validation (Lead), Visualization (Lead), Writing – reviewing & editing (Lead). Jessica E. Huber: Conceptualization (Lead), Data curation (Lead), Formal analysis (Lead), Investigation (Lead), Methodology (Lead), Project administration (Lead), Resources, Supervision (Lead), Validation (Lead), Visualization (Lead), Writing – original draft (Lead), Writing – reviewing & editing (Lead). James Curtis: Data curation (Supporting), Investigation (Supporting), Software (Supporting), Writing – reviewing & editing (Supporting). Brianna Kiefer: Investigation (Supporting). Jordanna Sevitz: Investigation (Supporting), Writing – reviewing & editing (Supporting). Qiana Dennard: Data curation (Supporting), Investigation (Supporting). James Borders: Investigation (Supporting), Writing – reviewing & editing (Supporting). Jillian River Browy: Data curation (Supporting), Investigation (Supporting). Avery Dakin: Investigation (Supporting), Writing – reviewing & editing (Supporting). Victoria Gonzalez: Investigation (Supporting). Julianna Chapman: Investigation (Supporting). Tiffany Wu: Investigation (Supporting). Lily Katz: Investigation (Supporting). Deanna Britton: Conceptualization (Lead), Data curation (Lead), Formal analysis (Lead), Investigation (Lead), Methodology (Lead), Project administration (Lead), Resources (Lead), Supervision (Lead), Validation (Lead), Writing – reviewing & editing (Lead).