Dysphagia is frequently described as involving impairments in two key functional aspects of swallowing, namely, swallowing safety and swallowing efficiency (
Clave et al., 2012;
Clave & Shaker, 2015). Measures of airway invasion are widely used to describe swallowing safety, and the entry of material into the airway is associated with increased risk for respiratory sequelae (
Lakshminarayan et al., 2010;
Martino et al., 2005;
Pikus et al., 2003;
Rofes et al., 2011;
Titsworth et al., 2013). Impaired swallowing efficiency, characterized by residue in the pharynx after a swallow, has received less attention (e.g.,
Molfenter & Steele, 2013;
Waito, Tabor-Gray, et al., 2018;
Waito et al., 2017). However, the presence of pharyngeal residue has been found to be a risk for subsequent aspiration (
Eisenhuber et al., 2002;
Molfenter & Steele, 2013), and some studies report an association with malnutrition (
Carrion et al., 2015;
Clave & Shaker, 2015;
Rofes et al., 2010). In order to better understand the links between pharyngeal residue and potential negative sequelae, it is essential that objective measures of residue be employed in research. However, there is currently a lack of consensus regarding preferred metrics for quantifying pharyngeal residue from videofluoroscopy recordings (e.g.,
Eisenhuber et al., 2002;
Han et al., 2001;
Hutcheson et al., 2017;
Leonard, 2017;
Logemann et al., 1989;
Martin-Harris et al., 2008;
Pearson et al., 2013;
Robbins et al., 2007;
Rommel et al., 2015;
Steele, Mukherjee, et al., 2019;
Steele, Peladeau-Pigeon, et al., 2019). Furthermore, thresholds for classifying pharyngeal residue into different degrees of severity, which may have utility in predicting the risk of negative outcomes, are yet to be established or validated. Thus, the definition, prevalence, and implications of residue of concern remain unclear. This technical report compares four different approaches to measuring pharyngeal residue. By applying these measures to an existing data set, we illustrate the different degrees of measurement reliability and precision that are seen and explore trends in the data that reflect concerns regarding validity.
Background
Table 1 lists several examples of different approaches for rating the severity of pharyngeal residue on lateral view videofluoroscopic images. These approaches can be broadly categorized as follows:
1. visuoperceptual judgments of residue presence (vs. absence) in specific pharyngeal locations,
2. visuoperceptual estimates of residue or bolus clearance as a proportion of the original bolus,
3. visuoperceptual estimates of the degree to which a space (i.e., valleculae or pyriform sinuses) is full of residue, and
4. quantitative pixel-based measurements of residue area.
A recent psychometric review concludes that visuoperceptual judgments of pharyngeal residue from videofluoroscopy recordings have reasonable overall quality and reliability (
Swan et al., 2019). However, methodological choices that may contribute to variability in these measures include (but are not limited to): the concentration of barium used, because higher concentrations are more likely to coat the mucosa, and this coating may be misidentified as residue (
Steele et al., 2013); procedural instructions regarding the selection of frames on which judgments are made (at the end of the initial swallow, the second swallow, etc.;
Pearson et al., 2013); operational definitions regarding the amount of residue needed to warrant a decision of “present”; and reference areas or dimensions that are used for scaling judgments of residue severity (see
Pearson et al., 2013, for several examples). Pixel-based measures are also vulnerable to these same sources of variability, but they have advantages over visuoperceptual judgments in that measurement rather than estimation should improve precision; similarly, they should be replicable and less prone to poor interrater agreement. Furthermore, pixel-based measures fall on a continuous interval scale, which may be better able to demonstrate small but clinically relevant degrees of change. For example, a recent treatment outcome study (
Steele et al., 2016) concluded that tongue pressure resistance training was effective for reducing vallecular residue, measured using the pixel-based Normalized Residue Ratio Scale (NRRS;
Pearson et al., 2013), whereas a previous study using a 3-point ordinal scale had failed to detect change (
Robbins et al., 2007). One acknowledged limitation of all two-dimensional (2D) lateral videofluoroscopic measures of pharyngeal residue is that they do not properly capture the three-dimensional (3D) nature of residue, including possible asymmetries. Fortunately, a recent comparison between pixel-based area measures on 2D lateral views from 3D computed tomography scans and corresponding volumetric measures has shown a very tight correspondence (
R² = .91;
Mulheren et al., 2019).
Objectives
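The objective of this analysis was to compare four different approaches to evaluating pharyngeal residue from lateral view videofluoroscopic images: the visuoperceptual Eisenhuber scale, the pixel-based %-Full measure, the NRRS, and the pixel-based %(C2–4)² measure.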
Figure 1 provides an example image with pharyngeal residue seen in both the valleculae and pyriform sinuses, measured using each of these approaches.
In comparing these different approaches, our specific research questions were as follows:
1. How well do these different measures and their subcomponents perform with respect to interrater reliability?
2. What are the frequency distributions of pharyngeal residue according to these different measures in an example data set?
3. What is the distribution of nonzero Eisenhuber scale scores for the valleculae and pyriform sinuses, relative to the corresponding
a. %-Full residue measurement scale?
b. NRRS measurement scale?
c. %(C2–4)² measurement scale?
4. In cases where nonzero Eisenhuber scale scores do not fall within the expected quartile of the %-Full distribution (e.g., a rating of 1 representing residue filling a space to less than 25% of its height would be expected to have a corresponding %-Full measure of < 25%), what proportion of scores within each Eisenhuber scale level are under- or overestimates?
5. How strongly are the %-Full and %(C2–4)² measures of residue severity correlated?
Strong positive correlations were expected across measurement methods. (Given that both the %-Full and the %(C2–4)² measures are components of the equation for the NRRS, strong relationships with the NRRS can be presumed by definition and were not explored in this study.)
Method
Original Study Pharyngeal Residue Measurements
As part of the original study, videofluoroscopy recordings for each bolus were analyzed in duplicate by two trained raters, who were blinded to each other's ratings. Rating was completed according to a standard operating procedure, in which the determination of pharyngeal residue presence and severity involved three steps:
1. identification of the frame of “swallow rest” for each swallow, defined as the first frame showing the pyriform sinuses at their lowest position, relative to the spine, as part of postswallow pharyngeal relaxation prior to onset of a subsequent swallow or nonswallow event;
2. visuoperceptual judgment of residue severity in the valleculae and the pyriform sinuses on each swallow rest frame using the Eisenhuber scale (Eisenhuber et al., 2002); and
3. for cases where residue was judged to be present either in the valleculae and/or the pyriform sinuses (i.e., Eisenhuber scale scores > 0), pixel-based measurements of residue area and spatial housing area on the swallow rest frame, in order to yield %-Full measures for the valleculae and pyriform sinuses.
All pixel-based measures were performed using ImageJ software (
https://imagej.nih.gov/ij). Disagreement in Eisenhuber scale scores was operationally defined as any difference of at least one level, and for pixel-based measures, it was defined as any difference greater than 1.6 in the ratio of the absolute difference over the average value of the two provided ratings. Cases demonstrating disagreement according to these criteria were taken to a consensus meeting for remeasurement and resolution. Where rater differences did not require resolution, the smaller (i.e., more conservative) of the two rating values was taken as the rating of record. If the raters concurred that visualization of the structures necessary for a particular rating was obscured, the feature in question was documented as not ratable and became a missing data point. In total, this data set comprised recordings of 3,545 boluses with available residue measures for the valleculae and/or the pyriform sinuses.
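The disagreement criteria and the rating-of-record rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the original study's software: the function names are hypothetical, and the handling of two zero ratings (treated as agreement) is an assumption that the text does not address.

```python
def relative_difference(a: float, b: float) -> float:
    """Absolute difference between two ratings divided by their mean."""
    mean = (a + b) / 2
    if mean == 0:
        # Assumption: two zero ratings count as perfect agreement.
        return 0.0
    return abs(a - b) / mean


def needs_consensus(a: float, b: float, threshold: float = 1.6) -> bool:
    """Flag a pixel-based rating pair whose relative difference exceeds 1.6."""
    return relative_difference(a, b) > threshold


def eisenhuber_disagreement(a: int, b: int) -> bool:
    """Ordinal scores disagree when they differ by at least one level."""
    return abs(a - b) >= 1


def rating_of_record(a: float, b: float):
    """Return the smaller rating, or None when the pair needs consensus review."""
    if needs_consensus(a, b):
        return None
    return min(a, b)
```

For non-negative ratings, this relative difference cannot exceed 2, so a threshold of 1.6 flags only pairs that differ by a large multiple (for example, 100 versus 10 gives 90/55, or about 1.64).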
Additional Data Processing for This Technical Report
Comparisons for this technical report were performed using measures from the swallow rest frame at the end of the initial swallow of each bolus. In addition to the measurements made in the initial study, for cases where pharyngeal residue was judged to be present, the length of the C2–4 cervical spine was measured (in pixels) on the initial swallow rest frame. This scalar reference measure enabled calculation of the NRRS and residue in %(C2–4)² units. These measures were derived for the vallecular and pyriform sinus locations separately, and the %(C2–4)² measures were added together for a composite “sum vallecular and pyriform sinus” measure.
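As a minimal sketch of how the two ratio measures might be derived from pixel measurements, the Python below assumes that %-Full is residue area divided by spatial housing area and that the cervical spine measure is residue area divided by the squared C2–4 length, each expressed as a percentage. The function names and pixel values are hypothetical. The NRRS is omitted because its equation is not given in this report.

```python
def percent_full(residue_area: float, housing_area: float) -> float:
    """Residue area as a percentage of the spatial housing area."""
    if housing_area <= 0:
        raise ValueError("housing_area must be positive")
    return 100 * residue_area / housing_area


def percent_c24_squared(residue_area: float, c24_length: float) -> float:
    """Residue area as a percentage of the squared C2-4 length."""
    if c24_length <= 0:
        raise ValueError("c24_length must be positive")
    return 100 * residue_area / c24_length ** 2


# Hypothetical pixel measurements from a single swallow rest frame.
c24_length = 200.0          # pixels
vallecular_residue = 500.0  # square pixels
vallecular_housing = 2000.0 # square pixels
pyriform_residue = 200.0    # square pixels

vallecular_full = percent_full(vallecular_residue, vallecular_housing)
vallecular_c24 = percent_c24_squared(vallecular_residue, c24_length)
pyriform_c24 = percent_c24_squared(pyriform_residue, c24_length)

# Composite "sum vallecular and pyriform sinus" measure.
total_c24 = vallecular_c24 + pyriform_c24
```

Because both locations share the same scalar reference, the two cervical spine measures can be added directly, which is not true of %-Full values that use different denominators.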
Analyses
Of the 3,545 boluses in the data set, a total of 1,302 (37%) were judged to have residue present (i.e., nonzero Eisenhuber scale scores): 519/1,420 thin boluses (37%), 304/736 mildly thick boluses (41%), 246/701 moderately thick boluses (35%), and 233/688 extremely thick boluses (34%). Interrater reliability was calculated on initial ratings (prior to discrepancy resolution) using Kendall's τb for the ordinal Eisenhuber scale scores and intraclass correlations for all interval, pixel-based measures. Histograms were inspected to understand frequency distributions, and descriptive statistics were calculated for each continuous parameter (5th, 25th, median, 75th, and 95th percentiles). Comparisons across the different measurement methods were made with the nonzero residue cases only, as follows:
• Eisenhuber scale scores were explored in relation to the pixel-based %-Full, NRRS, and %(C2–4)² measures for the valleculae and pyriform sinuses using cross-tabulations, box plots, and Kendall's τb tests.
• The accuracy of the Eisenhuber scale ratings for the valleculae and pyriform sinuses was evaluated by cross-tabulation with 25% increments of the %-Full measure (i.e., 1%–25% full, 26%–50% full, > 50% full).
• Scatter plots and Spearman rank correlations were used to explore relationships between the %-Full and %(C2–4)² measures.
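For readers unfamiliar with rank correlation, the following Python sketch shows one way a Spearman coefficient could be computed for paired %-Full and %(C2–4)² values: each variable is converted to ranks (ties share the mean rank), and a Pearson correlation is taken on the ranks. The function names are hypothetical, and in practice an established statistics package would normally be used; this is not the study's analysis code.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5
```

A rank-based coefficient depends only on the ordering of the values, so it is unaffected by the positive skew seen in residue measures.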
Discussion
In this study, we used a retrospective analysis of an existing data set to illustrate differences between four approaches to measuring pharyngeal residue from lateral view videofluoroscopic images. Several important observations can be gleaned from this study. First, the analysis shows that good interrater agreement can be achieved with all four approaches to measurement. An important caveat to this observation is the fact that the methods in this study began by resolving any differences across raters in selection of the swallow rest frame for the initial swallow of each bolus; this procedural step removed differences in frame selection as a possible source of differences across raters. Although overall interrater agreement appears excellent, the data in
Table 2 show that interrater agreement was not as strong for pixel-based measures of spatial housing area. This is a concern, because measures of spatial housing form the denominator for the %-Full measure, and the %-Full measure is also used as a component in calculation of NRRS measures. Evidence that components of these measures may not have good reliability represents a challenge to the apparent reliability of the derived measures.
Second, this study raises additional concerns regarding the validity of the %-Full measure, which are apparent in
Figure 3a where measures involving the tracing of spatial housing area appear prone to inflating measures of residue severity compared to those using cervical spine reference scalars. The areas of the valleculae and pyriform sinuses may vary as a video recording moves from frame to frame, depending on the position of the epiglottis and the degree of pharyngeal relaxation. The data suggest that %-Full measures may inflate residue severity in cases where spatial housing appears relatively small or collapsed on a lateral view image.
Figures 5a and
5b illustrate this issue with two examples of vallecular residue. Additionally, it is acknowledged that the convention used in this study, along with others where spatial housing has been measured (
Molfenter & Steele, 2013;
Pearson et al., 2013;
Steele, Peladeau-Pigeon, et al., 2019;
Stokely et al., 2015;
Waito, Steele, et al., 2018;
Waito, Tabor-Gray, et al., 2018) has been to define the upper boundary of the vallecular spatial housing area using the tip of the epiglottis. In reality, the glosso-epiglottic folds that form the upper lip of the vallecular space are anatomically inferior to this location and are not always easily seen on a lateral view radiographic image. Similarly, it is challenging to know exactly where the upper boundary of the pyriform sinuses lies on a lateral view image.
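The inflation concern can be illustrated with a small numerical sketch in Python. All pixel values below are hypothetical and chosen only for illustration: the same residue area is expressed against a relaxed and a collapsed spatial housing area, and against a fixed C2–4 reference length.

```python
def compare_references(residue_area, housing_areas, c24_length):
    """Return (housing area, %-Full, %(C2-4)^2) for each candidate housing area."""
    rows = []
    for housing in housing_areas:
        pct_full = 100 * residue_area / housing
        pct_c24 = 100 * residue_area / c24_length ** 2
        rows.append((housing, pct_full, pct_c24))
    return rows


# Hypothetical values: identical residue, two different housing tracings.
rows = compare_references(
    residue_area=300.0,             # square pixels
    housing_areas=[1500.0, 500.0],  # relaxed vs. collapsed housing
    c24_length=200.0,               # pixels
)
for housing, pct_full, pct_c24 in rows:
    print(f"housing={housing:.0f}  %-Full={pct_full:.1f}  %(C2-4)^2={pct_c24:.2f}")
```

In this sketch the %-Full value triples when the housing area shrinks to one third, whereas the measure scaled to the cervical spine is unchanged, because its denominator does not depend on how the space appears on a given frame.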
Third, this study suggests that clinicians are reasonably good at judging degrees of residue severity using visuoperceptual judgments, showing modest associations between Eisenhuber scale scores and corresponding pixel-based measures (see
Figures 1a,
1b, and
1c). However, when the accuracy of Eisenhuber scale scores was compared to 25% increments of the %-Full measure, inaccuracies were common, with a trend toward overestimation of residue severity in the visuoperceptual ratings (see
Figures 4a and
4b). Given that previous studies also suggest that ordinal scales may lack sensitivity to changes in pharyngeal residue following dysphagia intervention (
Robbins et al., 2007), pixel-based methods of measurement are recommended in situations where greater measurement precision is desired, such as pre- versus posttreatment comparisons of residue severity.
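The accuracy comparison described above can be sketched as follows in Python. The sketch assumes that Eisenhuber levels 1, 2, and 3 correspond to %-Full bins of up to 25%, above 25% up to 50%, and above 50%; the treatment of values falling exactly on a boundary is an assumption, and the function names and example pairs are hypothetical.

```python
from collections import Counter


def expected_level(percent_full: float) -> int:
    """Map a %-Full value to the Eisenhuber level implied by its 25% bin."""
    if percent_full <= 0:
        return 0
    if percent_full <= 25:
        return 1
    if percent_full <= 50:
        return 2
    return 3


def classify_rating(eisenhuber: int, percent_full: float) -> str:
    """Label a visuoperceptual score relative to the pixel-based bin."""
    expected = expected_level(percent_full)
    if eisenhuber > expected:
        return "overestimate"
    if eisenhuber < expected:
        return "underestimate"
    return "accurate"


def accuracy_table(pairs):
    """Count accuracy labels within each Eisenhuber level."""
    table = {}
    for score, percent_full in pairs:
        label = classify_rating(score, percent_full)
        table.setdefault(score, Counter())[label] += 1
    return table


# Hypothetical (Eisenhuber score, %-Full) pairs.
example = [(1, 10.0), (2, 20.0), (2, 40.0), (3, 45.0), (1, 30.0)]
table = accuracy_table(example)
```

Dividing each count by the total for its Eisenhuber level gives the proportion of under- and overestimates within that level.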
For these reasons, we favor the %(C2–4)² measure, which showed excellent interrater reliability for all components and good precision with respect to rater differences (see Table 2). This measure is very similar in construct to the pharyngeal residue ratio proposed by Leonard (2017), in which pixel-based measures of residue area are expressed as a percentage of pharyngeal area at rest. Previous work from our lab suggests that pharyngeal area at rest corresponds to 58% of the (C2–4)² area in healthy adults (Steele, Peladeau-Pigeon, et al., 2019). However, it should be noted that the frames used for measurement of pharyngeal area at rest differ between the Leonard method and our work. Consequently, further studies to confirm the correspondence between the two measures will be needed.
The ability to sum residue measures across different pharyngeal locations for a composite representation of residue severity is an added advantage of the %(C2–4)² approach. In this study, residue measures were only taken from the valleculae and pyriform sinuses; however, residue in other pharyngeal locations, such as coating on the pharyngeal wall, could, in principle, also be measured in %(C2–4)² units and added to the sum vallecular and pyriform sinus measures for a total pharyngeal residue measure (Steele, Peladeau-Pigeon, et al., 2019).
An important observation from the data used in this study is the fact that all measures of residue showed nonnormal distributions with positive skews. This means that comparisons of residue severity should use nonparametric statistics rather than models assuming normality. To date, the field lacks a clear definition of the degree of pharyngeal residue that should be identified as a finding of concern. It is interesting to note that the 75th percentile values for %(C2–4)² measures of residue in the data set used for this study (which comprised adults referred for videofluoroscopy due to suspected dysphagia) are higher than those found in a recently published study in healthy adults under the age of 60 years (
Steele, Peladeau-Pigeon, et al., 2019;
https://steeleswallowinglab.ca/srrl/wp-content/uploads/ASPEKT-Method-Reference-Value-Tables-V1.3.pdf). It is also interesting to note that the 75th percentile values for the vallecular NRRS measure in this study fall close to the 0.09 cut-point identified by
Molfenter and Steele (2013) as representing a risk for penetration–aspiration on a subsequent clearing swallow. Therefore, we propose that the 75th percentile or third quartile boundaries for pharyngeal residue measures in healthy adults represent a meaningful threshold to use as an index of concern in future research exploring the risks associated with pharyngeal residue. The data in this study suggest that vallecular residue is more common than pyriform sinus residue. Therefore, explorations of risk related to residue should include consideration of residue location.
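One way the proposed threshold might be applied is sketched below in Python. It summarizes a set of residue values with the percentiles reported in this study and flags values above a healthy-reference 75th percentile. The interpolation rule, the function names, and every numeric value are assumptions for illustration; published reference values should be taken from the cited tables rather than from this sketch.

```python
def percentile(values, p):
    """Percentile by linear interpolation between sorted values."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    position = (len(ordered) - 1) * p / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def summarize(values):
    """Return the 5th, 25th, 50th, 75th, and 95th percentiles."""
    return {p: percentile(values, p) for p in (5, 25, 50, 75, 95)}


def exceeds_reference(value, reference_q3):
    """Flag a residue value above the healthy-reference third quartile."""
    return value > reference_q3


# Hypothetical healthy-reference and patient values in %(C2-4)^2 units.
healthy_reference = [0.0, 0.1, 0.2, 0.4, 0.9]
reference_q3 = percentile(healthy_reference, 75)
patient_values = [0.1, 0.5, 2.3]
flagged = [v for v in patient_values if exceeds_reference(v, reference_q3)]
```

Percentile summaries make no assumption of normality, which suits the positively skewed distributions described above.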
As with any study, this one is not without limitations. It is important to emphasize that the analysis reported in this technical report focused on pharyngeal residue present at the end of the initial swallow for each bolus, such that patterns within individual patients across higher order swallows within boluses or across repeated boluses, either within or across consistencies, have not been taken into consideration in the statistical analyses. Additionally, because very limited etiological information was available about participants in the data set, the analysis represents aggregate information for a heterogeneous sample with no history of oncological, structural, or congenital dysphagia but without stratification by diagnosis. Perhaps the most important limitations to note from a clinical perspective are those related to instrumental or research design constraints. All measures of residue severity were taken from 2D lateral view videofluoroscopic images and therefore are unable to capture asymmetries that may exist in the 3D volumetric reality of residue. However, as mentioned earlier, this limitation is somewhat mitigated by findings by
Mulheren et al. (2019), who have recently shown tight correspondence between 2D lateral view area measures and 3D volumetric measures of pharyngeal residue.