Discriminating Dysarthria Type From Envelope Modulation Spectra


    Previous research demonstrated the ability of temporally based rhythm metrics to distinguish among dysarthrias with different prosodic deficit profiles (J. M. Liss et al., 2009). The authors examined whether comparable results could be obtained by an automated analysis of speech envelope modulation spectra (EMS), which quantifies the rhythmicity of speech within specified frequency bands.


    EMS analysis was conducted on sentences produced by 43 speakers with 1 of 4 types of dysarthria and healthy controls. The EMS consisted of the spectra of the slow-rate (up to 10 Hz) amplitude modulations of the full signal and of 7 octave bands ranging in center frequency from 125 to 8000 Hz. For each band, 6 variables were calculated, relating to peak frequency, peak amplitude, and relative energy above, below, and in the region of 4 Hz. Discriminant function analyses (DFA) determined which sets of predictor variables best discriminated between and among groups.
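
    The envelope-spectrum pipeline described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the authors' implementation: the biquad octave filter, the block-averaged envelope, the 0.25 Hz frequency grid, the 3–6 Hz reading of "the region of 4 Hz", and all function and variable names are choices made here for the example. The input is a synthetic 1000 Hz tone modulated at 4 Hz, not speech.

```python
import cmath
import math

def bandpass_octave(x, fs, fc):
    """One-octave-wide band-pass biquad centered on fc (RBJ cookbook form)."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) * math.sinh(math.log(2.0) / 2.0 * w0 / math.sin(w0))
    b0, b2 = alpha / (1.0 + alpha), -alpha / (1.0 + alpha)  # b1 is zero
    a1, a2 = -2.0 * math.cos(w0) / (1.0 + alpha), (1.0 - alpha) / (1.0 + alpha)
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y.append(yn)
    return y

def envelope(x, fs, env_fs=100):
    """Rectify, then block-average down to env_fs (crude low-pass plus decimation)."""
    hop = int(fs // env_fs)
    return [sum(abs(v) for v in x[i:i + hop]) / hop
            for i in range(0, len(x) - hop + 1, hop)]

def modulation_spectrum(env, env_fs, f_max=10.0, step=0.25):
    """Squared DFT magnitude of the mean-removed envelope on a grid up to f_max."""
    mean = sum(env) / len(env)
    freqs = [k * step for k in range(1, int(f_max / step) + 1)]
    power = [abs(sum((e - mean) * cmath.exp(-2j * math.pi * f * n / env_fs)
                     for n, e in enumerate(env))) ** 2
             for f in freqs]
    return freqs, power

def ems_variables(freqs, power):
    """Six summary variables per band (definitions are assumptions of this sketch)."""
    total = sum(power)
    k = max(range(len(power)), key=power.__getitem__)
    below = sum(p for f, p in zip(freqs, power) if f < 4.0) / total
    above = sum(p for f, p in zip(freqs, power) if f > 4.0) / total
    return {
        "peak_freq": freqs[k],
        "peak_amp": power[k] / total,
        "energy_3_6": sum(p for f, p in zip(freqs, power) if 3.0 <= f <= 6.0) / total,
        "below_4": below,
        "above_4": above,
        "ratio_4": below / above,
    }

# Synthetic test signal: 2 s of a 1000 Hz carrier, amplitude-modulated at 4 Hz.
fs = 8000
signal = [(1.0 + 0.8 * math.cos(2 * math.pi * 4 * n / fs))
          * math.sin(2 * math.pi * 1000 * n / fs) for n in range(2 * fs)]

# The full signal plus one octave band; the study used 7 bands from 125 to 8000 Hz.
bands = {"full": signal, "1000 Hz": bandpass_octave(signal, fs, 1000.0)}
ems = {name: ems_variables(*modulation_spectrum(envelope(x, fs), 100))
       for name, x in bands.items()}

for name, variables in ems.items():
    print(name, variables)
```

    Applying the same steps to the full signal and to all 7 octave bands gives 8 sets of 6 variables per recording. Octave bands up to 8000 Hz would require a sampling rate well above the 8000 Hz used for this toy signal.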


    Each of 6 DFAs identified 2–6 of the 48 predictor variables (6 variables for each of 8 signals: the full signal and the 7 octave bands). These variables achieved 84%–100% classification accuracy for group membership.
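
    To make the classification step concrete, the sketch below fits a two-group Fisher linear discriminant on 2 predictors and scores it with leave-one-out classification. It is a simplified stand-in, not the stepwise DFA procedure used in the study: it performs no variable selection, handles only 2 groups, and the feature values (a peak frequency in Hz and a proportion of energy below 4 Hz) are invented for illustration and are not data from the study. All names are hypothetical.

```python
def fisher_direction(a, b):
    """Return (w, midpoint) of a two-group Fisher discriminant for 2-D features."""
    def mean(rows):
        return [sum(r[j] for r in rows) / len(rows) for j in range(2)]

    ma, mb = mean(a), mean(b)
    # Pooled within-group scatter matrix [[sxx, sxy], [sxy, syy]].
    sxx = sxy = syy = 0.0
    for rows, m in ((a, ma), (b, mb)):
        for r in rows:
            dx, dy = r[0] - m[0], r[1] - m[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
    det = sxx * syy - sxy * sxy
    d = (ma[0] - mb[0], ma[1] - mb[1])
    # w = S^-1 (mean_a - mean_b), using the closed-form 2x2 inverse.
    w = ((syy * d[0] - sxy * d[1]) / det, (sxx * d[1] - sxy * d[0]) / det)
    mid = ((ma[0] + mb[0]) / 2, (ma[1] + mb[1]) / 2)
    return w, mid

def classify(x, w, mid):
    """Assign x to group 'a' if it projects on group a's side of the midpoint."""
    score = w[0] * (x[0] - mid[0]) + w[1] * (x[1] - mid[1])
    return "a" if score > 0 else "b"

def leave_one_out_accuracy(a, b):
    """Refit without each speaker in turn and classify the held-out speaker."""
    correct = 0
    for label, rows in (("a", a), ("b", b)):
        for i, x in enumerate(rows):
            rest = rows[:i] + rows[i + 1:]
            if label == "a":
                w, mid = fisher_direction(rest, b)
            else:
                w, mid = fisher_direction(a, rest)
            correct += classify(x, w, mid) == label
    return correct / (len(a) + len(b))

# Invented toy values: (peak frequency in Hz, proportion of energy below 4 Hz).
group_a = [(4.0, 0.30), (4.4, 0.34), (4.0, 0.34), (4.4, 0.30)]
group_b = [(2.4, 0.60), (2.8, 0.64), (2.4, 0.64), (2.8, 0.60)]

accuracy = leave_one_out_accuracy(group_a, group_b)
print(f"leave-one-out accuracy: {accuracy:.0%}")
```

    The toy groups are deliberately well separated, so the accuracy printed here says nothing about real speakers; the point is only the shape of the computation, in which a few predictors are combined into one score and group membership is read off that score.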


    Dysarthrias can be characterized by quantifiable temporal patterns in acoustic output. Because EMS analysis is automated and requires no editing or linguistic assumptions, it shows promise as a clinical and research tool.


    • Arai, T., & Greenberg, S. (1997). The temporal properties of spoken Japanese are similar to those of English. Proceedings of Eurospeech, Rhodes, Greece, 2, 1011–1114.
    • Crouzet, O., & Ainsworth, W. A. (2001, September). On the various influences of envelope information on the perception of speech in adverse conditions: An analysis of between-channel envelope correlation. Paper presented at the Workshop on Consistent and Reliable Cues for Sound Analysis, Aalborg, Denmark.
    • Cummins, F., & Port, R. (1998). Rhythmic constraints on stress timing in English. Journal of Phonetics, 26, 145–171.
    • Dellwo, V. (2006). Rhythm and speech rate: A variation coefficient for delta C. In P. Karnowski & I. Szigeti (Eds.), Language and language processing: Proceedings of the 38th Linguistic Colloquium, Piliscsaba 2003 (pp. 231–241). Frankfurt, Germany: Peter Lang.
    • Drullman, R., Festen, J. M., & Plomp, R. (1994). Effect of temporal envelope smearing on speech reception. Journal of the Acoustical Society of America, 95, 1053–1064.
    • Grabe, E., & Low, E. L. (2002). Durational variability in speech and the rhythm class hypothesis. In N. Warner & C. Gussenhoven (Eds.), Papers in laboratory phonology 7 (pp. 515–546). Berlin, Germany: Mouton de Gruyter.
    • Greenberg, S., Arai, T., & Grant, K. (2006). The role of temporal dynamics in understanding spoken language. In P. Divenyi, K. Vicsi, & G. Meyer (Eds.), Dynamics of speech production and perception, 374 (pp. 171–193). Amsterdam, The Netherlands: IOS Press.
    • Greenwood, D. D. (1961). Critical bandwidth and the frequency coordinates of the basilar membrane. Journal of the Acoustical Society of America, 33(10), 1344–1356.
    • Houtgast, T., & Steeneken, J. M. (1985). A review of the MTF concept in room acoustics and its use for estimating speech intelligibility in auditoria. Journal of the Acoustical Society of America, 77(3), 1069–1077.
    • Kent, R. D., & Kim, Y. J. (2003). Toward an acoustic typology of motor speech disorders. Clinical Linguistics & Phonetics, 17(6), 427–445.
    • Liss, J. M., Spitzer, S., Caviness, J. N., Adler, C., & Edwards, B. (1998). Syllabic strength and lexical boundary decisions in the perception of hypokinetic dysarthric speech. Journal of the Acoustical Society of America, 104(4), 2457–2566.
    • Liss, J. M., Spitzer, S. M., Caviness, J. N., Adler, C., & Edwards, B. (2000). Lexical boundary error analysis in hypokinetic and ataxic dysarthria. Journal of the Acoustical Society of America, 107(6), 3415–3424.
    • Liss, J. M., White, L., Mattys, S. L., Lansford, K., Lotto, A. J., Spitzer, S., & Caviness, J. N. (2009). Quantifying speech rhythm deficits in the dysarthrias. Journal of Speech, Language, and Hearing Research, 52(5), 1334–1352.
    • Low, E. L., Grabe, E., & Nolan, F. (2000). Quantitative characterisations of speech rhythm: “Syllable-timing” in Singapore English. Language and Speech, 43, 377–401.
    • Milenkovic, P. (2004). TF32 [Computer software]. Madison, WI: University of Wisconsin–Madison, Department of Electrical and Computer Engineering.
    • Moore, B. C. J., & Glasberg, B. R. (1996). A revision of Zwicker’s loudness model. Acta Acustica, 82, 335–345.
    • Nazzi, T., Bertoncini, J., & Mehler, J. (1998). Language discrimination by newborns: Towards an understanding of the role of rhythm. Journal of Experimental Psychology: Human Perception and Performance, 24, 756–766.
    • Peterson, G., & Lehiste, I. (1960). Duration of syllable nuclei in English. Journal of the Acoustical Society of America, 32, 693–703.
    • Plomp, R. (1983). The role of modulation in hearing. In R. Klinke (Ed.), Hearing: Physiological bases and psychophysics (pp. 270–275). Heidelberg, Germany: Springer-Verlag.
    • Ramus, F., Nespor, M., & Mehler, J. (1999). Correlates of linguistic rhythm in the speech signal. Cognition, 73(3), 265–292.
    • Shoji, K., Regenbogen, E., Daw Yu, J., & Blaugrund, S. M. (1991). High-frequency components of normal voice. Journal of Voice, 5(1), 29–35.
    • Sound Forge [Software]. (2004). Middleton, WI: Sony Creative Software.
    • Spitzer, S., Liss, J., & Mattys, S. (2007). Acoustic cues to lexical segmentation: A study of resynthesized speech. Journal of the Acoustical Society of America, 122(6), 3678–3687.
    • Tilsen, S., & Johnson, K. (2008). Low-frequency Fourier analysis of speech rhythm. Journal of the Acoustical Society of America, 124(2), EL34–EL39.
    • Valencia, N., Mendoza, L., Mateo, R., & Carballo, G. (1994). High-frequency components of normal and dysphonic voices. Journal of Voice, 8(2), 157–162.
    • White, L., & Mattys, S. L. (2007). Calibrating rhythm: First language and second language studies. Journal of Phonetics, 35(4), 501–522.
