Research Article
August 2012

Vocal Tract Representation in the Recognition of Cerebral Palsied Speech

Publication: Journal of Speech, Language, and Hearing Research
Volume 55, Number 4
Pages 1190-1207

Abstract

Purpose

In this study, the authors explored articulatory information as a means of improving the recognition of dysarthric speech by machine.

Method

Data were derived chiefly from the TORGO database of dysarthric articulation (Rudzicz, Namasivayam, & Wolff, 2011) in which motions of various points in the vocal tract are measured during speech. In the 1st experiment, the authors provided a baseline model indicating a relatively low performance with traditional automatic speech recognition (ASR) using only acoustic data from dysarthric individuals. In the 2nd experiment, the authors used various measures of entropy (statistical disorder) to determine whether characteristics of dysarthric articulation can reduce uncertainty in features of dysarthric acoustics. These findings led to the 3rd experiment, in which recorded dysarthric articulation was directly encoded into the speech recognition process.

Results

The authors found that 18.3% of the statistical disorder in the acoustics of speakers with dysarthria can be removed if articulatory parameters are known. Using articulatory models reduces phoneme recognition errors by up to 6% (relative) for speakers with dysarthria in speaker-dependent systems.
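The "statistical disorder removed" result can be illustrated with a toy computation. This is only a sketch of the underlying information-theoretic idea — conditional entropy H(acoustics | articulation) versus marginal entropy H(acoustics) — using discrete toy labels; the study itself works with continuous acoustic and articulatory features and more elaborate entropy estimators:

```python
import math
from collections import Counter

def entropy(samples):
    """Shannon entropy (in bits) of a discrete sample sequence."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def conditional_entropy(xs, ys):
    """H(X | Y) for paired discrete sequences xs (target) and ys (condition)."""
    n = len(xs)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    # Weighted average of the entropy of X within each value of Y.
    return sum(len(g) / n * entropy(g) for g in groups.values())

# Hypothetical paired data: an acoustic class (xs) that is partially
# predictable from an articulatory class (ys).
xs = ['a', 'a', 'b', 'b', 'a', 'b', 'a', 'b']
ys = [0, 0, 1, 1, 0, 1, 1, 0]

h_x = entropy(xs)                      # uncertainty in acoustics alone
h_x_given_y = conditional_entropy(xs, ys)  # uncertainty once articulation is known
fraction_removed = 1 - h_x_given_y / h_x
```

Here `fraction_removed` plays the role of the paper's 18.3% figure: the share of acoustic uncertainty eliminated by knowing the articulatory state.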

Conclusions

Articulatory knowledge is useful in reducing rates of error in ASR for speakers with dysarthria and in reducing statistical uncertainty of their acoustic signals. These findings may help to guide clinical decisions related to the use of ASR in the future.


References

Abbs, J. H., Folkins, J. W., & Sivarajan, M. (1976). Motor impairment following blockade of the infraorbital nerve: Implications for the use of anesthetization techniques in speech research. Journal of Speech and Hearing Research, 19, 19–35.
American Speech-Language-Hearing Association. (2011). Speech-language pathology medical review guidelines. Retrieved from www.asha.org/practice/reimbursement/SLP-medical-review-guidelines
Augmentative Communication, Inc. (2007). Section 3: Clinical aspects of AAC devices. Retrieved from www.augcominc.com/whatsnew/ncs3.html
Barlow, H. B. (1989). Unsupervised learning. Neural Computation, 1, 295–311.
Bennett, J. W., van Lieshout, P., & Steele, C. M. (2007). Tongue control for speech and swallowing in healthy younger and older adults. International Association of Orofacial Myology, 33, 5–18.
Chesta, C., Siohan, O., & Lee, C.-H. (1999). Maximum a posteriori linear regression for hidden Markov model adaptation. EUROSPEECH-99: Proceedings, Sixth European Conference on Speech Communication and Technology, 1, 211–214. Retrieved from www.isca-speech.org/archive/eurospeech_1999/e99_0211.html
Duffy, J. R. (1995). Motor speech disorders: Substrates, differential diagnosis, and management. St. Louis, MO: Mosby.
Enderby, P. M. (1983). Frenchay dysarthria assessment. San Diego, CA: College-Hill.
Freund, H.-J., Jeannerod, M., Hallett, M., & Leiguarda, R. (2005). Higher-order motor disorders: From neuroanatomy and neurobiology to clinical neurology. New York, NY: Oxford University Press.
Fukuda, T., & Nitta, T. (2003). Noise-robust automatic speech recognition using orthogonalized distinctive phonetic feature vectors. EUROSPEECH 2003–INTERSPEECH 2003: Eighth European Conference on Speech Communication and Technology, 3, 2189–2192. Retrieved from www.isca-speech.org/archive/eurospeech_2003/e03_2189.html
Goldberger, J., Gordon, S., & Greenspan, H. (2003). An efficient image similarity measure based on approximations of KL-divergence between two Gaussian mixtures. Proceedings of Ninth IEEE International Conference on Computer Vision, 1, 487–493. doi:10.1109/ICCV.2003.1238387
Goto, Y., Hochberg, M. M., Mashao, D. J., & Silverman, H. F. (1995). Incremental MAP estimation of HMMs for efficient training and improved performance. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 1995), 1, 457–460. doi:10.1109/ICASSP.1995.479627
Gracco, V. L. (1995). Central and peripheral components in the control of speech movements. In Bell-Berti, F., & Raphael, L. J. (Eds.), Producing speech: Contemporary issues. For Katherine Safford Harris (pp. 417–431). Woodbury, NY: American Institute of Physics Press.
Guenther, F. H., & Perkell, J. S. (2004). A neural model of speech production and its application to studies of the role of auditory feedback in speech. In Maassen, B., Kent, R., Peters, H., van Lieshout, P., & Hulstijn, W. (Eds.), Speech motor control in normal and disordered speech (pp. 29–49). New York, NY: Oxford University Press.
Hosom, J.-P., Kain, A. B., Mishra, T., van Santen, J. P. H., Fried-Oken, M., & Staehely, J. (2003). Intelligibility of modifications to dysarthric speech. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2003), 1, 924–927. doi:10.1109/ICASSP.2003.1198933
Huang, X., Acero, A., & Hon, H.-W. (2001). Spoken language processing: A guide to theory, algorithm and system development. Upper Saddle River, NJ: Prentice Hall.
Huber, M. F., Bailey, T., Durrant-Whyte, H., & Hanebeck, U. D. (2008). On entropy approximation for Gaussian mixture random vectors. Proceedings of the 2008 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, 1, 181–188. doi:10.1109/MFI.2008.4648062
Hustad, K. C., Gorton, K., & Lee, J. (2010). Classification of speech and language profiles in 4-year-old children with cerebral palsy: A prospective preliminary study. Journal of Speech, Language, and Hearing Research, 53, 1496–1513.
Kent, R. D., & Rosen, K. (2004). Motor control perspectives on motor speech disorders. In Maassen, B., Kent, R., Peters, H., van Lieshout, P., & Hulstijn, W. (Eds.), Speech motor control in normal and disordered speech (pp. 285–311). New York, NY: Oxford University Press.
Kent, R. D., Weismer, G., Kent, J. F., & Rosenbek, J. C. (1989). Toward phonetic intelligibility testing in dysarthria. Journal of Speech and Hearing Disorders, 54, 482–499.
Lamere, P., Kwok, P., Gouvea, E., Raj, B., Singh, R., Walker, W., & Wolf, P. (2003). The CMU SPHINX-4 speech recognition system. Unpublished manuscript. Retrieved from www.cs.cmu.edu/~rsingh/homepage/papers/icassp03-sphinx4_2.pdf
Lazo, A. C., & Rathie, P. N. (1978). On the entropy of continuous probability distributions. IEEE Transactions on Information Theory, 23, 120–122.
Markov, K., Dang, J., & Nakamura, S. (2006). Integration of articulatory and spectrum features based on the hybrid HMM/BN modeling framework. Speech Communication, 48, 161–175.
Mefferd, A. S., & Green, J. R. (2010). Articulatory-to-acoustic relations in response to speaking rate and loudness manipulations. Journal of Speech, Language, and Hearing Research, 53, 1206–1219.
Menéndez-Pidal, X., Polikoff, J. B., Peters, S. M., Leonzjo, J. E., & Bunnell, H. T. (1996). The Nemours database of dysarthric speech. Proceedings of the Fourth International Conference on Spoken Language Processing, 3, 1962–1965. doi:10.1109/ICSLP.1996.608020
Miyazawa, Y. (1993). An all-phoneme ergodic HMM for unsupervised speaker adaptation. Proceedings of the 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2, 574–577. doi:10.1109/ICASSP.1993.319372
Murphy, K. P. (2002). Dynamic Bayesian networks: Representation, inference and learning (Unpublished doctoral dissertation). University of California, Berkeley.
Nam, H., & Goldstein, L. (2006). TADA (TAsk Dynamics Application) manual. New Haven, CT: Haskins Laboratories.
Nam, H., & Saltzman, E. (2003). A competitive, coupled oscillator model of syllable structure. Proceedings of the 15th International Congress of Phonetic Sciences, 1, 2253–2256.
O’Shaughnessy, D. (2000). Speech communications: Human and machine. New York, NY: IEEE Press.
Raghavendra, P., Rosengren, E., & Hunnicutt, S. (2001). An investigation of different degrees of dysarthric speech as input to speaker-adaptive and speaker-dependent recognition systems. Augmentative and Alternative Communication, 17, 265–275.
Ramsay, J. O., & Silverman, B. W. (2005). Principal differential analysis. In Ramsay, J. O., & Silverman, B. W. (Eds.), Functional data analysis (pp. 327–348). New York, NY: Springer.
Richmond, K., King, S., & Taylor, P. (2003). Modelling the uncertainty in recovering articulation from acoustics. Computer Speech and Language, 17, 153–172.
Rosen, K., & Yampolsky, S. (2000). Automatic speech recognition and a review of its functioning with dysarthric speech. Augmentative and Alternative Communication, 16, 48–60. doi:10.1080/07434610012331278904
Rudzicz, F. (2007). Comparing speaker-dependent and speaker-adaptive acoustic models for recognizing dysarthric speech. Proceedings of the Ninth International ACM SIGACCESS Conference on Computers and Accessibility (Assets07), 1, 255–266. doi:10.1145/1296843.1296899
Rudzicz, F. (2009). Applying discretized articulatory knowledge to dysarthric speech. Proceedings of the 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP09), 1, 4501–4504. doi:10.1109/ICASSP.2009.4960630
Rudzicz, F. (2010a). Adaptive kernel canonical correlation analysis for estimation of task dynamics from acoustics. Proceedings of the 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP10), 1, 4198–4201. Retrieved from www.cs.toronto.edu/~frank/Download/Papers/rudzicz_icassp10.pdf
Rudzicz, F. (2010b). Correcting errors in speech recognition with articulatory dynamics. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL 2010), 1, 60–68. Retrieved from www.aclweb.org/anthology-new/P/P10/P10-1007.pdf
Rudzicz, F. (2010c). Towards a noisy-channel model of dysarthria in speech recognition. Proceedings of the First Workshop on Speech and Language Processing for Assistive Technologies (SLPAT) at the 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2010), 1, 80–88. Retrieved from www.cs.toronto.edu/~frank/Download/Papers/rudzicz_naacl10.pdf
Rudzicz, F. (2011a). Articulatory knowledge in the recognition of dysarthric speech. IEEE Transactions on Audio, Speech, and Language Processing, 19, 947–960.
Rudzicz, F. (2011b). Production knowledge in the recognition of dysarthric speech (Unpublished doctoral dissertation). University of Toronto, Ontario, Canada.
Rudzicz, F., Namasivayam, A. K., & Wolff, T. (2011). The TORGO database of acoustic and articulatory speech from speakers with dysarthria. Language Resources and Evaluation. Advance online publication. doi:10.1007/s10579-011-9145-0
Saltzman, E. (1986). Task dynamic coordination of the speech articulators: A preliminary model. In Heuer, H., & Fromm, C. (Eds.), Generation and modulation of action patterns (pp. 129–144). Berlin, Germany: Springer-Verlag.
Saltzman, E. L., & Munhall, K. G. (1989). A dynamical approach to gestural patterning in speech production. Ecological Psychology, 1, 333–382. doi:10.1207/s15326969eco0104_2
Seikel, J. A., King, D. W., & Drumright, D. G. (Eds.). (2005). Anatomy & physiology for speech, language, and hearing (3rd ed.). Clifton Park, NJ: Delmar.
Shannon, C. E. (1949). A mathematical theory of communication. Urbana, IL: University of Illinois Press.
Sharma, H. V., & Hasegawa-Johnson, M. (2010). State-transition interpolation and MAP adaptation for HMM-based dysarthric speech recognition. Proceedings of the NAACL HLT 2010 Workshop on Speech and Language Processing for Assistive Technologies (SLPAT), 1, 72–79. Retrieved from www.isle.illinois.edu/pubs/2010/sharma10slpat.pdf
Shevell, M., Miller, S. P., Scherer, S. W., Yager, J. Y., & Fehlings, M. G. (2011). The Cerebral Palsy Demonstration Project: A multidimensional research approach to cerebral palsy. Seminars in Pediatric Neurology, 18, 31–39. doi:10.1016/j.spen.2011.02.004
Smith, A., & Goffman, L. (2004). Interaction of motor and language factors in the development of speech production. In Maassen, B., Kent, R., Peters, H., van Lieshout, P., & Hulstijn, W. (Eds.), Speech motor control in normal and disordered speech (pp. 227–252). New York, NY: Oxford University Press.
Stevens, K. N., & Keyser, S. J. (2010). Quantal theory, enhancement and overlap. Journal of Phonetics, 38, 10–19. doi:10.1016/j.wocn.2008.10.004
Toda, T., Black, A. W., & Tokuda, K. (2008). Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model. Speech Communication, 50, 215–227. doi:10.1016/j.specom.2007.09.001
van Lieshout, P. (2004). Dynamical systems theory and its application in speech. In Maassen, B., Kent, R., Peters, H., van Lieshout, P., & Hulstijn, W. (Eds.), Speech motor control in normal and disordered speech (pp. 51–82). New York, NY: Oxford University Press.
van Lieshout, P., Merrick, G., & Goldstein, L. (2008). An articulatory phonology perspective on rhotic articulation problems: A descriptive case study. Asia Pacific Journal of Speech, Language, and Hearing, 11, 283–303.
van Lieshout, P. H. H. M., & Moussa, W. (2000). The assessment of speech motor behavior using electromagnetic articulography. The Phonetician, 81, 9–22.
Vertanen, K. (2006). Baseline WSJ acoustic models for HTK and Sphinx: Training recipes and recognition experiments (Technical report). Cambridge, United Kingdom: Cavendish Laboratory.
Woodland, P. C. (2001). Speaker adaptation for continuous density HMMs: A review. In ISCA Tutorial and Research Workshop on Adaptation Methods for Speech Recognition (Adaptation-2001), 11–19. Retrieved from www.isca-speech.org/archive_open/adaptation/adap_011.html
Wrench, A. (1999). MOCHA-TIMIT [Articulatory database]. Edinburgh, United Kingdom: Centre for Speech Technology Research, University of Edinburgh. Retrieved from www.cstr.ed.ac.uk/research/projects/artic/mocha.html
Wrench, A., & Richmond, K. (2000). Continuous speech recognition using articulatory data. Proceedings of the Sixth International Conference on Spoken Language Processing (ICSLP 2000), 4, 145–148. Retrieved from www.isca-speech.org/archive/icslp_2000/i00_4145.html
Yorkston, K. M., & Beukelman, D. R. (1981). Assessment of intelligibility of dysarthric speech. Tigard, OR: C.C. Publications.
Ziegler, W., & Maassen, B. (2004). The role of the syllable in disorders of spoken language production. In Maassen, B., Kent, R., Peters, H., van Lieshout, P., & Hulstijn, W. (Eds.), Speech motor control in normal and disordered speech (pp. 415–447). New York, NY: Oxford University Press.
Zierdt, A., Hoole, P., & Tillmann, H. G. (1999). Development of a system for three-dimensional fleshpoint measurement of speech movements. Proceedings of the 14th International Congress of Phonetic Sciences (ICPhS99), 1, 73–75. Retrieved from www.phonetik.uni-muenchen.de/~hoole/pdf/3d_icphs99.pdf
Zue, V., Seneff, S., & Glass, J. (1989). Speech database development: TIMIT and beyond. In ESCA Tutorial and Research Workshop on Speech Input/Output Assessment and Speech Databases (SIOA-1989), 2, 35–40. Retrieved from www.isca-speech.org/archive_open/sioa_89/sia_2035.html

Information & Authors

Information

Published In

Journal of Speech, Language, and Hearing Research
Volume 55, Number 4, August 2012
Pages: 1190-1207

History

  • Received: Aug 13, 2011
  • Revised: Nov 3, 2011
  • Accepted: Dec 6, 2011
  • Published in issue: Aug 1, 2012


Key Words

  1. dysarthria
  2. articulation
  3. speech recognition

Authors

Affiliations

Frank Rudzicz [email protected]
University of Toronto, Ontario, Canada
Graeme Hirst
University of Toronto, Ontario, Canada
Pascal van Lieshout
University of Toronto, Ontario, Canada
Institute of Biomaterials and Biomedical Engineering, Toronto
Toronto Rehabilitation Institute

Notes

Correspondence to Frank Rudzicz, who is affiliated with both the University of Toronto and the Toronto Rehabilitation Institute: [email protected]
Editor: Anne Smith
Associate Editor: Wolfram Ziegler


