Research Article
February 2016

An Optimal Set of Flesh Points on Tongue and Lips for Speech-Movement Classification

Publication: Journal of Speech, Language, and Hearing Research
Volume 59, Number 1
Pages 15-26

Abstract

Purpose

The authors sought to determine an optimal set of flesh points on the tongue and lips for classifying speech movements.

Method

The authors used electromagnetic articulographs (Carstens AG500 and NDI Wave) to record tongue and lip movements from 13 healthy talkers who produced 8 vowels, 11 consonants, a phonetically balanced set of words, and a set of short phrases. A machine-learning classifier (support-vector machine) was used to classify the speech stimuli on the basis of the articulatory movements, and the classification accuracies of the flesh-point combinations were then compared to determine an optimal set of sensors.
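
As an illustration of this pipeline, the sketch below shows one way movement trajectories could be resampled into fixed-length feature vectors and classified with a support-vector machine. It is a minimal, hypothetical example in Python: the resampling scheme, the channel layout, the scikit-learn SVC settings, and the synthetic data are assumptions for demonstration, not the study's actual preprocessing or software.

```python
# Hypothetical sketch: classify speech stimuli from flesh-point trajectories
# with a support-vector machine. Feature construction and settings are assumed.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def to_feature_vector(trajectory, n_frames=20):
    """Resample one production (frames x channels) to a fixed number of
    frames and flatten it into a single feature vector."""
    frames = np.linspace(0, len(trajectory) - 1, n_frames).astype(int)
    return trajectory[frames].ravel()

def classify_stimuli(trajectories, labels):
    """Cross-validated SVM classification accuracy for variable-length
    movement trajectories and their stimulus labels."""
    X = np.vstack([to_feature_vector(t) for t in trajectories])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    return cross_val_score(clf, X, labels, cv=5).mean()

# Synthetic example: 8 vowel classes, 10 productions each, 6 sensors x 2
# spatial dimensions = 12 movement channels per frame.
rng = np.random.default_rng(0)
trajectories = [rng.normal(size=(rng.integers(40, 80), 12)) for _ in range(80)]
labels = np.repeat(np.arange(8), 10)
print(f"Mean cross-validated accuracy: {classify_stimuli(trajectories, labels):.2f}")
```

On random data such as this, accuracy stays near chance; the sketch is meant only to show the shape of the classification step, not to reproduce the reported results.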

Results

When data from the 4 sensors (T1: the vicinity between the tongue tip and tongue blade; T4: the tongue-body back; UL: the upper lip; and LL: the lower lip) were combined, phoneme and word classifications were most accurate and were comparable with those obtained from the full 6-sensor set, which additionally included T2 and T3, the sensors placed between T1 and T4 on the tongue.

Conclusion

The authors identified a 4-sensor set (T1, T4, UL, and LL) that yielded classification accuracies (91%–95%) equivalent to those obtained using all 6 sensors. These findings provide an empirical basis for selecting sensors and their locations for scientific and emerging clinical applications that incorporate articulatory movements.
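
The comparison of flesh-point combinations could, in principle, be organized as in the following sketch, which reuses the hypothetical classify_stimuli() helper from the earlier example. The data layout (a dictionary mapping each sensor name to its per-production trajectories) and the exhaustive search over fixed-size subsets are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: score classification accuracy for every sensor subset
# of a given size, reusing the classify_stimuli() helper defined above.
from itertools import combinations

import numpy as np

def subset_accuracy(per_sensor, labels, sensors):
    """Concatenate the chosen sensors' movement channels for each production
    (assumes all sensors share a production's frame count) and classify."""
    trajectories = [np.hstack([per_sensor[s][i] for s in sensors])
                    for i in range(len(labels))]
    return classify_stimuli(trajectories, labels)

def rank_sensor_sets(per_sensor, labels, set_size=4):
    """Score every sensor combination of the given size, best first."""
    scores = {combo: subset_accuracy(per_sensor, labels, combo)
              for combo in combinations(sorted(per_sensor), set_size)}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# e.g., rank_sensor_sets(per_sensor, labels, set_size=4)[0] would report the
# best-scoring 4-sensor combination, such as (('LL', 'T1', 'T4', 'UL'), 0.93).
```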

Information & Authors

Information

Published In

Journal of Speech, Language, and Hearing Research
Volume 59, Number 1, February 2016
Pages: 15-26
PubMed: 26564030

History

  • Received: Apr 24, 2014
  • Revised: Nov 10, 2014
  • Accepted: Aug 7, 2015
  • Published in issue: Feb 1, 2016

Authors

Affiliations

Jun Wang
Speech Disorders & Technology Lab, The University of Texas at Dallas
Callier Center for Communication Disorders, The University of Texas at Dallas
University of Texas Southwestern Medical Center, Dallas
Ashok Samal
University of Nebraska–Lincoln
Panying Rong
MGH Institute of Health Professions, Boston, MA
Jordan R. Green
MGH Institute of Health Professions, Boston, MA

Notes

Disclosure: The authors have declared that no competing interests existed at the time of publication.
Correspondence to Jun Wang: [email protected]
Editor: Jody Kreiman
Associate Editor: Kate Bunton

Citing Literature

  • Toward Generalizable Machine Learning Models in Speech, Language, and Hearing Sciences: Estimating Sample Size and Reducing Overfitting, Journal of Speech, Language, and Hearing Research, 10.1044/2023_JSLHR-23-00273, 67, 3, (753-781), (2024).
  • IR-UWB Radar-Based Contactless Silent Speech Recognition of Vowels, Consonants, Words, and Phrases, IEEE Access, 10.1109/ACCESS.2023.3344177, 11, (144844-144859), (2023).
  • Machine learning in the evaluation of voice and swallowing in the head and neck cancer patient, Current Opinion in Otolaryngology & Head & Neck Surgery, 10.1097/MOO.0000000000000948, 32, 2, (105-112), (2023).
  • The Impact of Stimulus Length in Tongue and Lip Movement Pattern Stability in Amyotrophic Lateral Sclerosis, Journal of Speech, Language, and Hearing Research, 10.1044/2023_JSLHR-23-00079, 67, 10S, (4002-4014), (2023).
  • MagTrack: A Wearable Tongue Motion Tracking System for Silent Speech Interfaces, Journal of Speech, Language, and Hearing Research, 10.1044/2023_JSLHR-22-00319, 66, 8S, (3206-3221), (2023).
  • Tongue and Lip Acceleration as a Measure of Speech Decline in Amyotrophic Lateral Sclerosis, Folia Phoniatrica et Logopaedica, 10.1159/000525514, 75, 1, (23-34), (2022).
  • Silent speech command word recognition using stepped frequency continuous wave radar, Scientific Reports, 10.1038/s41598-022-07842-9, 12, 1, (2022).
  • Opti-Speech-VMT: Implementation and Evaluation, Body Area Networks. Smart IoT and Big Data for Intelligent Health Management, 10.1007/978-3-030-95593-9_19, (233-246), (2022).
  • Evaluation of a Wireless Tongue Tracking System on the Identification of Phoneme Landmarks, IEEE Transactions on Biomedical Engineering, 10.1109/TBME.2020.3023284, 68, 4, (1190-1197), (2021).
  • Inertial Measurements for Tongue Motion Tracking Based on Magnetic Localization With Orientation Compensation, IEEE Sensors Journal, 10.1109/JSEN.2020.3046469, 21, 6, (7964-7971), (2021).
