June 2005

The Haskins Optically Corrected Ultrasound System (HOCUS)

Publication: Journal of Speech, Language, and Hearing Research
Volume 48, Number 3
Pages 543-553


The tongue is critical in the production of speech, yet its nature has made it difficult to measure. Not only does its ability to attain complex shapes make it difficult to track, it is also largely hidden from view during speech. The present article describes a new combination of optical tracking and ultrasound imaging that allows for a noninvasive, real-time view of most of the tongue surface during running speech. The optical system (Optotrak) tracks the location of external structures in 3-dimensional space using infrared emitting diodes (IREDs). By tracking 3 or more IREDs on the head and a similar number on an ultrasound transceiver, the transduced image of the tongue can be corrected for the motion of both the head and the transceiver and thus be represented relative to the hard structures of the vocal tract. If structural magnetic resonance images of the speaker are available, they may allow the estimation of the location of the rear pharyngeal wall as well. This new technique is contrasted with other currently available options for imaging the tongue. It promises to provide high-quality, relatively low-cost imaging of most of the tongue surface during fairly unconstrained speech.
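The correction described above amounts to a change of coordinate frames: a point located relative to the transceiver is first placed in the tracker's space and then re-expressed relative to the head. The following is a minimal sketch of that idea, not the article's implementation. It assumes exactly three non-collinear markers per rigid body (with more markers, a least-squares fit would be the usual choice), and it assumes the tongue point is already known in the transceiver's marker frame, which in practice requires a separate calibration. All function names are hypothetical.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(v):
    n = math.sqrt(dot(v, v))
    if n == 0.0:
        raise ValueError("markers are coincident or collinear")
    return tuple(x / n for x in v)

def frame_from_markers(p0, p1, p2):
    """Build an orthonormal frame (origin, axes) from three tracked markers."""
    ex = unit(sub(p1, p0))             # x-axis: from marker 0 toward marker 1
    ez = unit(cross(ex, sub(p2, p0)))  # z-axis: normal to the marker plane
    ey = cross(ez, ex)                 # y-axis completes a right-handed set
    return p0, (ex, ey, ez)

def to_world(frame, local):
    """Map a point from a body frame into tracker (world) coordinates."""
    origin, axes = frame
    return tuple(origin[i] + sum(local[k] * axes[k][i] for k in range(3))
                 for i in range(3))

def to_local(frame, world):
    """Map a tracker (world) point into a body frame."""
    origin, axes = frame
    d = sub(world, origin)
    return tuple(dot(d, ax) for ax in axes)

def probe_point_in_head(probe_markers, head_markers, point_in_probe):
    """Express a transceiver-frame point relative to the head markers."""
    probe = frame_from_markers(*probe_markers)
    head = frame_from_markers(*head_markers)
    return to_local(head, to_world(probe, point_in_probe))
```

Because both frames are rebuilt from the marker positions of each sample, a movement shared by the head and the transceiver leaves the head-relative coordinates unchanged, while a slip of the transceiver alone changes them.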




Information & Authors




  • Received: May 8, 2004
  • Accepted: Oct 22, 2004
  • Published in issue: Jun 1, 2005




Keywords: speech production, ultrasound, tongue measurement, kinematics, boundary detection



D. H. Whalen
Haskins Laboratories, New Haven, CT
Khalil Iskarous
Haskins Laboratories, New Haven, CT
Mark K. Tiede
Haskins Laboratories, New Haven, CT, and Massachusetts Institute of Technology, Cambridge
David J. Ostry
Haskins Laboratories, New Haven, CT, and McGill University, Montreal, Quebec, Canada
Heike Lehnert-LeHouillier
Haskins Laboratories, New Haven, CT, and University at Buffalo, Buffalo, NY
Eric Vatikiotis-Bateson
University of British Columbia, Vancouver, British Columbia, Canada
Donald S. Hailey
Haskins Laboratories, New Haven, CT


Contact author: D. H. Whalen, Haskins Laboratories, 300 George Street, New Haven, CT 06511.


Citing Literature

  • What R Mandarin Chinese /ɹ/s? – acoustic and articulatory features of Mandarin Chinese rhotics, Phonetica, 10.1515/phon-2023-0023, 81, 5, (509-552), (2024).
  • Production of the English /ɹ/ by Mandarin–English Bilingual Speakers, Language and Speech, 10.1177/00238309241230895, (2024).
  • A systematic review of the application of machine learning techniques to ultrasound tongue imaging analysis, The Journal of the Acoustical Society of America, 10.1121/10.0028610, 156, 3, (1796-1819), (2024).
  • The relation between perceptual retuning and articulatory restructuring: Individual differences in accommodating a novel phonetic variant, Journal of Phonetics, 10.1016/j.wocn.2024.101352, 107, (101352), (2024).
  • Assessing ultrasound probe stabilization for quantifying speech production contrasts using the Adjustable Laboratory Probe Holder for UltraSound (ALPHUS), Journal of Phonetics, 10.1016/j.wocn.2024.101339, 105, (101339), (2024).
  • Instrumental Analysis of Speech Production, The Handbook of Clinical Linguistics, Second Edition, 10.1002/9781119875949.ch34, (489-504), (2024).
  • Interarticulator Speech Coordination: Timing Is of the Essence, Journal of Speech, Language, and Hearing Research, 10.1044/2022_JSLHR-22-00594, 66, 3, (901-915), (2023).
  • Tongue Postures and Tongue Centers: A Study of Acoustic-Articulatory Correspondences Across Different Head Angles, Frontiers in Psychology, 10.3389/fpsyg.2021.768754, 12, (2022).
  • An investigation of interference between electromagnetic articulography and electroglottography, JASA Express Letters, 10.1121/10.0014033, 2, 9, (2022).
  • Comparing metrics for quantification of children’s tongue shape complexity using ultrasound imaging, Clinical Linguistics & Phonetics, 10.1080/02699206.2022.2039300, 37, 2, (169-195), (2022).

