Structure, Movement, Sound, and Perception

    Models that take the form of artificial talkers and speech synthesis systems have long been used to understand both speech production and speech perception. The article begins with a brief history of two artificial speaking devices that exemplify the representation of speech production as a system of modulations. It then describes the development of a recent airway modulation model that simulates time-varying changes of the vocal tract and the resulting acoustic wave propagation. The result is a type of artificial talker that can be used to study how humans generate speech sound and how a listener perceives that sound.

    Additional Resources