Abstract
Purpose
As increasing amounts and types of speech data become accessible, the health care and technology industries increasingly demand quantitative insight into speech content. The potential for speech data to provide insight into cognitive, affective, and psychological health states and behavior crucially depends on the ability to integrate speech data into the scientific process. Current engineering methods for acquiring, analyzing, and modeling speech data make this integration possible, and machine learning systems that recognize patterns in data can further facilitate hypothesis generation, data analysis, and statistical modeling. The goals of the present article are (a) to review developments across these domains that have allowed real-time magnetic resonance imaging to shed light on aspects of atypical speech articulation; (b) in a parallel vein, to discuss how advancements in signal processing have allowed for an improved understanding of communication markers associated with autism spectrum disorder; and (c) to highlight the clinical significance and implications of applying these technological advancements to each of these areas.
Conclusion
The collaboration of engineers, speech scientists, and clinicians has resulted in (a) the development of biologically inspired technology that has been proven useful for both small- and large-scale analyses, (b) a deepened practical and theoretical understanding of both typical and impaired speech production, and (c) the establishment and enhancement of diagnostic and therapeutic tools, all having far-reaching, interdisciplinary significance.
Supplemental Material

References
- American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Washington, DC: Author.
- Bargmann, C. I., & Gilliam, C. T. (2013). Genes and behavior. In E. R. Kandel, J. H. Schwartz, T. M. Jessell, S. A. Siegelbaum, & A. J. Hudspeth (Eds.), Principles of neural science (5th ed., pp. 39–65). New York, NY: McGraw-Hill.
- Bone, D., Bishop, S. L., Black, M. P., Goodwin, M. S., Lord, C., & Narayanan, S. S. (2016). Use of machine learning to improve autism screening and diagnostic instruments: Effectiveness, efficiency, and multi-instrument fusion. The Journal of Child Psychology and Psychiatry, 57(8), 927–937. https://doi.org/10.1111/jcpp.12559
- Bone, D., Black, M. P., Ramakrishna, A., Grossman, R., & Narayanan, S. (2015). Acoustic–prosodic correlates of “awkward” prosody in story retellings from adolescents with autism. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2015, Dresden, Germany, 1616–1620. https://www.isca-speech.org/archive/interspeech_2015/papers/i15_1616.pdf
- Bone, D., Goodwin, M. S., Black, M. P., Lee, C.-C., Audhkhasi, K., & Narayanan, S. (2015). Applying machine learning to facilitate autism diagnostics: Pitfalls and promises. Journal of Autism and Developmental Disorders, 45(5), 1121–1136. https://doi.org/10.1007/s10803-014-2268-6
- Bone, D., Lee, C.-C., Black, M. P., Williams, M. E., Lee, S., Levitt, P., & Narayanan, S. (2014). The psychologist as an interlocutor in autism spectrum disorder assessment: Insights from a study of spontaneous prosody. Journal of Speech, Language, and Hearing Research, 57(4), 1162–1177. https://doi.org/10.1044/2014_JSLHR-S-13-0062
- Bone, D., Lee, C.-C., Chaspari, T., Gibson, J., & Narayanan, S. (2017). Signal processing and machine learning for mental health research and clinical applications [Perspectives]. The Institute of Electrical and Electronics Engineers Signal Processing Magazine, 34(5), 195–196. https://doi.org/10.1109/MSP.2017.2718581
- Bone, D., Lee, C.-C., Potamianos, A., & Narayanan, S. (2014, September). An investigation of vocal arousal dynamics in child–psychologist interactions using synchrony measures and a conversation-based model. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Singapore, 218–222. https://www.isca-speech.org/archive/interspeech_2014/i14_0218.html
- Bone, D., Mertens, J., Zane, E., Lee, S., Narayanan, S. S., & Grossman, R. (2017). Acoustic–prosodic and physiological response to stressful interactions in children with autism spectrum disorder. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Stockholm, Sweden, 147–151. https://doi.org/10.21437/Interspeech.2017-179
- Bresch, E., Kim, Y. C., Nayak, K., Byrd, D., & Narayanan, S. (2008). Seeing speech: Capturing vocal tract shaping using real-time magnetic resonance imaging [Exploratory DSP]. The Institute of Electrical and Electronics Engineers Signal Processing Magazine, 25(3), 123–132.
- Bresch, E., & Narayanan, S. S. (2009). Region segmentation in the frequency domain applied to upper airway real-time magnetic resonance images. The Institute of Electrical and Electronics Engineers Transactions on Medical Imaging, 28(3), 323–338.
- Bresch, E., Nielsen, J., Nayak, K., & Narayanan, S. (2006). Synchronized and noise-robust audio recordings during realtime magnetic resonance imaging scans. The Journal of the Acoustical Society of America, 120(4), 1791–1794.
- Byrd, D., Tobin, S., Bresch, E., & Narayanan, S. (2009). Timing effects of syllable structure and stress on nasals: A real-time MRI examination. Journal of Phonetics, 37(1), 97–110.
- Carignan, C., Shosted, R., Fu, M., Liang, Z., & Sutton, B. (2015). A real-time MRI investigation of the role of lingual and pharyngeal articulation in the production of the nasal vowel system of French. Journal of Phonetics, 50, 34–51.
- Eyben, F., Wöllmer, M., & Schuller, B. (2010). OpenSMILE—The Munich versatile and fast open-source audio feature extractor. Proceedings of ACM Multimedia (MM) (pp. 1459–1462). Florence, Italy: Association for Computing Machinery.
- Feng, X., Blemker, S., Inouye, J., Pelland, C., Zhao, L., & Meyer, C. (2018). Assessment of velopharyngeal function with dual-planar high-resolution real-time spiral dynamic MRI. Magnetic Resonance in Medicine, 80(4), 1467–1474.
- Fernández-Baena, A., Susín, A., & Lligadas, X. (2012). Biomechanical validation of upper-body and lower-body joint movements of kinect motion capture data for rehabilitation treatments. Proceedings of the 2012 4th International Conference on Intelligent Networking and Collaborative Systems, INCoS (pp. 656–661). https://doi.org/10.1109/iNCoS.2012.66
- Gelfer, M. P. (1988). Perceptual attributes of voice: Development and use of rating scales. Journal of Voice, 2(4), 320–326. https://doi.org/10.1016/S0892-1997(88)80024-9
- Geschwind, D. H. (2011). Genetics of autism spectrum disorders. Trends in Cognitive Sciences, 15(9), 409–416. https://doi.org/10.1016/j.tics.2011.07.003
- Guha, T., Yang, Z., Grossman, R. B., & Narayanan, S. S. (2018). A computational study of expressive facial dynamics in children with autism. The Institute of Electrical and Electronics Engineers Transactions on Affective Computing, 9(1), 14–20. https://doi.org/10.1109/TAFFC.2016.2578316
- Guha, T., Yang, Z., Ramakrishna, A., Grossman, R. B., Darren, H., Lee, S., & Narayanan, S. S. (2015). On quantifying facial expression-related atypicality of children with autism spectrum disorder. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (p. 803). https://doi.org/10.1109/ICASSP.2015.7178080
- Gupta, R., Bone, D., Lee, S., & Narayanan, S. (2016). Analysis of engagement behavior in children during dyadic interactions using prosodic cues. Computer Speech and Language, 37, 47–66. https://doi.org/10.1016/j.csl.2015.09.003
- Hagedorn, C., Kim, K., Zu, Y., Goldstein, L., Sinha, U., & Narayanan, S. (2017). Quantifying lingual flexibility in post-glossectomy speech. American Speech-Language-Hearing Association Convention, Los Angeles, CA.
- Hagedorn, C., Lammert, A., Bassily, M., Zu, Y., Sinha, U., Goldstein, L., & Narayanan, S. (2014). Characterizing post-glossectomy speech using real-time MRI. Paper presented at the International Seminar on Speech Production, Cologne, Germany.
- Hagedorn, C., Proctor, M., Goldstein, L., Wilson, S., Miller, B., Gorno-Tempini, M. L., & Narayanan, S. S. (2017). Characterizing articulation in apraxic speech using real-time magnetic resonance imaging. Journal of Speech, Language, and Hearing Research, 60(4), 877–891.
- Halberstam, B. (2004). Acoustic and perceptual parameters relating to connected speech are more reliable measures of hoarseness than parameters relating to sustained vowels. Journal for Oto-Rhino-Laryngology, 66(2), 70–73. https://doi.org/10.1159/000077798
- Hammal, Z., Cohn, J. F., & Messinger, D. S. (2015). Head movement dynamics during play and perturbed mother-infant interaction. The Institute of Electrical and Electronics Engineers Transactions on Affective Computing, 6(4), 361–370. https://doi.org/10.1109/TAFFC.2015.2422702
- Hillenbrand, J., Cleveland, R. A., & Erickson, R. L. (1994). Acoustic correlates of breathy vocal quality. Journal of Speech and Hearing Research, 37(4), 769–778. https://doi.org/10.1044/jshr.3704.769
- Hubbard, K., & Trauner, D. A. (2007). Intonation and emotion in autistic spectrum disorders. Journal of Psycholinguistic Research, 36(2), 159–173.
- Jeni, L. A., Cohn, J. F., & Kanade, T. (2015). Dense 3D face alignment from 2D videos in real-time. In 2015 11th Institute of Electrical and Electronics Engineers International Conference and Workshops on Automatic Face and Gesture Recognition (FG). https://doi.org/10.1109/FG.2015.7163142
- Jones, W., Carr, K., & Klin, A. (2008). Absence of preferential looking to the eyes of approaching adults predicts level of social disability in 2-year-old toddlers with autism spectrum disorder. Archives of General Psychiatry, 65(8), 946–954.
- Kim, J., Kumar, N., Lee, S., & Narayanan, S. S. (2014). Enhanced airway-tissue boundary segmentation for real-time magnetic resonance imaging data. Paper presented at the International Seminar on Speech Production (ISSP), Cologne, Germany.
- Kim, Y., Narayanan, S. S., & Nayak, K. (2009). Accelerated 3D upper airway MRI using compressed sensing. Magnetic Resonance in Medicine, 61(6), 1434–1440.
- Klin, A., Jones, W., Schultz, R., Volkmar, F., & Cohen, D. (2002). Visual fixation patterns during viewing of naturalistic social situations as predictors of social competence in individuals with autism. Archives of General Psychiatry, 59(9), 809–816.
- Klin, A., Lin, D. J., Gorrindo, P., Ramsay, G., & Jones, W. (2009). Two-year-olds with autism orient to non-social contingencies rather than biological motion. Nature, 459(7244), 257–261.
- Kreiman, J., Gerratt, B. R., Kempster, G. B., Erman, A., & Berke, G. S. (1993). Perceptual evaluation of voice quality. Journal of Speech and Hearing Research, 36(1), 21–40. https://doi.org/10.1044/jshr.3601.21
- Lammert, A., Goldstein, L., Ramanarayanan, V., & Narayanan, S. S. (2015). Gestural control in the English past-tense suffix: An articulatory study using real-time MRI. Phonetica, 71(4), 229–248.
- Lander-Portnoy, M., Goldstein, L., & Narayanan, S. (2017). Using real time magnetic resonance imaging to measure changes in articulatory behavior due to partial glossectomy. The Journal of the Acoustical Society of America, 142(4), 2641–2642.
- Lee, S., Potamianos, A., & Narayanan, S. (2014). Developmental acoustic study of American English diphthongs. The Journal of the Acoustical Society of America, 136(4), 1880–1894.
- Lee, Y., Goldstein, L., & Narayanan, S. S. (2015). Systematic variation in the articulation of the Korean liquid across prosodic positions. Proceedings of the International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK.
- Lingala, S., Zhu, Y., Kim, Y., Toutios, A., Narayanan, S. S., & Nayak, K. (2017). A fast and flexible MRI system for the study of dynamic vocal tract shaping. Magnetic Resonance in Medicine, 77(1), 112–125.
- Lingwaves Theravox. (2014). WEBOSYS [Computer tool]. Forchheim, Germany: WEBOSYS.
- Martin, K. B., Hammal, Z., Ren, G., Cohn, J. F., Cassell, J., Ogihara, M., … Messinger, D. S. (2018). Objective measurement of head movement differences in children with and without autism spectrum disorder. Molecular Autism, 9(1), 14. https://doi.org/10.1186/s13229-018-0198-4
- McAllister, A., Sundberg, J., & Hibi, S. R. (1998). Acoustic measurements and perceptual evaluation of hoarseness in children’s voices. Logopedics Phoniatrics Vocology, 23(1), 27–38. https://doi.org/10.1080/140154398434310
- McAllister Byun, T., Campbell, H., Carey, H., Liang, W., Park, T. H., & Svirsky, M. (2017). Enhancing intervention for residual rhotic errors via app-delivered biofeedback: A case study. Journal of Speech, Language, and Hearing Research, 60(6S), 1810–1817.
- McCann, J., & Peppé, S. (2003). Prosody in autism spectrum disorders: A critical review. International Journal of Language & Communication Disorders, 38(4), 325–350.
- McMicken, B., Salles, F., Von Berg, S., Vento-Wilson, M., Rogers, K., Toutios, A., & Narayanan, S. S. (2017). Bilabial substitution patterns during consonant production in a case of congenital aglossia. Journal of Communication Disorders, Deaf Studies and Hearing Aids, 5(2), 1–6.
- Mehta, D., Zañartu, M., Feng, S., Cheyne, H., II, & Hillman, R. (2012). Mobile voice health monitoring using a wearable accelerometer sensor and a smartphone platform. The Institute of Electrical and Electronics Engineers Transactions on Biomedical Engineering, 59(11), 3090–3096.
- Metallinou, A., Grossman, R. B., & Narayanan, S. (2013). Quantifying atypicality in affective facial expressions of children with autism spectrum disorders. Proceedings of the IEEE International Conference on Multimedia and Expo, San Jose, CA, USA (pp. 1–6). https://doi.org/10.1109/ICME.2013.6607640
- Narayanan, S. S., Alwan, A. A., & Haker, K. (1995). An articulatory study of fricative consonants using magnetic resonance imaging. The Journal of the Acoustical Society of America, 98(3), 1325–1347.
- Narayanan, S. S., & Georgiou, P. G. (2013). Behavioral signal processing: Deriving human behavioral informatics from speech and language. Proceedings of the Institute of Electrical and Electronics Engineers, 101(5), 1203–1233. https://doi.org/10.1109/JPROC.2012.2236291
- Narayanan, S. S., Nayak, K., Lee, S., Sethy, A., & Byrd, D. (2004). An approach to real-time magnetic resonance imaging for speech production. The Journal of the Acoustical Society of America, 115(4), 1771–1776.
- Narayanan, S. S., Toutios, A., Ramanarayanan, V., Lammert, A., Kim, J., Lee, S., … Proctor, M. (2014). Real-time magnetic resonance imaging and electromagnetic articulography database for speech production research (TC). The Journal of the Acoustical Society of America, 136(3), 1307–1311.
- Nayak, K., & Hu, B. (2005). The future of real-time cardiac magnetic resonance imaging. Current Cardiology Reports, 7, 45–51.
- Perry, J., Kuehn, D., Sutton, B., & Fang, X. (2017). Velopharyngeal structural and functional assessment of speech in young children using dynamic magnetic resonance imaging. The Cleft Palate–Craniofacial Journal, 54(4), 408–422.
- Proctor, M., Bone, D., Katsamanis, A., & Narayanan, S. S. (2010). Rapid semi-automatic segmentation of real-time magnetic resonance images for parametric vocal tract analysis. Paper presented at the INTERSPEECH 2010 11th Annual Conference of the International Speech Communication Association, Makuhari, Japan.
- Proctor, M., Lo, C., & Narayanan, S. S. (2015). Articulation of English vowels in running speech: A real-time MRI study. Proceedings of the International Congress of Phonetic Sciences (ICPhS 2015), Glasgow, UK.
- Shic, F., Macari, S., & Chawarska, K. (2014). Speech disturbs face scanning in 6-month-old infants who develop autism spectrum disorder. Biological Psychiatry, 75(3), 231–237. https://doi.org/10.1016/j.biopsych.2013.07.009
- Stone, M., Davis, E. P., Douglas, A. S., NessAiver, M., Gullapalli, R., Levine, W. S., & Lundberg, A. (2001). Modeling the motion of the internal tongue from tagged cine-MRI images. The Journal of the Acoustical Society of America, 109(6), 2974–2982.
- Story, B. H., Titze, I. R., & Hoffman, E. A. (1996). Vocal tract area functions from magnetic resonance imaging. The Journal of the Acoustical Society of America, 100(1), 537–554.
- Takemoto, H., Honda, K., Masaki, S., Shimada, Y., & Fujimoto, I. (2006). Measurement of temporal changes in vocal tract area function from 3D cine-MRI data. The Journal of the Acoustical Society of America, 119(2), 1037–1049.
- Tick, B., Bolton, P., Happé, F., Rutter, M., & Rijsdijk, F. (2016). Heritability of autism spectrum disorders: A meta-analysis of twin studies. The Journal of Child Psychology and Psychiatry, 57(5), 585–595. https://doi.org/10.1111/jcpp.12499
- Toutios, A., Byrd, D., Goldstein, L., & Narayanan, S. (2017). Articulatory compensation strategies employed by an aglossic speaker. The Journal of the Acoustical Society of America, 142(4), 2639.
- Toutios, A., Lingala, S., Vaz, C., Kim, J., Esling, J., Keating, P., … Narayanan, S. (2016). Illustrating the production of the International Phonetic Alphabet sounds using fast real-time magnetic resonance imaging. Proceedings of Interspeech 2016, 2428–2432.
- Vaz, C., Ramanarayanan, V., & Narayanan, S. (2018). Acoustic denoising using dictionary learning with spectral and temporal regularization. IEEE/ACM Transactions on Audio, Speech and Language Processing, 26(5), 967–980.
- Wagner, P., Malisz, Z., & Kopp, S. (2014). Gesture and speech in interaction: An overview. Speech Communication, 57, 209–232.
- Windolf, M., Götzen, N., & Morlock, M. (2008). Systematic accuracy and precision analysis of video motion capturing systems-exemplified on the Vicon-460 system. Journal of Biomechanics, 41(12), 2776–2780. https://doi.org/10.1016/j.jbiomech.2008.06.024
- Zhang, Z., Gersdorff, N., & Frahm, J. (2011). Real-time magnetic resonance imaging of temporomandibular joint dynamics. The Open Medical Imaging Journal, 5, 1–7.