Abstract
Purpose
The purpose of this study was to examine relationships between prosodic speech cues and autism spectrum disorder (ASD) severity, hypothesizing a mutually interactive relationship between the speech characteristics of the psychologist and the child. The authors objectively quantified acoustic-prosodic cues of the psychologist and of the child with ASD during spontaneous interaction, establishing a methodology for future large-sample analysis.
Method
Speech acoustic-prosodic features were semiautomatically derived from segments of semistructured interviews (Autism Diagnostic Observation Schedule, ADOS; Lord, Rutter, DiLavore, & Risi, 1999; Lord et al., 2012) with 28 children who had previously been diagnosed with ASD. Prosody was quantified in terms of intonation, volume, rate, and voice quality. Research hypotheses were tested via correlation as well as hierarchical and predictive regression between ADOS severity and prosodic cues.
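As a rough illustration of the kind of analysis described above, the sketch below fits a least-squares slope to the pitch samples at the end of a speaking turn and then rank-correlates per-child slopes with severity scores. It is not the authors' pipeline: the function names, the choice of Spearman correlation, and all sample values are assumptions made up for this example.

```python
from statistics import mean


def turn_end_slope(times, pitch):
    """Least-squares slope of pitch against time over the end of a turn.

    `times` are in seconds; `pitch` is in any consistent unit (assumed
    here to be log-Hz). Both are hypothetical inputs for this sketch.
    """
    t_bar, p_bar = mean(times), mean(pitch)
    num = sum((t - t_bar) * (p - p_bar) for t, p in zip(times, pitch))
    den = sum((t - t_bar) ** 2 for t in times)
    return num / den


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def spearman(x, y):
    """Spearman rank correlation, computed as Pearson correlation of ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Made-up values for illustration only; these are not data from the study.
example_slopes = [-2.0, -1.5, -0.5, 0.3]   # one median turn-end slope per child
example_severity = [3, 5, 6, 9]            # hypothetical severity scores
print(spearman(example_slopes, example_severity))
```

A rank correlation is used here only because it makes no assumption about the shape of the relationship. The study also reports hierarchical and predictive regression, which this sketch does not attempt.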
Results
Automatically extracted speech features demonstrated prosodic characteristics of dyadic interactions. As rated ASD severity increased, both the psychologist and the child demonstrated effects for turn-end pitch slope, and both spoke with atypical voice quality. The psychologist's acoustic cues predicted the child's symptom severity better than did the child's acoustic cues.
Conclusion
The psychologist, acting as evaluator and interlocutor, was shown to adjust his or her behavior in predictable ways based on the child's social-communicative impairments. The results support future study of speech prosody of both interaction partners during spontaneous conversation, while using automatic computational methods that allow for scalable analysis on much larger corpora.

References
- American Psychiatric Association. (2000). Diagnostic and statistical manual of mental disorders (4th ed., text rev.). Washington, DC: Author.
- American Speech-Language-Hearing Association. (2007a). Childhood apraxia of speech (Position statement). Retrieved from www.asha.org/policy/PS2007-00277.htm
- American Speech-Language-Hearing Association. (2007b). Childhood apraxia of speech (Technical report). Retrieved from www.asha.org/policy/TR2007-00278.htm
- Bachorowski, J. A., & Owren, M. J. (1995). Vocal expression of emotion: Acoustic properties of speech are associated with emotional intensity and context. Psychological Science, 6, 219–224.
- Baltaxe, C., Simmons, J. Q., & Zee, E. (1984). Intonation patterns in normal, autistic, and aphasic children. Proceedings of the Tenth International Congress of Phonetic Sciences (pp. 713–718).
- Baron-Cohen, S. (1988). Social and pragmatic deficits in autism: Cognitive or affective? Journal of Autism and Developmental Disorders, 18, 379–402.
- Black, M. P., Bone, D., Williams, M. E., Gorrindo, P., Levitt, P., & Narayanan, S. S. (2011). The USC CARE Corpus: Child-psychologist interactions of children with autism spectrum disorders. Proceedings of Interspeech 2011 (pp. 1497–1500).
- Boersma, P. (2001). Praat: A system for doing phonetics by computer. Glot International, 5, 341–345.
- Bone, D., Black, M. P., Lee, C. C., Williams, M. E., Levitt, P., Lee, S., & Narayanan, S. (2012). Spontaneous-speech acoustic-prosodic features of children with autism and the interacting psychologist. Proceedings of Interspeech 2012 (pp. 1043–1046).
- Bone, D., Lee, C. C., Chaspari, T., Black, M. P., Williams, M. E., Lee, S., … Narayanan, S. (2013). Acoustic-prosodic, turn-taking, and language cues in child–psychologist interactions for varying social demand. Proceedings of Interspeech 2013 (pp. 2400–2404).
- Bone, D., Lee, C. C., & Narayanan, S. (2012). A robust unsupervised arousal rating framework using prosody with cross-corpus evaluation. Proceedings of Interspeech 2012 (pp. 1175–1178).
- Boucher, M. J., Andrianopoulos, M. V., Velleman, S. L., Keller, L. A., & Pecora, L. (2011, November). Assessing vocal characteristics of spontaneous speech in children with autism. Paper presented at the American Speech-Language-Hearing Association Convention, San Diego, CA.
- Busso, C., Lee, S., & Narayanan, S. (2009). Analysis of emotionally salient aspects of fundamental frequency for emotion detection. IEEE Transactions on Audio, Speech, and Language Processing, 17, 582–596.
- Cruttenden, A. (1997). Intonation. Cambridge, United Kingdom: Cambridge University Press.
- Dawson, G., Rogers, S., Munson, J., Smith, M., Winter, J., Greenson, J., … Varley, J. (2010). Randomized, controlled trial of an intervention for toddlers with autism: The Early Start Denver Model. Pediatrics, 125, e17–e34.
- Diehl, J. J., Watson, D., Bennetto, L., McDonough, J., & Gunlogson, C. (2009). An acoustic analysis of prosody in high-functioning autism. Applied Psycholinguistics, 30, 385–404.
- Eyben, F., Wöllmer, M., & Schuller, B. (2010). OpenSMILE: The Munich versatile and fast open-source audio feature extractor. In Proceedings of the 18th International Conference on Multimedia (pp. 1459–1462). ACM Multimedia.
- Frith, U. (2001). Mind blindness and the brain in autism. Neuron, 32, 969–980.
- Frith, U., & Happé, F. (2005). Autism spectrum disorder. Current Biology, 15, R786–R790.
- Furrow, D. (1984). Young children's use of prosody. Journal of Child Language, 11, 203–213.
- García-Perez, R. M., Lee, A., & Hobson, R. P. (2007). On intersubjective engagement in autism: A controlled study of nonverbal aspects of communication. Journal of Autism and Developmental Disorders, 37, 1310–1322.
- Gelfer, M. P. (1988). Perceptual attributes of voice: Development and use of rating scales. Journal of Voice, 2, 320–326.
- Geschwind, D. H., Sowinski, J., Lord, C., Iversen, P., Shestack, J., Jones, P., … Spence, S. J. (2001). The Autism Genetic Resource Exchange: A resource for the study of autism and related neuropsychiatric conditions. American Journal of Human Genetics, 69, 463.
- Gotham, K., Pickles, A., & Lord, C. (2009). Standardizing ADOS scores for a measure of severity in autism spectrum disorders. Journal of Autism and Developmental Disorders, 39, 693–705.
- Gotham, K., Risi, S., Pickles, A., & Lord, C. (2007). The Autism Diagnostic Observation Schedule: Revised algorithms for improved diagnostic validity. Journal of Autism and Developmental Disorders, 37, 613–627.
- Halberstam, B. (2004). Acoustic and perceptual parameters relating to connected speech are more reliable measures of hoarseness than parameters relating to sustained vowels. ORL, 66, 70–73.
- Heman-Ackah, Y. D., Heuer, R. J., Michael, D. D., Ostrowski, R., Horman, M., Barody, M. M., … Sataloff, R. T. (2003). Cepstral peak prominence: A more reliable measure of dysphonia. Annals of Otology, Rhinology & Laryngology, 112, 324–333.
- Hillenbrand, J., Cleveland, R. A., & Erickson, R. L. (1994). Acoustic correlates of breathy vocal quality. Journal of Speech, Language, and Hearing Research, 37, 769–778.
- Hillenbrand, J., & Houde, R. A. (1996). Acoustic correlates of breathy vocal quality: Dysphonic voices and continuous speech. Journal of Speech, Language, and Hearing Research, 39, 311–321.
- Jones, C. D., & Schwartz, I. S. (2009). When asking questions is not enough: An observational study of social communication differences in high functioning children with autism. Journal of Autism and Developmental Disorders, 39, 432–443.
- Juslin, P. N., & Scherer, K. R. (2005). Vocal expression of affect. In J. Harrigan, R. Rosenthal, & K. Scherer (Eds.), The new handbook of methods in nonverbal behavior research (pp. 65–135). Oxford, United Kingdom: Oxford University Press.
- Katsamanis, A., Black, M. P., Georgiou, P. G., Goldstein, L., & Narayanan, S. (2011, January). SailAlign: Robust long speech-text alignment. Paper presented at the Workshop on New Tools and Methods for Very-Large-Scale Phonetics Research, University of Pennsylvania, Philadelphia, PA.
- Kimura, M., & Daibo, I. (2006). Interactional synchrony in conversations about emotional episodes: A measurement by “the between-participants pseudosynchrony experimental paradigm.” Journal of Nonverbal Behavior, 30, 115–126.
- Knapp, M. L., & Hall, J. A. (2009). Nonverbal communication in human interaction. Belmont, CA: Wadsworth.
- Kreiman, J., Gerratt, B. R., Kempster, G. B., Erman, A., & Berke, G. S. (1993). Perceptual evaluation of voice quality: Review, tutorial, and a framework for future research. Journal of Speech, Language, and Hearing Research, 36, 21–40.
- Lee, C.-C., Katsamanis, A., Black, M. P., Baucom, B. R., Christensen, A., Georgiou, P. G., & Narayanan, S. S. (2014). Computing vocal entrainment: A signal-derived PCA-based quantification scheme with applications to affect analysis in married couple interactions. Computer Speech & Language, 28, 518–539.
- Lee, C.-C., Katsamanis, A., Georgiou, P. G., & Narayanan, S. S. (2012). Based on isolated saliency or causal integration? Toward a better understanding of human annotation process using multiple instance learning and sequential probability ratio. Proceedings of Interspeech 2012 (pp. 619–622).
- Li, X., Tao, J., Johnson, M. T., Soltis, J., Savage, A., Leong, K. M., & Newman, J. D. (2007, April). Stress and emotion classification using jitter and shimmer features. Proceedings of IEEE ICASSP (pp. 1081–1084).
- Lord, C., & Jones, R. M. (2012). Annual research review: Rethinking the classification of autism spectrum disorders. Journal of Child Psychology and Psychiatry, 53, 490–509.
- Lord, C., Rutter, M., DiLavore, P. C., & Risi, S. (1999). Autism Diagnostic Observation Schedule. Los Angeles, CA: Western Psychological Services.
- Lord, C., Rutter, M., DiLavore, P. C., Risi, S., Gotham, K., & Bishop, S. L. (2012). Autism Diagnostic Observation Schedule (2nd ed.). Torrance, CA: Western Psychological Services.
- McAllister, A., Sundberg, J., & Hibi, S. R. (1998). Acoustic measurements and perceptual evaluation of hoarseness in children's voices. Logopedics Phoniatrics Vocology, 23, 27.
- McCann, J., & Peppe, S. (2003). Prosody in autism spectrum disorders: A critical review. International Journal of Language & Communication Disorders, 38, 325–350.
- Miller, J. F., & Iglesias, A. (2008). Systematic Analysis of Language Transcripts (English and Spanish, Version 9) [Computer software]. Madison, WI: University of Wisconsin—Madison, Waisman Center, Language Analysis Laboratory.
- Narayanan, S., & Georgiou, P. G. (2013). Behavioral signal processing: Deriving human behavioral informatics from speech and language. Proceedings of IEEE, 101, 1203–1233.
- Paccia, J. M., & Curcio, F. (1982). Language processing and forms of immediate echolalia in autistic children. Journal of Speech, Language, and Hearing Research, 25, 42–47.
- Paul, D. B., & Baker, J. M. (1992). The design for the Wall Street Journal–based CSR corpus. In Proceedings of the ACL Workshop on Speech and Natural Language (pp. 357–362). Stroudsburg, PA: Association for Computational Linguistics.
- Paul, R., Augustyn, A., Klin, A., & Volkmar, F. R. (2005). Perception and production of prosody by speakers with autism spectrum disorders. Journal of Autism and Developmental Disorders, 35, 205–220.
- Paul, R., Shriberg, L. D., McSweeny, J., Cicchetti, D., Klin, A., & Volkmar, F. (2005). Brief report: Relations between prosodic performance and communication and socialization ratings in high functioning speakers with autism spectrum disorders. Journal of Autism and Developmental Disorders, 35, 861–869.
- Peppe, S. (2011). Assessment of prosodic ability in atypical populations, with special reference to high-functioning autism. In V. Stojanovik & J. Setter (Eds.), Speech prosody in atypical populations: Assessment and remediation (pp. 1–23). Guildford, United Kingdom: J&R Press.
- Peppe, S., McCann, J., Gibbon, F., O'Hare, A., & Rutherford, M. (2007). Receptive and expressive prosodic ability in children with high-functioning autism. Journal of Speech, Language, and Hearing Research, 50, 1015–1028.
- Ploog, B. O., Banerjee, S., & Brooks, P. J. (2009). Attention to prosody (intonation) and context in children with autism and in typical children using spoken sentences in a computer game. Research in Autism Spectrum Disorders, 3, 743–758.
- Prizant, B. M., Wetherby, A. M., Rubin, M. S., & Laurent, A. C. (2003). The SCERTS model: A transactional, family-centered approach to enhancing communication and socioemotional abilities of children with autism spectrum disorder. Infants and Young Children, 16, 296–316.
- Pronovost, W., Wakstein, M. P., & Wakstein, D. J. (1966). A longitudinal study of the speech behavior and language comprehension of fourteen children diagnosed atypical or autistic. Exceptional Children, 33, 19–26.
- Rutter, M., LeCouteur, A., & Lord, C. (2003). Autism Diagnostic Interview—Revised manual. Los Angeles, CA: Western Psychological Services.
- Sheinkopf, S. J., Mundy, P., Oller, D. K., & Steffens, M. (2000). Vocal atypicalities of preverbal autistic children. Journal of Autism and Developmental Disorders, 30, 345–354.
- Shobaki, K., Hosom, J. P., & Cole, R. (2000). The OGI Kids' speech corpus and recognizers. Proceedings of ICSLP 2000, 4, 258–261.
- Shriberg, L. D., Austin, D., Lewis, B. A., McSweeny, J. L., & Wilson, D. L. (1997). The Speech Disorders Classification System (SDCS): Extensions and lifespan reference data. Journal of Speech, Language, and Hearing Research, 40, 723–740.
- Shriberg, L. D., Fourakis, M., Hall, S. D., Karlsson, H. B., Lohmeier, H. L., McSweeny, J. L., … Wilson, D. L. (2010). Extensions to the Speech Disorders Classification System (SDCS). Clinical Linguistics & Phonetics, 24, 795–824.
- Shriberg, L. D., Paul, R., Black, L. M., & van Santen, J. P. (2011). The hypothesis of apraxia of speech in children with autism spectrum disorder. Journal of Autism and Developmental Disorders, 41, 405–426.
- Shriberg, L. D., Paul, R., McSweeny, J. L., Klin, A., Cohen, D. J., & Volkmar, F. R. (2001). Speech and prosody characteristics of adolescents and adults with high-functioning autism and Asperger syndrome. Journal of Speech, Language, and Hearing Research, 44, 1097–1115.
- Shue, Y. L., Keating, P., Vicenik, C., & Yu, K. (2010). VoiceSauce: A program for voice analysis. Energy, 1(H2), H1–A1.
- Siller, M., & Sigman, M. (2002). The behaviors of parents of children with autism predict the subsequent development of their children's communication. Journal of Autism and Developmental Disorders, 32, 77–89.
- Sonmez, M. K., Heck, L., Weintraub, M., Shriberg, E., Kemal, M., Larry, S., … Shriberg, W. E. (1997, September). A lognormal tied mixture model of pitch for prosody-based speaker recognition. Paper presented at the Fifth European Conference on Speech Communication and Technology, Rhodes, Greece.
- Uldall, E. (1960). Attitudinal meanings conveyed by intonation contours. Language and Speech, 3, 223–234.
- van Santen, J. P., Prud'hommeaux, E. T., Black, L. M., & Mitchell, M. (2010). Computational prosodic markers for autism. Autism, 14, 215–236.
- Vernon, T. W., Koegel, R. L., Dauterman, H., & Stolen, K. (2012). An early social engagement intervention for young children with autism and their parents. Journal of Autism and Developmental Disorders, 42, 2702–2717.
- Weider, S., & Greenspan, S. I. (2003). Climbing the symbolic ladder in the DIR model through floor time/interactive play. Autism, 7, 425–435.
- Wells, B., & MacFarlane, S. (1998). Prosody as an interactional resource: Turn-projection and overlap. Language and Speech, 41, 265–294.
- Young, S. J. (1993). The HTK Hidden Markov Model toolkit: Design and philosophy (Technical Report No. 153). Cambridge, United Kingdom: Department of Engineering, Cambridge University.