Journal of Speech, Language, and Hearing Research | Research Article | 17 Mar 2021

Evaluating the Language ENvironment Analysis System for Korean


    The algorithm of the Language ENvironment Analysis (LENA) system for calculating language environment measures was trained on American English; thus, its validity with other languages cannot be assumed. This article evaluates the accuracy of the LENA system applied to Korean.


    We sampled sixty 5-min recording clips involving 38 key children aged 7–18 months from a larger data set. We established the identification error rate, precision, and recall of LENA classification relative to human coders. We then examined the correlations between the standard LENA measures (adult word count, child vocalization count, and conversational turn count) and human counts of the same measures.
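
    As an illustration of the classification metrics named above, the following is a minimal sketch, not the scoring procedure used in this study. It assumes that both the human annotation and the system output have been reduced to equal-length sequences of per-frame labels, and that the identification error rate is the sum of false alarm, confusion, and miss frames divided by the number of reference speech frames. The label names (`FEM`, `CHI`, `SIL`) and the function names are hypothetical.

```python
import math


def identification_error_rate(reference, hypothesis, silence="SIL"):
    """Return (false alarm + confusion + miss) / reference speech frames.

    reference, hypothesis: equal-length sequences of per-frame labels.
    The `silence` label marks frames with no speaker (an assumption of this sketch).
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    false_alarm = miss = confusion = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref == silence and hyp != silence:
            false_alarm += 1      # system reports speech where coders heard none
        elif ref != silence and hyp == silence:
            miss += 1             # system reports nothing where coders heard speech
        elif ref != hyp:
            confusion += 1        # both report speech, but different speaker types

    total_ref = sum(1 for ref in reference if ref != silence)
    if total_ref == 0:
        return math.nan
    return (false_alarm + miss + confusion) / total_ref


def precision_recall(reference, hypothesis, label):
    """Return (precision, recall) for one speaker label, counted per frame."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    true_pos = sum(1 for r, h in zip(reference, hypothesis) if r == label and h == label)
    n_hyp = sum(1 for h in hypothesis if h == label)   # frames the system gave this label
    n_ref = sum(1 for r in reference if r == label)    # frames the coders gave this label

    precision = true_pos / n_hyp if n_hyp else math.nan
    recall = true_pos / n_ref if n_ref else math.nan
    return precision, recall


# Toy frames for illustration only; these are not data from the study.
toy_reference = ["FEM", "FEM", "CHI", "SIL", "CHI", "SIL"]
toy_hypothesis = ["FEM", "CHI", "CHI", "FEM", "SIL", "SIL"]
toy_ier = identification_error_rate(toy_reference, toy_hypothesis)
toy_chi = precision_recall(toy_reference, toy_hypothesis, "CHI")
```

    Because false alarms are counted in the numerator but not in the denominator, an error rate defined this way can exceed 100%.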


    Our identification error rate (64% or 67%), which combines false alarms, confusions, and misses, was similar to the rate found in Cristia, Lavechin, et al. (2020). The correlation between LENA and human counts for adult word count (r = .78 or .79) was similar to that found in other studies, but the correlation for child vocalization count (r = .34–.47) was lower than the value in Cristia, Lavechin, et al., though it fell within the range found in other non-European languages. The correlation between LENA and human conversational turn counts was not high (r = .36–.47), similar to the findings of other studies.
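
    The r values above compare per-clip counts from LENA with per-clip counts from human coders. As a minimal sketch of that comparison, assuming a Pearson product-moment correlation over paired clip counts (the function name and the toy inputs are hypothetical, and this is not the study's analysis code):

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of counts."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)

    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return sxy / math.sqrt(sxx * syy)


# Toy per-clip counts for illustration only; these are not data from the study.
toy_system_counts = [1, 2, 3]
toy_human_counts = [2, 4, 6]
toy_r = pearson_r(toy_system_counts, toy_human_counts)
```

    A high r indicates only that the two sets of counts rise and fall together across clips; it does not show that the system's absolute counts match the human counts.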


    LENA technology is about as reliable for Korean language environments as it is for other non-English language environments. Factors affecting the accuracy of diarization include speakers' pitch, duration of utterances, age, and the presence of noise and electronic sounds.

