Abstract
Purpose
Bayesian multilevel models are increasingly used to overcome the limitations of frequentist approaches in the analysis of complex structured data. This tutorial introduces Bayesian multilevel modeling specifically for the analysis of speech data, using the R package brms.
Method
We provide a practical introduction to Bayesian multilevel modeling by reanalyzing a phonetic data set containing formant (F1 and F2) values for 5 vowels of Standard Indonesian (ISO 639-3: ind), as spoken by 8 speakers (4 females and 4 males), with several repetitions of each vowel.
Results
We first give an introductory overview of the Bayesian framework and multilevel modeling. We then show how Bayesian multilevel models can be fitted using the probabilistic programming language Stan and the R package brms, which provides an intuitive formula syntax.
Conclusions
Through this tutorial, we demonstrate some of the advantages of the Bayesian framework for statistical modeling and provide a detailed case study, with complete source code for full reproducibility of the analyses (https://osf.io/dpzcb/).