Vocabulary content rather than size predicts sex/gender before the age of three years

Mikkel Wallentin, Department of Linguistics, Cognitive Science and Semiotics, Aarhus University / Center of Functionally Integrative Neuroscience, Aarhus University Hospital / Interacting Minds Centre, Aarhus University
Fabio Trecca, Department of Linguistics, Cognitive Science and Semiotics, Aarhus University / TrygFonden’s Centre for Child Research, Aarhus University

Does sex/gender matter for language acquisition? Small female advantages in vocabulary size are well-documented. Girls, on average, begin to speak slightly earlier than boys (Bleses, et al., 2008; Wallentin, 2020), and small sex/gender differences in mean vocabulary size have been shown consistently across languages, with girls outperforming boys on measures of receptive and productive vocabulary from a young age (Berglund, et al., 2005; Bleses, et al., 2008; Fenson, et al., 1994; Frank, et al., 2021; Simonsen, et al., 2013). In this study, however, we show that the composition of children's early vocabulary is a significantly better predictor of sex/gender than its size.

We conducted a classification analysis of word production by children aged 12-36 months (n = 39,553) acquiring 26 different languages, using data from the MacArthur-Bates Communicative Development Inventories: Words and Sentences (MB-CDI: WS) (Braginsky, et al., 2020; Fenson, et al., 2007; Frank, et al., 2017), available from Wordbank (http://wordbank.stanford.edu).

Children's sex/gender was classified above chance level in 22 out of 26 languages. Classification accuracy was significantly higher than that of models based on vocabulary size, and it increased with sample size. Accuracy also increased with age, peaking at 30 months at levels comparable to those observed in studies of adult word use.
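The contrast between vocabulary content and vocabulary size can be illustrated with a minimal sketch. It assumes an L2-regularised logistic regression fitted by gradient descent and evaluated with 5-fold cross-validation; the abstract does not specify the classifier or validation scheme, so these choices, the helper names `fit_logistic` and `cv_accuracy`, and the simulated data are all hypothetical. Each child is a row of 0/1 indicators for whether a CDI word is produced, and the size model sees only the row sum.

```python
import numpy as np

# Hypothetical sketch: classifier, fold count and simulated data are
# illustrative assumptions, not the study's actual pipeline.

def fit_logistic(X, y, l2=1.0, lr=0.1, n_iter=2000):
    """Fit L2-regularised logistic regression by batch gradient descent."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        err = prob - y                      # gradient of the log-loss w.r.t. the logits
        w -= lr * (X.T @ err + l2 * w) / n  # L2 penalty on the weights only
        b -= lr * err.mean()                # intercept is not penalised
    return w, b

def cv_accuracy(X, y, k=5, seed=0):
    """k-fold cross-validated accuracy; features standardised on each training fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    correct = 0
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
        sd[sd == 0] = 1.0                   # guard against words nobody produced
        w, b = fit_logistic((X[train] - mu) / sd, y[train])
        X_test = (X[test] - mu) / sd
        pred = (X_test @ w + b > 0).astype(int)
        correct += int(np.sum(pred == y[test]))
    return correct / len(y)

# Simulated (made-up) data: 400 children, 40 words; 0 = girl, 1 = boy.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 200)
p = np.full((400, 40), 0.5)
p[y == 0, :10] = 0.7     # words 0-9 produced more often by girls
p[y == 0, 10:20] = 0.3
p[y == 1, :10] = 0.3
p[y == 1, 10:20] = 0.7   # words 10-19 produced more often by boys
X = (rng.random((400, 40)) < p).astype(float)

content_acc = cv_accuracy(X, y)                          # which words are produced
size_acc = cv_accuracy(X.sum(axis=1, keepdims=True), y)  # how many words are produced
print(f"content: {content_acc:.2f}, size: {size_acc:.2f}")
```

In this toy setup both groups have the same expected vocabulary size (20 words), so only the content model can separate them; its fitted weights `w` are the kind of per-word coefficients from which a signed sex/gender score could be read off.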

A sex/gender score was computed for each word in each language from the classification coefficients. Scores further from zero indicate words that are more predictive of sex/gender, with negative scores associated with girls and positive scores with boys. We used the semantic/grammatical category tags from the Wordbank database to predict the sex/gender scores of individual words. Within languages, several categories predicted the sex/gender score. In 24 out of 26 language samples, the category Clothing significantly predicted the sex/gender score with a negative parameter estimate, indicating that words in this category were produced more by girls. In 23 out of 26 language samples, the category Vehicles significantly predicted the sex/gender score with a positive parameter estimate, indicating that words in this category were produced more by boys.
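The within-language step can be sketched as an ordinary least-squares regression of word scores on an intercept plus a 0/1 indicator for one category. This is an assumption: the abstract does not say whether categories were entered one at a time or jointly, and the helper `category_effect`, the category tags and the scores below are made up for illustration. With a single binary predictor, the slope equals the mean score of words inside the category minus the mean score of words outside it, so a negative slope means the category's words score lower than the rest of the vocabulary.

```python
import numpy as np

def category_effect(scores, categories, target):
    """OLS of word scores on an intercept plus a 0/1 indicator for `target`.

    Hypothetical helper for illustration. Returns (slope, t_statistic).
    """
    scores = np.asarray(scores, dtype=float)
    d = np.array([c == target for c in categories], dtype=float)
    X = np.column_stack([np.ones_like(d), d])
    beta, *_ = np.linalg.lstsq(X, scores, rcond=None)
    resid = scores - X @ beta
    sigma2 = resid @ resid / (len(scores) - 2)          # residual variance, n - 2 df
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])  # standard error of the slope
    return beta[1], beta[1] / se

# Made-up example for the words shoe, sock, dress, car, truck, ball, dog, cup.
cats = ["clothing", "clothing", "clothing", "vehicles", "vehicles",
        "toys", "animals", "household"]
scores = [-0.8, -0.6, -1.0, 0.9, 1.1, 0.1, -0.2, 0.0]

for target in ["clothing", "vehicles"]:
    slope, t = category_effect(scores, cats, target)
    print(f"{target}: slope={slope:+.3f}, t={t:+.2f}")
```

Here the Clothing slope is the Clothing mean (-0.8) minus the mean of the other five words (0.38). Judging significance would mean comparing the t-statistic against a t distribution with n - 2 degrees of freedom.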

Across languages, a mixed-effects analysis with category as a fixed effect and language sample as a random effect showed that sex/gender scores were significantly predicted by the categories Animals, Body parts, Clothing, Connecting words, Games/Routines, Toys and Pronouns, all of which were significantly more likely to be produced by girls, and by Outside/Places and Vehicles, which were more likely to be produced by boys.
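A random-intercept model of this general kind can be sketched with the `statsmodels` formula interface (`smf.mixedlm`). The simulated scores, the use of 0/1 indicators for only two categories (rather than the full Wordbank category set) and the random-intercept-only structure are assumptions made for illustration; the abstract does not report the exact model specification.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated (made-up) word scores: 10 hypothetical language samples x 60 words.
rng = np.random.default_rng(2)
effects = {"clothing": -0.5, "vehicles": 0.5, "other": 0.0}
rows = []
for lang in range(10):
    lang_shift = rng.normal(0.0, 0.2)  # language-specific random intercept
    for i in range(60):
        cat = ["clothing", "vehicles", "other"][i % 3]
        rows.append({"language": f"lang{lang}", "category": cat,
                     "score": lang_shift + effects[cat] + rng.normal(0.0, 0.3)})
df = pd.DataFrame(rows)

# 0/1 category indicators; "other" words form the reference group.
df["clothing"] = (df["category"] == "clothing").astype(float)
df["vehicles"] = (df["category"] == "vehicles").astype(float)

# Category indicators as fixed effects, random intercept per language sample.
model = smf.mixedlm("score ~ clothing + vehicles", df, groups=df["language"])
result = model.fit()
print(result.params[["clothing", "vehicles"]])
```

Pooling across language samples this way lets each sample have its own baseline score while estimating one shared effect per category.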

These differences in vocabulary are indicative of biocultural differences in the lifeworld of children and may themselves cause further differences in development.

 

References

Berglund, E., Eriksson, M., & Westerlund, M. (2005). Communicative skills in relation to gender, birth order, childcare and socioeconomic status in 18-month-old children. Scandinavian Journal of Psychology, 46, 485-491. https://doi.org/10.1111/j.1467-9450.2005.00480.x

Bleses, D., Vach, W., Slott, M., Wehberg, S., Thomsen, P., Madsen, T. O., & Basbøll, H. (2008). The Danish Communicative Developmental Inventories: validity and main developmental trends. Journal of Child Language, 35, 1-19. https://doi.org/10.1017/S0305000907008574

Braginsky, M., Yurovsky, D., Frank, M., & Kellier, D. (2020). wordbankr: Accessing the Wordbank database. R package version 0.3.1. https://CRAN.R-project.org/package=wordbankr

Fenson, L., Dale, P. S., Reznick, J. S., Bates, E., Thal, D. J., & Pethick, S. J. (1994). Variability in early communicative development. Monographs of the Society for Research in Child Development, 59, 1-173; discussion 174-185

Fenson, L., Marchman, V. A., Thal, D., Dale, P., Reznick, J. S., & Bates, E. (2007). MacArthur-Bates Communicative Development Inventories: User's Guide and Technical Manual (2nd ed.). Baltimore, MD: Brookes Publishing Co.

Frank, M. C., Braginsky, M., Yurovsky, D., & Marchman, V. A. (2017). Wordbank: an open repository for developmental vocabulary data. Journal of Child Language, 44, 677-694. https://doi.org/10.1017/S0305000916000209

Frank, M. C., Braginsky, M., Yurovsky, D., & Marchman, V. A. (2021). Variability and consistency in early language learning: The Wordbank project. Cambridge, MA: MIT Press.

Simonsen, H. G., Kristoffersen, K. E., Bleses, D., Wehberg, S., & Jørgensen, R. N. (2013). The Norwegian Communicative Development Inventories: Reliability, main developmental trends and gender differences. First Language, 34, 3-23. https://doi.org/10.1177/0142723713510997

Wallentin, M. (2020). Gender differences in language are small but matter for disorders. In R. Lanzenberger, G. S. Kranz & I. Savic (Eds.), Handbook of Clinical Neurology (Vol. 175, pp. 81-102). Elsevier. https://doi.org/10.1016/B978-0-444-64123-6.00007-2