Title: A quantitative perspective on language variation: methods, measures and causal mechanisms

Speaker: Katharina Ehret, University of Freiburg

Time: May 26, 2020 01:00 PM Amsterdam, Berlin, Rome, Stockholm, Vienna

This talk sketches my recent work on quantitative methods for measuring language variation within and across languages, and outlines how variants of these approaches can be used to explore the link between language adaptation/evolution, language variation, and language complexity.

(1) Multivariate logistic regression analysis is used to assess the constraints governing English genitive variation (e.g. Tom’s car vs. the car of Tom), focusing on the rarely explored factor “rhythm” in a historical English dataset. In general, language users should prefer the rhythmically more optimal genitive construction over the less optimal variant. Yet it turns out that rhythm is only a minor player in this dataset, and its effect runs contrary to expectations: more rhythmic genitives are not preferred overall (Ehret et al. 2014).

(2) Kolmogorov complexity is approximated to measure the overall, morphological and syntactic complexity of the Gospel of Mark in some ten varieties of English and several other Indo-European and Finno-Ugric languages. The novel approach I take here draws on compression algorithms to assess language complexity by measuring the information content in text samples. Texts that can be compressed more efficiently are comparatively less complex. The results are in line with more conservative approaches and show, among other interesting findings, that Hungarian is more morphologically complex than English (Ehret 2017).

(3) Multi-dimensional analysis, i.e. factor analysis, reveals that online news comments are not like face-to-face conversation, contrary to a common assumption among both journalists and researchers, but are clearly a written register. As a result of the social (inter)actions of online commenters, their communicative needs and the situational constraints of online transmission, online news comments exhibit unique linguistic properties combining informational, involved and evaluative features (Ehret & Taboada 2020).
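
The compression-based idea described under (2) can be illustrated with a minimal sketch. It is not the procedure used in Ehret (2017): it assumes zlib as the compressor, uses a plain compressed-to-raw size ratio as the measure, and runs on two invented toy strings rather than Gospel texts. The function name compression_ratio is hypothetical.

```python
import random
import string
import zlib


def compression_ratio(text: str) -> float:
    """Return compressed size divided by raw size, both in bytes.

    A lower ratio means the text compresses better, i.e. it is more
    predictable and carries less information per byte.
    """
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("text must be non-empty")
    return len(zlib.compress(raw, 9)) / len(raw)


# Toy inputs (assumptions for illustration only, not real corpus data).
repetitive = "the cat sat on the mat. " * 50

rng = random.Random(0)  # fixed seed so the example is reproducible
varied = "".join(rng.choice(string.ascii_lowercase) for _ in range(1200))

print(f"repetitive: {compression_ratio(repetitive):.3f}")
print(f"varied:     {compression_ratio(varied):.3f}")
```

The highly repetitive string yields the lower ratio, which is the sense in which better compressibility is read as lower complexity. Comparing real texts this way would additionally require parallel, comparably sized samples, which is one reason a translated text such as the Gospel of Mark is a convenient choice.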

Finally, I will outline future directions in my research, exploring how variants of these methodologies can be combined to disentangle the complex interactions between extra-linguistic triggers (e.g. non-native acquisition, population size) related to variation in language complexity. I will also show how such triggers can be tested as causal mechanisms for variation in language complexity. I will conclude by highlighting the relevance of exploring language variation and complexity for understanding language in general, and its adaptation and evolution in particular.

Ehret, Katharina & Maite Taboada (2020). “Are online news comments like face-to-face conversation? A multi-dimensional analysis of an emerging register”. Register Studies, 2 (1): 1-36.

Ehret, Katharina (2017). “An information-theoretic approach to language complexity: variation in naturalistic corpora”. FreiDok plus, Universität Freiburg.

Ehret, Katharina, Christoph Wolk & Benedikt Szmrecsanyi (2014). “Quirky quadratures: on rhythm and weight as constraints on genitive variation in an unconventional dataset”. English Language and Linguistics, 18 (2): 263-303.