Building a corpus of Austrian online forum postings on reported COVID-19 cases


The COVID-19 pandemic has considerably affected our lives over the last two years. This is also reflected in an increasing number of corpus and discourse linguistic studies on various aspects of COVID-19 all over the world (e.g., Essam & Abdo 2021, Ng et al. 2021, Schweinberger et al. 2021).
The current project focuses on German in Austria (for previous studies on COVID-19 discourse in Austria, see Bülow et al. in prep). It investigates the forum postings on the article Aktuelle Zahlen zum Coronavirus ‘Current numbers of coronavirus’ in the online version of the newspaper “Der Standard” (URL, last accessed 2021/07/17). The first postings were created on March 16th, 2020, when the first COVID-19 regulation in Austria became effective and the first lockdown started. Since then, the forum has been “restarted” twice for reasons of web server capacity: on September 16th, 2020, and on April 20th, 2021, the posting counter was set to zero and the previous postings were removed from the online forum.1 The forum is still active and currently contains 23,025 postings (2021/07/17, 10:05 a.m.). Thus, including the postings removed at the restarts on September 16th, 2020, and April 20th, 2021, the corpus consists of 217,174 postings so far, and this number will certainly increase in the coming months.
Building the corpus with useful tagging of parts of speech and morphological categories will be a major challenge of the project. One option is the CHILDES/CLAN program package (MacWhinney 2000), which was used in previous projects to build several oral corpora (see, e.g., Korecky-Kröll 2017).
The first focus of the project will be on corpus linguistics: one major aim is a quantitative analysis of diminutives used for mitigation but also for irony (e.g., ein bisserl ‘a bit-DIM’) and of expressive compounds used for intensification (e.g., sau+deppert ‘sow+stupid’) in relation to the respective COVID-19 numbers. However, the corpus may also be analyzed from a more discourse-analytic perspective in the future.
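As a rough illustration of the quantitative step, the following minimal sketch counts candidate diminutives and expressive compounds per day. It rests on assumptions throughout: the sample postings, the (date, text) layout, the function name count_by_day, and the two regular expressions are hypothetical and are not the project's actual data or coding scheme. In particular, the pattern for sau+ compounds overgenerates (it would also match sauber ‘clean’), so real counts would require manual checking or proper morphological tagging.

```python
import re
from collections import Counter

# Hypothetical sample data; the real postings would be read from the
# CSV/XLSX export, whose column layout is not specified in the abstract.
POSTINGS = [
    ("2020-03-16", "Das ist ein bisserl übertrieben."),
    ("2020-03-16", "Saudeppert, diese Regelung!"),
    ("2020-03-17", "Ein bisserl Geduld, bitte."),
]

# Illustrative patterns only, not a validated inventory of forms.
PATTERNS = {
    "diminutive": re.compile(r"\bbisserl\b", re.IGNORECASE),
    "expressive_compound": re.compile(r"\bsau\w+", re.IGNORECASE),
}


def count_by_day(postings):
    """Return {day: Counter({label: number of pattern matches})}."""
    counts = {}
    for day, text in postings:
        day_counts = counts.setdefault(day, Counter())
        for label, pattern in PATTERNS.items():
            day_counts[label] += len(pattern.findall(text))
    return counts


if __name__ == "__main__":
    for day, day_counts in sorted(count_by_day(POSTINGS).items()):
        print(day, dict(day_counts))
```

Daily counts of this kind could then be set against the reported COVID-19 numbers for the same dates.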


  • Bülow, L., A. Diehr, D. Pfurtscheller & S. Thome (in prep). Corona-Diskurse in und über Österreich. (Special Issue in Wiener Linguistische Gazette).
  • Essam, B.A. & M.S. Abdo (2021). How Do Arab Tweeters Perceive the COVID-19 Pandemic? Journal of Psycholinguistic Research 50, 507–521.
  • Korecky-Kröll, K. (2017). Kodierung und Analyse mit CHILDES: Erfahrungen mit kindersprachlichen Spontansprachkorpora und erste Arbeiten zu einem rein erwachsenensprachlichen Spontansprachkorpus. In: C. Resch & W.U. Dressler (eds.), Digitale Methoden der Korpusforschung in Österreich. Vienna: Austrian Academy of Sciences Press, 85–113.
  • MacWhinney, B. (2000). The CHILDES Project: Tools for Analyzing Talk. 3rd edition. Mahwah, NJ: Erlbaum.
  • Ng, R., T.Y.J. Chow & W. Yang (2021). Culture Linked to Increasing Ageism During COVID-19: Evidence From a 10-Billion-Word Corpus Across 20 Countries, The Journals of Gerontology: Series B, gbab057,
  • Schweinberger, M., M. Haugh & S. Hames (2021). Analysing discourse around COVID-19 in the Australian Twittersphere: A real-time corpus-based analysis. Big Data & Society 8/1. doi:10.1177/20539517211021437

1 I am deeply grateful to the friendly and helpful team of for replying very quickly to my e-mails and for sending me the entire previous postings in CSV/XLSX format for research purposes.