How to compare corpora: two methodological issues
Václav Cvrček
An increasing number of corpus-based discourse studies start with keyword analysis (KWA), a term coined by Scott (Scott & Tribble 2006). The first step in KWA is the identification of keywords, which results from comparing the text or corpus under examination against a reference corpus. In this presentation, I will focus on two methodological questions closely related to the process of keyword identification:
- What is the appropriate metric for measuring keyness? It has been pointed out several times (Hofland & Johansson 1982; Scott 2010; Gabrielatos & Marchi 2012) that a test statistic (log-likelihood or chi-squared) may give a misleading picture and that an effect size estimator is more adequate.
- What is the role of the reference corpus, and how do its composition and size affect the results? I would argue that the reference corpus helps reconstruct a model reader of a text and therefore has to be taken into account when interpreting the results.
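The contrast between a test statistic and an effect size estimator can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say which effect size estimator is used, so the percentage difference of relative frequencies stands in here as one possible choice. The log-likelihood function uses the common two-term formulation, and the function names and all frequency counts are invented for the example.

```python
import math


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Two-term log-likelihood keyness score (a test statistic).

    Compares observed frequencies of one word in the target and the
    reference corpus with the frequencies expected if the word were
    equally common in both.
    """
    total_freq = freq_target + freq_ref
    total_size = size_target + size_ref
    expected_target = size_target * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    score = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


def percent_diff(freq_target, size_target, freq_ref, size_ref):
    """Percentage difference of relative frequencies (an effect size).

    Assumption: freq_ref > 0; a word absent from the reference corpus
    is not handled in this sketch.
    """
    rel_target = freq_target / size_target
    rel_ref = freq_ref / size_ref
    return (rel_target - rel_ref) / rel_ref * 100


# Hypothetical counts: the same word, twice as frequent (relatively) in
# the target as in the reference corpus, at two different corpus sizes.
small = (10, 10_000, 50, 100_000)
large = (1_000, 1_000_000, 5_000, 10_000_000)

ll_small, ll_large = log_likelihood(*small), log_likelihood(*large)
pd_small, pd_large = percent_diff(*small), percent_diff(*large)

print(f"small corpora: LL = {ll_small:.2f}, %DIFF = {pd_small:.1f}")
print(f"large corpora: LL = {ll_large:.2f}, %DIFF = {pd_large:.1f}")
```

With these invented counts the effect size is identical in both settings, while the log-likelihood score grows in proportion to corpus size. The same relative difference therefore falls below the conventional 3.84 cutoff (p < 0.05) in the small setting and far above it in the large one, which suggests why the size of the reference corpus cannot be ignored when the results are interpreted.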
Both methodological issues will be demonstrated on two pilot studies: an analysis of presidential New Year’s addresses (Fidler & Cvrček 2015) and an analysis of academic texts (Cvrček & Fidler 2019).
Cvrček, V. and M. Fidler. 2019. “Up close and personal vs. birds-eye view” of discourse: a corpus study of perspective using Czech data. ICLC15 – International Cognitive Linguistics Conference, Nishinomiya, Japan.
Fidler, M. and V. Cvrček. 2015. A Data-Driven Analysis of Reader Viewpoints: Reconstructing the Historical Reader Using Keyword Analysis. Journal of Slavic Linguistics 23(2), pp. 197–239.
Gabrielatos, C. and A. Marchi. 2012. Keyness: Appropriate metrics and practical issues. CADS International Conference 2012, University of Bologna, Italy.
Hofland, K. and S. Johansson. 1982. Word frequencies in British and American English. Bergen: Norwegian Computing Centre for the Humanities.
Scott, M. 2010. Problems in investigating keyness, or clearing the undergrowth and marking out trails… In: M. Bondi and M. Scott (eds.) Keyness in Texts, pp. 43–58. Amsterdam / Philadelphia: John Benjamins.
Scott, M. and C. Tribble. 2006. Textual Patterns: Key words and corpus analysis in language education. Philadelphia: John Benjamins.