(Terms for) Multilingualism in the Habsburg Monarchy: Building small-size historical newspaper corpora based on keywords

Abstract

Societal and individual multilingualism undoubtedly shaped political discourse throughout the late Habsburg monarchy (1867–1918). However, the linguistic, societal, and legal conditions for multilingualism, as well as the status of any single named language, varied considerably from crownland to crownland. Czech, for example, was one of two or even three official languages (landesübliche Sprachen), alongside German (and Polish), in Bohemia, Moravia, and Austrian Silesia. In all three crownlands, proponents of these languages (as indices of nationalities) struggled for linguistic rights and equality, both ultimately as a means of securing political and economic power. In all three, the political solutions (or attempted solutions) differed according to local conditions. At the same time, Lower Austria’s parliament (Landtag) sought to ensure once and for all that German remained the only official language in the crownland, while Czech was striving for recognition.
Against this background, I assume that the discourse on multilingualism in German-language newspapers of the late Habsburg monarchy is comparably diverse. However, despite the large number of digitized newspapers on the AustriaN Newspapers Online platform (ANNO, anno.onb.ac.at, 24 million pages) of the Austrian National Library, there are no linguistically annotated corpora available that would facilitate research into this topic. To help close this gap, I created two test corpora and developed a workflow for converting the OCR scans available online into XML files, which can then be automatically tokenized, lemmatized, and annotated in Sketch Engine by algorithms trained on contemporary German. This process involves manual correction as well as orthographic and morphological normalization.
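The automatable part of such a workflow can be sketched roughly as follows. This is only an illustration under stated assumptions, not the pipeline actually used: the function names `normalize` and `to_xml`, the `doc` element with its `newspaper` and `date` attributes, and the entries in the normalization table are all hypothetical, and the manual correction step mentioned above is not modelled.

```python
import re
import xml.etree.ElementTree as ET

# Illustrative lookup table mapping historical spellings to modern ones.
# The entries are assumptions; a real table would be built during manual correction.
NORMALIZATION = {"Theil": "Teil", "giebt": "gibt"}


def normalize(text):
    """Apply a few simple orthographic normalizations to raw OCR text."""
    # Replace the long s that is common in OCR output of Fraktur print.
    text = text.replace("ſ", "s")
    # Naively rejoin words hyphenated at a line break (may merge true compounds).
    text = re.sub(r"(\w)-\n([a-zäöüß])", r"\1\2", text)
    # Collapse all remaining whitespace, including line breaks.
    text = re.sub(r"\s+", " ", text).strip()
    # Replace whole words that appear in the lookup table.
    return re.sub(r"\w+", lambda m: NORMALIZATION.get(m.group(0), m.group(0)), text)


def to_xml(text, newspaper, date):
    """Wrap a normalized snippet in a single XML element carrying its metadata."""
    doc = ET.Element("doc", {"newspaper": newspaper, "date": date})
    doc.text = normalize(text)
    return ET.tostring(doc, encoding="unicode")
```

Keeping the metadata in attributes of a wrapping element is one way to make newspaper title and date available as filter criteria once the files are loaded into a corpus tool.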
The test corpora consist of text snippets of at most 11 sentences from a single paragraph that contains one of the following keywords: (1) mehrsprachig* ‘multilingual’, zweisprachig* ‘bilingual’, or vielsprachig* ‘polyglot’, and (2) utraquis*, which denotes a certain form of institutional bilingualism (the asterisks indicate that derivations of these words are included). Corpus (1) consists of texts from a single newspaper, the Wiener Zeitung, and comprises 24,742 tokens, while corpus (2) contains texts from all over the Austrian part of the Habsburg monarchy and comprises 26,808 tokens.
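The keyword-based snippet selection described above might be approximated as in the following sketch. Several points are assumptions rather than details given here: the window is centred on the sentence containing the keyword (five sentences on either side), the sentence splitter is a naive regular expression that will stumble over abbreviations, and the names `KEYWORDS`, `SENT_SPLIT`, and `extract_snippets` are hypothetical.

```python
import re

# Keyword stems from the two corpora; the trailing asterisk is read as
# "any further word characters", so derivations such as Zweisprachigkeit match.
KEYWORDS = re.compile(
    r"\b(?:mehrsprachig|zweisprachig|vielsprachig|utraquis)\w*", re.IGNORECASE
)

# Naive sentence boundary (assumption): ., ! or ? followed by whitespace
# and an uppercase letter.
SENT_SPLIT = re.compile(r"(?<=[.!?])\s+(?=[A-ZÄÖÜ])")


def extract_snippets(paragraph, max_sentences=11):
    """Return one snippet per keyword sentence, never crossing the paragraph."""
    sentences = [s for s in SENT_SPLIT.split(paragraph.strip()) if s]
    half = max_sentences // 2
    snippets = []
    for i, sentence in enumerate(sentences):
        if KEYWORDS.search(sentence):
            # Clip the window at the paragraph boundaries.
            start = max(0, i - half)
            end = min(len(sentences), i + half + 1)
            snippets.append(" ".join(sentences[start:end]))
    return snippets
```

In this sketch, a paragraph with several keyword sentences yields overlapping snippets; whether such overlaps should be merged or kept apart is a design decision that the sketch leaves open.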
Based on these two corpora, I present first results on how these keywords were used in the newspaper discourse of the late Habsburg monarchy. I show that Vielsprachigkeit ‘polyglossia’ occurs in texts that represent the monarchy as a whole and the Habsburg family as its unifying figures, whereas Zweisprachigkeit ‘bilingualism’ is linked to political conflicts and, at the beginning of the 20th century, turns into a commonplace invoked in every conflict scenario, even those that are not primarily linguistic. The term Mehrsprachigkeit ‘multilingualism’ seems to be comparatively neutral. Based on the corpus on Utraquismus ‘institutional bilingualism’, I show how the linguistic meaning of this keyword developed out of its original meaning around 1850 in the Bohemian context and then spread all over the monarchy. In the lands of the Bohemian crown, it continued to be used in several contexts, while in the other parts of the monarchy it was limited to the educational domain.