Motivation

(Monolingual) reference dictionaries, henceforth simply dictionaries, are typically consulted to learn more about the spelling, grammar, or meaning of a word or phrase. Users may also want to learn how a word was used in a given period and how its usage has changed since then (cf. Gloning, 2020). Some researchers and research fields, however, view dictionaries from another angle, namely as cultural artifacts. Dictionaries can take an explicit ideological stance (e.g. the Wörterbuch der deutschen Gegenwartssprache, whose later volumes convey a clear „Klassenstandpunkt“, i.e. class standpoint), they may convey a particular worldview (Weltsicht), as the „Damen Conversations Lexikon“ does, or their ideological background may be implicit (as, e.g., in the Trübner, a multi-volume general dictionary of German whose earlier volumes are said to be strongly influenced by the ideology of National Socialism). Even more interesting are large dictionaries such as the Deutsches Wörterbuch of Jacob and Wilhelm Grimm, whose compilation spanned several changes of political system and of the respective ideological backgrounds. This line of research has been discussed in the context of the „Zentrum für historische Lexikographie“ (henceforth ZHistLex), a three-year project funded by the BMBF that brought together lexicographers and lexicologists working on historical dictionaries of German as well as providers of web platforms for digitized dictionaries.

The individual decisions and the style of dictionaries and their makers have hardly been the focus of scientific investigation, rare exceptions being the work of Luise Pusch (1983, „… Das Duden-Bedeutungswörterbuch als Trivialroman“) and Herbert Ernst Wiegand's study of the dictionary writing style of Jacob Grimm (1986). There are probably two reasons for this. First, such research crosses or integrates quite a few disciplines, at least lexicography, stylistics, sociolinguistics, and cultural history. Second, and more important in the current context, the sheer mass of unstructured and therefore hard-to-access data is a major obstacle to such investigations.

In the following, I list a few questions about lexicographical practice whose answers might shed light on the cultural bias of both academic and commercial lexicographical work:

  • Which words have (not) been added to the lemma list of a given dictionary? Answering this typically involves cross-checking with word lists from corpora of the relevant period and with the lemma lists of comparable dictionaries.
  • Which works from the lexicographical corpus or the „paper slips“ were used frequently or rarely by an individual lexicographer (where identifiable) or by an editorial group?
  • Which „interesting“ (e.g. ideologically loaded) words were used in definitions, and by whom? Which words were defined using such words?
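Once articles are available in structured form with author metadata, the first two questions largely reduce to set and counting operations. The following Python sketch illustrates this under stated assumptions: it uses a hypothetical, minimal article model (`Entry` with a lemma, an author siglum, and a list of cited source titles) and a word-frequency list from a period corpus. All names and data are invented for illustration and do not reflect the data model of any actual dictionary.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical, minimal model of a structured dictionary article."""
    lemma: str
    author: str  # siglum of the lexicographer, if identifiable (assumed field)
    sources: list[str] = field(default_factory=list)  # short titles of cited works


def missing_lemmas(entries, corpus_words, min_freq=1):
    """Words attested at least min_freq times in the period corpus but absent from the lemma list."""
    lemmas = {e.lemma for e in entries}
    return sorted(w for w, n in corpus_words.items() if n >= min_freq and w not in lemmas)


def sources_by_author(entries):
    """Count how often each lexicographer cites each source."""
    counts = defaultdict(Counter)
    for e in entries:
        counts[e.author].update(e.sources)
    return counts


# Toy data, invented for illustration only.
entries = [
    Entry("Abend", "A", ["Luther", "Goethe", "Luther"]),
    Entry("Acker", "B", ["Schiller"]),
]
corpus = {"Abend": 120, "Acker": 45, "Abendbrot": 12, "Abendland": 3}

print(missing_lemmas(entries, corpus, min_freq=5))     # ['Abendbrot']
print(sources_by_author(entries)["A"].most_common(1))  # [('Luther', 2)]
```

The frequency threshold is a design choice: a missing lemma only becomes interesting if the word was reasonably common at the time, and what counts as "common" has to be decided per corpus.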

As these questions show, such research depends on digitized and finely structured lexical resources accompanied by rich and reliable metadata (e.g. who wrote a particular entry, and when). For dictionaries that include citations (Belegwörterbücher), a reliable list of sources is essential. Ideally, the sources used are linked to electronic versions of the full texts.

Objectives and Solutions

The kind of research described above has several prerequisites. Preliminary work on them was initiated by ZHistLex and should be continued by the Text+ data centers:

  • First of all, some of the resources (e.g. the Trübner, to name but one) must be retro-digitized and structured in such a way that individual information positions in the microstructure of the articles can be identified and queried. This is probably beyond the scope of Text+ funding. The proper curation of such works, however, is within the remit of the Text+ data centers.
  • Second, metadata must be provided, ideally down to the level of the individual article: who wrote the article, and when? A sub-corpus of articles written by a particular hand would enable investigations of that lexicographer's style.
  • Third, a common lemma list covering all available dictionaries is needed, both as a hub and as a basis for comparison.
  • Fourth, the lists of textual sources used by the dictionaries must be curated. Most of them were compiled in the pre-digital era. As a consequence, they are in many cases erroneous, and many references are ambiguous. The technical reports of the ZHistLex project provide some details, in particular for the case of the “Deutsches Wörterbuch”.
  • Fifth, it would be very helpful to identify digitized versions of the texts that the lexicographers used for their work and to link the description of each source to such texts. This would enable us to study the larger context of a text snippet that has been incorporated into a dictionary article. It would, of course, cross the border between the data domains of lexical resources and collections, so cooperation between the two is called for.
  • Sixth, cross-dictionary search facilities are helpful for all necessary comparisons, e.g. of definitions and citation selection between dictionaries (such as the Deutsches Wörterbuch compared to the dictionary of Daniel Sanders). Advanced Federated Content Search facilities and/or the provision of data sets as linked open data would be a step forward.
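In its simplest form, a common lemma list can be thought of as an index from a normalized lemma key to the dictionaries that list it, which also supports the kind of cross-dictionary comparison mentioned above. The Python sketch below assumes plain lemma lists per dictionary and a deliberately naive, hypothetical normalization (Unicode NFC plus case folding); the dictionary labels and lemmas are toy data, not extracts from the actual works.

```python
import unicodedata
from collections import defaultdict


def normalize(lemma: str) -> str:
    """Hypothetical normalization key: trimmed, Unicode NFC, case folded.
    Note: case folding merges noun/adjective pairs and maps 'ß' to 'ss';
    historical spellings (e.g. 'Thal' vs 'Tal') are NOT unified."""
    return unicodedata.normalize("NFC", lemma.strip()).casefold()


def build_hub(lemma_lists: dict[str, list[str]]) -> dict[str, dict[str, set[str]]]:
    """Map each normalized key to the dictionaries listing it and their original spellings."""
    hub = defaultdict(lambda: defaultdict(set))
    for dictionary, lemmas in lemma_lists.items():
        for lemma in lemmas:
            hub[normalize(lemma)][dictionary].add(lemma)
    return hub


def only_in(hub, dictionary: str, other: str) -> list[str]:
    """Normalized lemmas listed in `dictionary` but not in `other`."""
    return sorted(k for k, d in hub.items() if dictionary in d and other not in d)


# Toy lemma lists, invented for illustration only.
lists = {
    "DWB": ["Abend", "abendlich", "Thal"],
    "Sanders": ["Abend", "Abendlich", "Tal"],
}
hub = build_hub(lists)

print(only_in(hub, "DWB", "Sanders"))  # ['thal']
print(dict(hub["abendlich"]))          # {'DWB': {'abendlich'}, 'Sanders': {'Abendlich'}}
```

As the `Thal`/`Tal` pair shows, a purely mechanical key leaves historical spelling variants apart; mapping them onto each other is exactly the kind of curation work a shared lemma hub requires.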

To conclude, an infrastructure like Text+ cannot and should not take over our task of defining a research agenda or formulating particular research questions. Nevertheless, it can remove some of the obstacles that currently make it hard, cumbersome, or even impossible to pursue such research questions. Additionally, tools for investigating large chunks of (dictionary or corpus) data allow us to explore the material and find inspiration for refining particular research questions or a research agenda at large.

Challenges

First, it has to be decided which resources should, from the point of view of the community, be made available, retro-digitized, and structured. These efforts will have to be pursued with third-party funding. The economic impact of such work, and therefore commercial interest in it, is admittedly low; it is rather a matter of preserving cultural heritage and documenting our cultural background. Second, some of the dictionaries that describe diachronic stages of the German language are not yet finished, which makes direct comparison difficult. To put this in perspective, most of these works are due to be completed within this decade. However, data centers that curate such resources can do much to prepare their integration into networks of lexical resources, as the ZHistLex project has shown.

Third, it is still unclear how many of the works that formed the basis of lexicographical works are available in digital form. For some dictionaries, e.g. the Deutsches Rechtswörterbuch, the situation is excellent, as the sources are digitized alongside the lexicographic work. In other cases, such as the Deutsches Wörterbuch, the situation is far from clear. It might be optimistic to infer from the fact that a work is frequently cited in the Deutsches Wörterbuch that it is a salient cultural artifact and therefore worth including in corpora such as the Deutsches Textarchiv.

Review by community

From the point of view of an academic teacher and researcher, the infrastructure mentioned above allows for different research scenarios.

First, it enables metalexicographic research on the dictionaries that are currently available. On the one hand, such studies contribute to the history of science by describing lexicographical practices in different periods. On the other hand, they serve a very practical purpose: learning more about the limitations and shortcomings of different dictionaries that result from the circumstances of their production.

Second, the infrastructure mentioned above also supports smaller-scale work in the context of academic qualification, be it in term papers, B.A. or M.A. theses, dissertations, or contributions to Festschriften or edited volumes.

Finally, knowledge about the peculiarities of different dictionaries enables targeted contributions that remedy their shortcomings, e.g. lacunae with respect to certain sections of the vocabulary, such as Nazi vocabulary or the vocabulary of sexuality, or gaps in the “completeness” of the vocabulary of specific thematic fields.

References

  • Gloning, Thomas. 2020. „gefälligst“ – Eine neue Bedeutungsgeschichte. Zugleich ein Beitrag zu narrativen Formaten in der historischen Semantik und Lexikographie. ZHistLex-Papiere. https://zhistlex.de/papiere/gloning_2020_gefaelligst_ZHistLex.pdf
  • Nübling, Damaris. 2009. Zur lexikografischen Inszenierung von Geschlecht. Ein Streifzug durch die Einträge von Frau und Mann in neueren Wörterbüchern. In: Zeitschrift für germanistische Linguistik. DOI 10.1515/ZGL.2009.03
  • Pusch, Luise. 1983. „Sie sah zu ihm auf wie zu einem Gott“. Das Duden-Bedeutungswörterbuch als Trivialroman. In: Der Sprachdienst 9.10, 135–142. Reprinted 1984 in: Pusch, Luise (ed.): Das Deutsche als Männersprache. Frankfurt, 135–144.
  • Wiegand, Herbert Ernst. 1986. Der frühe Wörterbuchstil Jacob Grimms. In: Deutsche Sprache 4, 302–322. [Also in: Nr. 360 = Kleine Schriften…, 684–703]
  • Herloßsohn, Carl (ed.). 1834–1838. Damen Conversations Lexikon. 10 vols. Leipzig. (http://www.zeno.org/damenconvlex-1834)
  • Grimm, Jacob and Wilhelm Grimm. 1852–1961. Deutsches Wörterbuch, Erstbearbeitung. (https://www.dwds.de/wb/dwb/)
  • Sanders, Daniel. 1865. Wörterbuch der deutschen Sprache. Mit Belegen von Luther bis auf die Gegenwart. Leipzig. (http://mdz-nbn-resolving.de/urn:nbn:de:bvb:12-bsb10808434-5)
  • Götze, Alfred (ed.); from vol. 5 onward founded by Alfred Götze, edited by Walther Mitzka. 1939–1957. Trübners deutsches Wörterbuch. 8 vols. Berlin: de Gruyter.