Large Language Models (LLMs) and Artificial Intelligence

Against the backdrop of the rapid advancement of Large Language Models (LLMs), Text+ aims to explore the potential applications of generative language models, artificial intelligence, and transformer models in research. The consortium seeks to leverage its extensive language and text data collections, as well as the powerful computing centers of its partner institutions.

Text+ aims to develop LLM-based applications and services for research communities. In addition, Text+ centers intend to prepare their language and text resources specifically so that they can be used to train language models effectively. Text+ will offer models for specific tasks (for example, fine-tuned versions of pretrained models or Retrieval-Augmented Generation (RAG) setups), as well as resources (data and computing power) that researchers can use to fine-tune models themselves. Furthermore, Text+ aims to explore how material subject to access restrictions (e.g. copyright) can be integrated into LLMs, whether and how LLMs can be trained on derived text formats, and for which research questions LLMs are suitable.
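To illustrate the retrieval step behind Retrieval-Augmented Generation, the following Python sketch ranks a small set of passages against a question and assembles a prompt from the best matches. It is a minimal sketch under stated assumptions: a simple bag-of-words cosine similarity stands in for the dense embedding models and vector indexes that RAG systems usually use, the function names are hypothetical, and the LLM call that would consume the prompt is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercased word tokens; \w also matches umlauts and ß in Python 3
    return re.findall(r"\w+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity of two term-frequency vectors
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    # Rank passages by similarity to the query and keep the top k
    # (a stand-in for embedding-based retrieval; assumption for illustration)
    q = Counter(tokenize(query))
    ranked = sorted(
        passages, key=lambda p: cosine(q, Counter(tokenize(p))), reverse=True
    )
    return ranked[:k]


def build_prompt(query: str, passages: list[str], k: int = 2) -> str:
    # Ground the model's answer in the retrieved passages only
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, k))
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
    )
```

For example, given three short passages about GermaNet, the GND and Wikidata, `build_prompt("What is GermaNet?", passages, k=1)` places only the GermaNet passage into the context; the resulting string would then be sent to whichever LLM is used.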

Application scenarios in Text+

  • Data Preprocessing, with Named Entity Recognition (NER) as an example: LLMs assist in preparing data for the subsequent application of a specially trained NER model.
  • Generation of Example Sentences or Contexts: LLMs will support the enrichment of entries in the lexical-semantic wordnet GermaNet.
  • Runtime Environment for NLP Tools: Classifiers are provided in containers, accessible via an API and backed by GPU nodes so that deep learning models can be used efficiently.
  • Query Generation for the Text+ Federated Content Search (FCS): An LLM-based chatbot will help users explore the FCS and translate natural-language queries into syntactically correct FCS queries.
  • Entity Linking: LLMs assist in linking named entities in full texts to authority files such as the GND or knowledge bases such as Wikidata.
  • Historical Normalization: LLMs trained on data from historical collections normalize the varying spellings of different eras.
  • MONAPipe, APIs for Components: Neural models (e.g. for speech representation or event recognition) are made available via APIs.
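The query-generation scenario for the FCS can be sketched as a generate-and-validate loop: a few-shot prompt asks an LLM for a query, and the answer is accepted only if it matches the expected syntax. The Python below is a hedged illustration, not the Text+ implementation: `generate` stands for any function that sends a prompt to an LLM and returns text (a hypothetical stand-in; no specific client library is assumed), the few-shot examples are invented, and the regular expression accepts only a small assumed subset of FCS-QL (sequences of `[word="..."]`, `[lemma="..."]` or `[pos="..."]` constraints), not the full query language.

```python
import re
from typing import Callable, Optional

# Hypothetical few-shot examples (request -> query) for illustration only
EXAMPLES = [
    ("Find the word 'Haus'", '[word="Haus"]'),
    ("Find forms of the lemma 'gehen' followed by a noun", '[lemma="gehen"] [pos="NOUN"]'),
]

# One token constraint such as [lemma="gehen"]; only three layers are allowed
TOKEN = r'\[(?:word|lemma|pos)="[^"\\]+"\]'
# A query is one or more token constraints separated by single spaces
QUERY_PATTERN = re.compile(rf"{TOKEN}(?: {TOKEN})*")


def build_prompt(question: str) -> str:
    # Few-shot prompt: examples first, then the user's request
    shots = "\n".join(f"Request: {q}\nQuery: {a}" for q, a in EXAMPLES)
    return (
        "Translate the request into an FCS-QL query. Output only the query.\n"
        f"{shots}\nRequest: {question}\nQuery:"
    )


def is_valid(query: str) -> bool:
    # Accept only strings that fully match the restricted query subset
    return QUERY_PATTERN.fullmatch(query) is not None


def generate_query(
    question: str, generate: Callable[[str], str], max_tries: int = 3
) -> Optional[str]:
    # Ask the (hypothetical) LLM until it returns a syntactically valid query
    prompt = build_prompt(question)
    for _ in range(max_tries):
        candidate = generate(prompt).strip()
        if is_valid(candidate):
            return candidate
    return None  # caller can fall back to asking the user to rephrase
```

Rejecting malformed output before it reaches the search endpoint keeps model errors out of the FCS backend; a production system would more likely rely on a proper FCS-QL parser than on a regular expression.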

General application scenarios

The following use cases outline how LLMs can generally serve as powerful tools in text- and language-based humanities research.

LLMs can be used for content analysis to systematically analyse large volumes of text and identify themes, motifs or stylistic features. For example, literary works, historical documents or philosophical texts can be automatically analysed for recurring themes, sentiments or linguistic patterns. LLMs can track historical changes in concepts and ideas in extensive textual databases or analyse discourses in different time periods and cultures.
LLMs can be used to automatically tag texts with relevant metadata, such as keywords, abstracts or classifications. This facilitates the indexing and reuse of texts in repositories. They can help to sort and organise data in large repositories according to thematic, geographical or temporal criteria, making it easier for researchers to access and use.
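For automatic metadata tagging, one common pattern is to request structured output (for example JSON) from the model and to validate it before it is written to a repository. The following Python sketch assumes a hypothetical three-field schema (`keywords`, `abstract`, `classification`) and a hypothetical `generate` callable that wraps whichever LLM is used; only the prompting, parsing and validation logic is shown.

```python
import json
from typing import Callable

# Hypothetical metadata schema: field name -> expected Python type
SCHEMA = {"keywords": list, "abstract": str, "classification": str}


def build_prompt(text: str) -> str:
    # Ask the model for a JSON object with the schema fields
    return (
        "Return a JSON object with the fields "
        '"keywords" (list of strings), "abstract" (string) and '
        '"classification" (string) for the following text.\n\n'
        f"{text}"
    )


def parse_metadata(raw: str) -> dict:
    # json.JSONDecodeError (a subclass of ValueError) signals malformed output
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected in SCHEMA.items():
        if not isinstance(data.get(field), expected):
            raise ValueError(f"missing or mistyped field: {field}")
    if not all(isinstance(k, str) for k in data["keywords"]):
        raise ValueError("keywords must be strings")
    # Keep only schema fields; deduplicate and lowercase keywords for indexing
    return {
        "keywords": sorted({k.strip().lower() for k in data["keywords"] if k.strip()}),
        "abstract": data["abstract"].strip(),
        "classification": data["classification"].strip(),
    }


def tag_text(text: str, generate: Callable[[str], str]) -> dict:
    # generate() is a hypothetical wrapper around the LLM of choice
    return parse_metadata(generate(build_prompt(text)))
```

Any output that fails validation raises `ValueError`, so a calling pipeline could retry the request or route the text to manual cataloguing instead of storing incomplete metadata.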
For interdisciplinary and international research projects, LLMs can machine-translate texts from different languages, making international research results and sources far more accessible. LLMs could also be used to identify and analyse regional language variants, historical language stages or dialectal differences.
LLMs can be used to extract key research questions and hypotheses from large amounts of scientific literature, which can then serve as the basis for new studies (automated literature reviews). By analysing existing research literature, LLMs can identify understudied areas and thus suggest new research topics.
Where texts are closely linked to visual material (e.g. manuscripts), LLMs can be combined with image analysis models to analyse such multimodal data, for instance by analysing image descriptions or by linking texts to the corresponding visual representations.
LLMs can help to transcribe and annotate historical texts and convert them into digital formats, which facilitates the creation and analysis of digital editions. By analysing historical text corpora, LLMs can contribute to tracing the course of discourse and researching the development of ideas and concepts in history.
LLMs can be used to generate first drafts or summaries of academic papers, which can be particularly helpful for overcoming writer’s block or processing large amounts of data. They can be used to suggest appropriate citations and literature sources based on the context of the text, making the writing process more efficient.