Large Language Models (LLMs) and Artificial Intelligence

Against the backdrop of the rapid advancement of Large Language Models (LLMs), Text+ explores potential applications of generative language models, artificial intelligence, and transformer models in research. To this end, the consortium draws on its extensive language and text data collections as well as the powerful computing centers of its partner institutions.

Text+ aims to develop LLM-based applications and services for research communities. The Text+ centers also intend to prepare their language and text resources specifically so that language models can be trained on them effectively. Text+ will offer task-specific model adaptations (such as fine-tuned pretrained models or Retrieval-Augmented Generation, RAG) as well as resources (data and computing power) with which researchers can fine-tune models themselves. Furthermore, Text+ will explore how material with (copyright) access restrictions can be incorporated into LLMs, whether and how LLMs can be trained on derived text formats, and for which research questions LLMs are suitable.
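As a concrete illustration of the RAG approach mentioned above, the following minimal sketch retrieves the passages most relevant to a question from a small in-memory collection and builds an augmented prompt. The sample passages, the TF-IDF retriever, and the omitted generation step are simplifying assumptions, not the actual Text+ setup:

```python
# Minimal RAG sketch: retrieve relevant passages, then prepend them to the
# prompt sent to the LLM. TF-IDF stands in for a real embedding index.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

passages = [  # stand-in for a Text+ collection
    "GermaNet is a lexical-semantic network for German nouns, verbs, and adjectives.",
    "The Federated Content Search (FCS) queries distributed text collections.",
    "The GND is an authority file for persons, corporate bodies, and subjects.",
]

vectorizer = TfidfVectorizer()
passage_vectors = vectorizer.fit_transform(passages)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k passages most similar to the question."""
    scores = cosine_similarity(vectorizer.transform([question]), passage_vectors)[0]
    return [passages[i] for i in scores.argsort()[::-1][:k]]

def build_prompt(question: str) -> str:
    """Combine retrieved context and the question into one LLM prompt."""
    context = "\n".join(retrieve(question))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"

# build_prompt("What is GermaNet?") would then be passed to the chosen model.
```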

The following specific use cases illustrate the potential that LLMs and Text+ offer together; each is accompanied by a short, illustrative code sketch after the list:

  • Data Preprocessing, using Named Entity Recognition (NER) as an example: LLMs assist in preprocessing data for the later application of a specially trained NER model.
  • Runtime Environment for NLP Tools: classifiers (e.g., from MONAPipe in Text+) are provided in containers via an API and run on GPU nodes so that deep learning models can be used efficiently.
  • Generation of Example Sentences or Contexts: LLMs will support the enrichment of entries in the lexical-semantic word network GermaNet.
  • Query Generation to Support Search in the Federated Content Search (FCS) of Text+: an LLM-based chatbot will help users explore the FCS and translate natural-language queries into syntactically correct FCS search queries.
  • Entity Linking: LLMs assist in linking named entities in full texts with authority files like the GND or knowledge bases like Wikidata.
  • Historical Normalization: LLMs trained on data from historical collections normalize variant spellings from different periods.
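A sketch of the NER preprocessing idea from the first item: an instruction-tuned LLM pre-annotates sentences, yielding silver-standard data on which a dedicated NER model can later be trained. The prompt wording and the `call_llm` parameter are assumptions; any chat-completion client could be plugged in:

```python
# Use an LLM to pre-annotate entities as silver-standard NER training data.
import json

PROMPT = (
    "Mark all person, place, and organisation names in the sentence below. "
    'Reply only with a JSON list of objects of the form {{"text": ..., "label": ...}}.\n\n'
    "Sentence: {sentence}"
)

def preannotate(sentence: str, call_llm) -> list[dict]:
    """Return LLM-suggested entity spans for later human correction."""
    raw = call_llm(PROMPT.format(sentence=sentence))
    try:
        return json.loads(raw)  # e.g. [{"text": "Goethe", "label": "PER"}]
    except json.JSONDecodeError:
        return []  # skip sentences with unparseable model output
```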
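For the runtime environment, a containerized classifier service could look like the following sketch, which uses a public Hugging Face pipeline as a stand-in for a MONAPipe component (whose real interface may differ); `device=0` selects the first GPU on the node:

```python
# Minimal sketch of a classifier served from a container behind an API.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
classifier = pipeline(
    "text-classification",
    model="oliverguhr/german-sentiment-bert",  # example model, an assumption
    device=0,  # run on the GPU node; use device=-1 to fall back to CPU
)

class ClassifyRequest(BaseModel):
    text: str

@app.post("/classify")
def classify(req: ClassifyRequest) -> dict:
    """Classify one text and return the label plus its confidence score."""
    result = classifier(req.text)[0]
    return {"label": result["label"], "score": result["score"]}

# Inside the container, start with: uvicorn app:app --host 0.0.0.0 --port 8000
```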
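Generating example sentences for lexical entries reduces to careful prompting; the template below is an assumption, and any model output would pass editorial review before entering GermaNet:

```python
# Sketch: build a prompt that asks an LLM for sense-specific example sentences.
def example_sentence_prompt(lemma: str, gloss: str, n: int = 3) -> str:
    """Prompt for n example sentences that use `lemma` in the sense `gloss`."""
    return (
        f"The German word '{lemma}' has the sense: {gloss}\n"
        f"Write {n} short, natural German example sentences that use "
        f"'{lemma}' in exactly this sense, one per line."
    )

print(example_sentence_prompt("Bank", "Sitzgelegenheit für mehrere Personen"))
```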
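Query generation for the FCS can be framed as few-shot translation from natural language into a query language. The CQL-style example queries below are illustrative assumptions, not guaranteed FCS-QL syntax:

```python
# Sketch: few-shot prompt that turns a natural-language request into a query.
FEW_SHOT = """\
Translate the request into a CQL query for the Federated Content Search.

Request: sentences containing the word Haus
Query: "Haus"

Request: the word Bank directly followed by the word Konto
Query: "Bank" "Konto"

Request: {request}
Query:"""

def to_fcs_query(request: str, call_llm) -> str:
    """Let the LLM complete the few-shot pattern; return the query line."""
    return call_llm(FEW_SHOT.format(request=request)).strip()
```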
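For entity linking, candidate entities can be fetched from public lookup services; the sketch below queries the Wikidata search API (the GND offers comparable lookups via lobid-gnd), and an LLM would then pick the candidate that fits the textual context:

```python
# Sketch of candidate retrieval for entity linking via the Wikidata API.
import requests

def wikidata_candidates(mention: str, limit: int = 5) -> list[dict]:
    """Return candidate Wikidata entities for a mention string."""
    resp = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={"action": "wbsearchentities", "search": mention,
                "language": "de", "format": "json", "limit": limit},
        timeout=10,
    )
    resp.raise_for_status()
    return [{"id": c["id"], "description": c.get("description", "")}
            for c in resp.json().get("search", [])]

# An LLM would then be prompted with the sentence and the candidate
# descriptions and asked to return the best-matching Wikidata ID.
```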
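Historical normalization, finally, can be prototyped with few-shot prompting before any fine-tuning on aligned historical/modern spelling pairs from the collections; the spelling pairs below are illustrative:

```python
# Sketch: few-shot prompt that maps historical spellings to modern ones.
EXAMPLES = [
    ("Thür", "Tür"),
    ("seyn", "sein"),
]

def normalization_prompt(token: str) -> str:
    """Prompt asking the model to continue the old -> new spelling pattern."""
    shots = "\n".join(f"{old} -> {new}" for old, new in EXAMPLES)
    return ("Normalize the historical German spelling to modern orthography.\n"
            f"{shots}\n{token} ->")
```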