University of Duisburg-Essen (UniDUE)

Text+ center: PolMine

Type of center: data center/competence center

The contribution of the PolMine Project as an entity of UniDUE to Text+ comprises collections of spoken language as contained in manuscripts and minutes of political discourse. As the language resources created in Duisburg are linguistically annotated and adhere to the guidelines of the Text Encoding Initiative (TEI), they are also highly relevant for linguistic and historical research. Workflows for the reproducible preparation of data and an environment for corpus analysis are also part of the PolMine Project's contribution. The data is currently disseminated via various long-term repositories and via the project’s web environment.

Highlights of provided data and services

  • GermaParl - Corpus of Plenary Protocols: digital collection of all parliamentary debates in the German Bundestag from 1949 to 2024. As part of ongoing quality evaluation and expansion of coverage, updates are released every six months in a two-stage process: first to registered users, then as an unrestricted publication. Data formats: TEI-XML and CWB.
  • The reproducible workflow and the purpose-developed toolchain, including the latest annotation tools, are available for reuse.
  • The R package polmineR provides an analysis environment for CWB corpora, enabling the interactive combination of qualitative and quantitative analysis steps.
  • The GermaParl corpus will soon be released in accordance with the ParlaMint standard, thus firmly anchoring GermaParl in the European family of parliamentary corpora.

Third-party data reception

As a competence center for parliamentary language data, the PolMine Project offers expertise and tools for preparing corpora of manuscripts and transcripts of political discourse, and supports corresponding projects.

Contact

Contact for Text+: Andreas Blätte