Saarland University (SLUni)

Text+ center: CLARIND-UdS: Language Resource Repository at Saarland University

Type of center: data center/competence center

Research and teaching at the Department of Language Science and Technology of Saarland University (UdS) cover many linguistic disciplines, including computational linguistics, psycholinguistics, phonetics, language technology, corpus linguistics and translation studies.

In the context of Text+, Saarland University is part of the data domain Collections. The data center, CLARIND-UdS, specializes in register corpora, multilingual corpora and translation corpora. More than 100 data resources have already been archived in the language resource repository. The portfolio includes diachronic corpora such as the Royal Society Corpus (RSC) and the Old Bailey Corpus (OBC).

Highlights of provided data and services

  • Royal Society Corpus (RSC): contains annotated scientific articles published in the Philosophical Transactions and Proceedings of the Royal Society of London from 1665 to 1920
  • Old Bailey Corpus (OBC): documents spoken English from two centuries (1720 to 1913) based on trial transcripts from the Central Criminal Court in London
  • EuroParl-UdS Corpus: based on parliamentary debates of the European Parliament between 1999 and 2017; English, German and Spanish texts were enriched with metadata on texts and speakers
  • EPIC-UdS Corpus: trilingual parallel and comparable interpreting corpus of speeches given in the European Parliament between 2008 and 2013

Third-party data reception

CLARIND-UdS accepts data that match its existing inventory, primarily non-German and multilingual corpora. For example, the repository hosts an annotated corpus of Kyrgyz texts (Manas-UdS). In addition, the GermaParl corpus has been made available on the corpus platform of UdS.

Contact

Contact for Text+: Elke Teich (Admin.) and Jörg Knappen (Techn.)