Collections

The Collections data domain encompasses language- and text-based collections of written, spoken, or signed language, as well as collections of texts compiled according to scientific criteria. This includes text collections and corpora, mono- and multimodal recordings of language, and experimental or measurement data related to language and text. These collections are provided through the distributed Text+ infrastructure by various certified data centres, each with its own specialisation, and can be searched within Text+ via the registry and the federated content search.

The Text+ Helpdesk is the primary contact point for knowledge transfer and for questions about Collections and their integration. It is also the contact address for projects interested in integrating their research data into the Text+ infrastructure.

The data domain pays particular attention to legal and ethical issues and acts as a point of contact for them both within the consortium and beyond. A particular focus is on derived text formats, which enable research on copyright-protected resources without infringing copyright. Other priorities include the community-based development of software services, large language models and AI, as well as research data management and standardisation.

Contact Persons

The Collections data domain is coordinated by the German National Library. Its spokesperson is Philippe Genêt.