Preserving Research Data with Text+

As part of the NFDI, Text+ serves as a hub for text- and language-based research data where members of the academic community can deposit their data. These datasets are not only archived for the long term but are also made available for reuse in accordance with legal regulations. Access is open to all researchers with an academic affiliation and is granted in compliance with data protection and licensing requirements.

One of the Text+ Centers takes responsibility for archiving the data in a sustainable and interoperable manner using Text+ services and tools. The Text+ Centers provide comprehensive support for all aspects of research data submission. This includes preparing, annotating, and transferring the data, describing it with Text+-specific metadata (see Further Information), and selecting a suitable data center. You are welcome to use this form to provide an initial description of your data. Our helpdesk team will then work with you to complete all steps necessary for publication.

Deposit Data with Text+


Data Depositing

Using the information you provide in this simple form, our helpdesk team can start a dialogue with you that ultimately leads to published and reusable data. Fill in what you already know – everything else can be clarified in discussion with us.


Benefits of Archiving Data with Text+

Research data are a valuable asset, forming a core component of both individual scientific work and broader academic practice.

The submission and provision of research data in Text+ follow the FAIR principles and are managed by specialized, certified Text+ Centers. These centers ensure the sustainable preservation of research data, often beyond 10 years, and facilitate their reuse. Through the Registry, Text+ guarantees that research data remain visible within the Text+ community and beyond. Text+ also provides a secure and legally compliant way to access data through the Federated Content Search (FCS). Since research data are often interconnected, Text+, as part of the NFDI, ensures interdisciplinary linkage of datasets.

Connecting Your Own Archive to Text+

We also look forward to making your research data repository available in Text+ using standard interfaces. For this purpose, the sustainability of processes within such an archive is typically documented through certification, such as the nestor Seal or CoreTrustSeal. By integrating with the Text+ infrastructure, these archives can also benefit from Text+ software services.

If you are considering making your repository part of the Text+ network, feel free to contact us via the Helpdesk.

Go to Helpdesk

Further Information

Data ingest typically involves the following steps:

  1. Reviewing the data and preparing it for annotation (metadata):

    • What data formats are available? Can proprietary formats be replaced with open formats?
    • What is the subject of the data (which data center is best suited for it)?
    • What is the status of the data? Is the project completed, or is it ongoing? Is there versioning?
    • Who is the contact person representing those involved in data collection and analysis?
    • Who owns the data? Where do the rights lie? Are there any restrictions on data usage (data protection, licensing rights)?
    • Is there already a basic set of metadata available for ingestion, or a description of the research data (e.g., a working paper or publication)?
    • What are the costs and effort required for archiving?
  2. Data preparation and metadata enrichment:

    • Creating a well-structured file hierarchy, removing duplicates and temporary files, converting formats to those suitable for long-term archiving (LTA).
    • In collaboration with the receiving data center: adding missing metadata and converting the metadata into a machine-readable format using a data schema.
  3. Ingestion into the archive system (technical ingest):

    • Generating or supplementing technical metadata: structural metadata, copy protection checks, validation checks, checksums, etc.
  4. Long-term preservation preparations and maintenance:

    • Semantic preservation vs. bitstream preservation.
    • Updating metadata (e.g., contact person, changes in rights).
    • Format conversion (e.g., updating to the latest audio/video formats).
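
Parts of steps 2 and 3 can be prepared locally before the data is handed over. The following Python sketch is only an illustration and not a Text+ tool: it walks a directory, sets aside temporary files, groups byte-identical duplicates, and records a SHA-256 checksum per file. The function names (sha256_of, is_temporary, survey) and the lists of temporary-file patterns are assumptions made for this example; the checksum algorithm and manifest format actually required should be agreed with the receiving data center.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Assumed patterns for temporary files; adjust them to your own project.
TEMP_SUFFIXES = {".tmp", ".bak", ".swp"}
TEMP_NAMES = {".DS_Store", "Thumbs.db"}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_temporary(path: Path) -> bool:
    """Heuristic check for editor backups and operating-system clutter."""
    return (
        path.name in TEMP_NAMES
        or path.suffix.lower() in TEMP_SUFFIXES
        or path.name.endswith("~")
    )


def survey(root: Path) -> dict:
    """Report checksums, duplicate groups, and temporary files below root."""
    checksums = {}
    by_digest = defaultdict(list)
    temporary = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        if is_temporary(path):
            temporary.append(rel)
            continue
        digest = sha256_of(path)
        checksums[rel] = digest
        by_digest[digest].append(rel)
    # Files sharing a digest are byte-identical and are candidates for removal.
    duplicates = [names for names in by_digest.values() if len(names) > 1]
    return {"checksums": checksums, "duplicates": duplicates, "temporary": temporary}
```

Calling survey(Path("my_dataset")) returns a dictionary that can be reviewed by hand. The sketch only reports findings and deletes nothing, so the decision about which copy of a duplicate to keep stays with the depositor.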

Ensuring the findability and reusability of research data requires describing them as thoroughly as possible with metadata. In Text+, all research data should be annotated with at least the descriptive elements from DataCite. Ideally, your research data should be described using these 20 fields:

No.  Field                Description                                  Obligation
1.   Identifier           Unique identifier                            M
2.   Creator              Author of the resource                       M
3.   Title                Title of the resource                        M
4.   Publisher            Publisher of the resource                    M
5.   PublicationYear      Year of publication                          M
6.   ResourceType         Type of resource                             M
7.   Subject              Subject area                                 R
8.   Contributor          Contributors to the resource                 R
9.   Date                 Date of creation                             R
10.  Language             Language of the resource                     O
11.  AlternateIdentifier  Alternative identifier, if available         O
12.  RelatedIdentifier    Identifier of related resources              R
13.  Size                 Size description                             O
14.  Format               Formats in which the resource is available   O
15.  Version              Versioning                                   O
16.  Rights               Licensing and copyright information          O
17.  Description          Plain-text description of the resource       R
18.  GeoLocation          Geographical location of the resource        R
19.  FundingReference     Funding reference                            O
20.  RelatedItem          Related resources                            O

Note that the first six fields are mandatory (M = Mandatory), some fields are recommended (R = Recommended), while others are optional (O = Optional).
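
Before filling in the form, it can help to check a draft description against these obligation levels. The following Python sketch is an illustration under stated assumptions, not an official Text+ or DataCite validator: the function name check_record and the flat dictionary layout of the record are hypothetical, and the field names use the DataCite spellings PublicationYear and GeoLocation. It only tests whether a field is filled in, not whether its value is valid.

```python
# Obligation levels taken from the table above
# (M = mandatory, R = recommended, O = optional).
OBLIGATIONS = {
    "Identifier": "M",
    "Creator": "M",
    "Title": "M",
    "Publisher": "M",
    "PublicationYear": "M",
    "ResourceType": "M",
    "Subject": "R",
    "Contributor": "R",
    "Date": "R",
    "Language": "O",
    "AlternateIdentifier": "O",
    "RelatedIdentifier": "R",
    "Size": "O",
    "Format": "O",
    "Version": "O",
    "Rights": "O",
    "Description": "R",
    "GeoLocation": "R",
    "FundingReference": "O",
    "RelatedItem": "O",
}


def is_filled(value) -> bool:
    """Treat None, empty strings, and empty containers as not filled in."""
    return value is not None and value != "" and value != [] and value != {}


def check_record(record: dict) -> dict:
    """List missing mandatory and recommended fields and unknown field names."""
    present = {name for name, value in record.items() if is_filled(value)}
    return {
        "missing_mandatory": [
            name for name, level in OBLIGATIONS.items()
            if level == "M" and name not in present
        ],
        "missing_recommended": [
            name for name, level in OBLIGATIONS.items()
            if level == "R" and name not in present
        ],
        "unknown": sorted(set(record) - set(OBLIGATIONS)),
    }


# Hypothetical draft record; every value is a placeholder.
draft = {
    "Identifier": "10.1234/example",
    "Creator": "Doe, Jane",
    "Title": "Example corpus",
    "Publisher": "",
}
report = check_record(draft)
```

For this draft, report["missing_mandatory"] lists Publisher, PublicationYear, and ResourceType, which are the gaps that must be closed before publication. Missing recommended fields are reported separately so that they can be clarified with the helpdesk team.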

Many Text+ partners annotate research data with additional characteristics that are often specific to the type of resource. Within the data domains in Text+, there is also a variety of descriptive systems, such as those used for annotating lexical resources. As a result, the consortium has established a common core of metadata descriptions for each data domain (Lexical Resources, Collections, Editions). For guidance, you can consult the corresponding documentation: