Preserving Research Data with Text+

As part of the NFDI, Text+ serves as a hub for text- and language-based research data. The Text+ Centers offer the academic community the opportunity to deposit their data. These datasets are not only archived for the long term but are also made available for reuse in accordance with legal regulations. Access is open to all researchers with an academic affiliation.

Deposit Data with Text+

The Text+ Centers provide comprehensive support for all aspects of research data submission. This includes selecting a suitable data center; preparing, annotating, and transferring the data; describing the data with Text+-specific metadata (see further information); and ensuring compliance with data protection and licensing requirements.

You are welcome to use this form to provide an initial description of your data. Our helpdesk team will then work with you to finalize all necessary steps for publication.

Do you want to get a more precise overview of your data first? You can use the Text+ RDMO question catalogue (access via GRO.plan of the SUB Göttingen) to create a data management plan (DMP). Once you have completed the DMP, you can contact the Text+ Helpdesk for further advice.

Data Depositing

Our helpdesk team will support you on the way to publishing your data for reuse. The starting point for this is the information you provide in this form. Fill in what you already know; everything else can be clarified in consultation with us.

Title *

Data description *

Data fit into Text+ task area

Amount of data

Number of files

Personal data

Third-party data and rights

Type of data

Language(s)

Project status

Planned publication date

Preferred license

Internal provision/embargo

Upload data sample

Message

Your name *

Email *

I accept the Privacy Policy *

Benefits of Archiving Data with Text+

Research data are a valuable asset, forming a core component of both individual scientific work and broader academic practice.

The submission and provision of research data in Text+ follow the FAIR principles and are managed by specialized, certified Text+ Centers. These centers ensure the sustainable preservation of research data, often beyond 10 years, and facilitate their reuse. Through the Registry, Text+ guarantees that research data remain visible within the Text+ community and beyond. Text+ also provides a secure and legally compliant way to access data through the Federated Content Search (FCS). Since research data are often interconnected, Text+, as part of the NFDI, ensures interdisciplinary linkage of datasets.

Connecting Your Own Archive to Text+

We also welcome the integration of your own research data repository into Text+ via standard interfaces. For this, the sustainability of an archive's processes is typically documented through certification, such as the nestor Seal or the CoreTrustSeal. By integrating with the Text+ infrastructure, these archives can also benefit from Text+ software services.

If you are considering making your repository part of the Text+ network, feel free to contact us via the Helpdesk.

Further Information

General Information on Data Ingest

Data ingest typically involves the following steps:

1. Reviewing the data and preparing it for annotation (metadata):
   - What data formats are available? Can proprietary formats be replaced with open formats?
   - What is the subject of the data (which data center is best suited for it)?
   - What is the status of the data? Is the project completed, or is it ongoing? Is there versioning?
   - Who is the contact person representing those involved in data collection and analysis?
   - Who owns the data? Where do the rights lie? Are there any restrictions on data usage (data protection, licensing rights)?
   - Is there already a basic set of metadata available for ingestion, or a description of the research data (e.g., a working paper or publication)?
   - What are the costs and effort required for archiving?
2. Data preparation and metadata enrichment:
   - Creating a well-structured file hierarchy, removing duplicates and temporary files, and converting formats to those suitable for long-term archiving (LTA).
   - In collaboration with the receiving data center: adding missing metadata and converting it into a machine-readable format using a data schema.
3. Ingestion into the archive system (technical ingest):
   - Generating or supplementing technical metadata: structural metadata, copy protection checks, validation checks, checksums, etc.
4. Long-term preservation preparations and maintenance:
   - Semantic preservation vs. bitstream preservation.
   - Updating metadata (e.g., contact person, changes in rights).
   - Format conversion (e.g., updating to the latest audio/video formats).
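Two of the preparation tasks listed above, removing duplicates and generating checksums, can be tried out locally before contacting a data center. The following Python sketch is an illustration only and not a Text+ tool: the function names and the choice of SHA-256 are assumptions, and the receiving data center may require a different checksum algorithm or manifest format.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file below root (relative POSIX path) to its checksum."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def find_duplicates(manifest: dict[str, str]) -> list[list[str]]:
    """Group paths whose file contents are byte-identical."""
    by_digest = defaultdict(list)
    for rel_path, digest in manifest.items():
        by_digest[digest].append(rel_path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Calling `build_manifest(Path("my_dataset"))` on a local folder (the folder name is a placeholder) yields a path-to-checksum mapping that can accompany the data transfer, and `find_duplicates` lists groups of identical files that may be candidates for removal. Whether a duplicate is actually redundant remains a curatorial decision.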

Information on Metadata

Making research data findable and reusable requires describing them as thoroughly as possible with metadata. In Text+, all research data should be annotated with at least the descriptive elements from DataCite. Ideally, your research data should be described using these 20 fields:

No.	Field	Description	Obligation
1.	Identifier	Unique identifier	M
2.	Creator	Author of the resource	M
3.	Title	Title of the resource	M
4.	Publisher	Publisher of the resource	M
5.	PublicationYear	Year of publication	M
6.	ResourceType	Type of resource	M
7.	Subject	Subject area	R
8.	Contributor	Contributors to the resource	R
9.	Date	Date of creation	R
10.	Language	Language of the resource	O
11.	AlternateIdentifier	Alternative identifier, if available	O
12.	RelatedIdentifier	Identifier of related resources	R
13.	Size	Size description	O
14.	Format	Formats in which the resource is available	O
15.	Version	Versioning	O
16.	Rights	Licensing and copyright information	O
17.	Description	Plain-text description of the resource	R
18.	GeoLocation	Geographical location of the resource	R
19.	FundingReference	Funding reference	O
20.	RelatedItem	Related resources	O

The first six fields are mandatory (M = Mandatory); the remaining fields are either recommended (R = Recommended) or optional (O = Optional).
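To see how complete a metadata record is before submitting it, the obligation levels from the table can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the record is a plain dictionary keyed by field name, the field names use the DataCite spelling (`PublicationYear`, `GeoLocation`), and `check_record` is a hypothetical helper, not part of any Text+ or DataCite software.

```python
# Obligation levels as given in the table above (DataCite spelling assumed).
MANDATORY = (
    "Identifier", "Creator", "Title",
    "Publisher", "PublicationYear", "ResourceType",
)
RECOMMENDED = (
    "Subject", "Contributor", "Date",
    "RelatedIdentifier", "Description", "GeoLocation",
)


def check_record(record: dict) -> dict:
    """Report which mandatory and recommended fields are missing or empty."""

    def missing(fields):
        return [name for name in fields if not record.get(name)]

    return {
        "missing_mandatory": missing(MANDATORY),
        "missing_recommended": missing(RECOMMENDED),
    }
```

A record that fills all six mandatory fields returns an empty `missing_mandatory` list; the `missing_recommended` list then shows where the description could still be enriched. Optional fields are deliberately not checked here.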

Many Text+ partners annotate research data with additional characteristics that are often specific to the type of resource. Within the data domains in Text+, there is also a variety of descriptive systems, such as those used for annotating lexical resources. As a result, the consortium has established a common core of metadata descriptions for each data domain (Lexical Resources, Collections, Editions). For guidance, you can consult the corresponding documentation:

Collections: Konzept und prototypische Implementierung beispielhafter Ressourcen
Editions: Datenmodell Editionenregistry (Text+)
Lexical Resources: (forthcoming)

Publications

Wohin damit? So kommen Ihre Forschungsdaten in die Text+ Infrastruktur. https://zenodo.org/doi/10.5281/zenodo.10036031
How-to “Data Depositing in Text+”: Your research data’s way into the Text+ infrastructure. https://zenodo.org/doi/10.5281/zenodo.11618653
Leitlinie für das Integrieren von Daten in Text+/NFDI. https://zenodo.org/doi/10.5281/zenodo.12744055
