Preserving Research Data with Text+
As part of the German National Research Data Infrastructure (NFDI), Text+ serves as a hub for text- and language-based research data and offers the academic community the opportunity to deposit their data. The deposited datasets are archived for the long term and made available for reuse in accordance with legal regulations. Access is open to all researchers with an academic affiliation, in compliance with data protection and licensing requirements.
One of the Text+ Centers takes responsibility for archiving the data sustainably and interoperably using Text+ services and tools. The Text+ Centers support all aspects of research data submission: preparing, annotating, and transferring the data, describing it with Text+-specific metadata (see Further Information), and selecting a suitable data center. You are welcome to use this form to provide an initial description of your data; our helpdesk team will then work with you to complete all steps necessary for publication.
Data Depositing
Based on the information you enter in this simple form, our helpdesk team can start a dialogue with you that ultimately leads to published, reusable data. Fill in what you already know; everything else can be clarified together with us.
Benefits of Archiving Data with Text+
Research data are a valuable asset, forming a core component of both individual scientific work and broader academic practice.
The submission and provision of research data in Text+ follow the FAIR principles (Findable, Accessible, Interoperable, Reusable) and are managed by specialized, certified Text+ Centers. These centers ensure the sustainable preservation of research data, often for more than 10 years, and facilitate their reuse. Through the Registry, Text+ ensures that research data remain visible within the Text+ community and beyond. Text+ also provides secure, legally compliant access to data through the Federated Content Search (FCS). Because research data are often interconnected, Text+, as part of the NFDI, also links datasets across disciplines.
Connecting Your Own Archive to Text+
We also look forward to making your own research data repository available in Text+ via standard interfaces. For this purpose, the sustainability of the archive's processes is typically documented through certification, such as the nestor Seal or the CoreTrustSeal. By integrating with the Text+ infrastructure, such archives can also benefit from Text+ software services.
If you are considering making your repository part of the Text+ network, feel free to contact us via the Helpdesk.
Further Information
Data ingest typically involves the following steps:
Reviewing the data and preparing it for annotation (metadata):
- What data formats are available? Can proprietary formats be replaced with open formats?
- What is the subject of the data, and which data center is best suited to it?
- What is the status of the data? Is the project completed, or is it ongoing? Is there versioning?
- Who is the contact person representing those involved in data collection and analysis?
- Who owns the data? Where do the rights lie? Are there any restrictions on data usage (data protection, licensing rights)?
- Is there already a basic set of metadata available for ingestion, or a description of the research data (e.g., a working paper or publication)?
- What are the costs and effort required for archiving?
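Questions such as which data formats are present can often be answered with a quick inventory of the project folder. The following Python sketch is one possible way to do this, not a Text+ tool: it counts files per extension and flags extensions for which an open alternative is commonly available. The mapping `OPEN_ALTERNATIVES` and the function names are hypothetical and deliberately incomplete; the appropriate target formats should be agreed with the receiving data center.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable

# Hypothetical, deliberately incomplete mapping from proprietary extensions
# to commonly used open alternatives; adapt it to your own data.
OPEN_ALTERNATIVES = {
    ".doc": ".odt, plain text or XML (e.g. TEI)",
    ".docx": ".odt, plain text or XML (e.g. TEI)",
    ".xls": ".csv or .ods",
    ".xlsx": ".csv or .ods",
    ".mdb": ".csv or an SQL dump",
}


def inventory_formats(paths: Iterable[Path]) -> Counter:
    """Count files per lower-cased extension ('<none>' if a file has none)."""
    return Counter(p.suffix.lower() or "<none>" for p in paths)


def conversion_candidates(counts: Counter) -> dict[str, str]:
    """Return the found extensions that have an entry in OPEN_ALTERNATIVES."""
    return {ext: OPEN_ALTERNATIVES[ext] for ext in counts if ext in OPEN_ALTERNATIVES}
```

For a hypothetical project folder `my_project`, the counts could be obtained with `inventory_formats(p for p in Path("my_project").rglob("*") if p.is_file())`. Keeping the functions free of file-system access makes them easy to reuse on any list of paths.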
Data preparation and metadata enrichment:
- Creating a well-structured file hierarchy, removing duplicates and temporary files, and converting files into formats suitable for long-term archiving (LTA).
- In collaboration with the receiving data center: adding missing metadata and converting it into a machine-readable format based on a data schema.
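Finding duplicates and leftover temporary files can be partly automated. This Python sketch groups files with identical content by their SHA-256 hash and lists files that match a few typical temporary-file patterns. The pattern list `TEMP_PATTERNS` and the function names are assumptions for illustration. The script only reports candidates and deletes nothing, since whether a duplicate is truly redundant is a decision for the project.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Hypothetical patterns for temporary and editor/OS leftover files; extend as needed.
TEMP_PATTERNS = ("~$*", "*.tmp", "*.bak", "*~", ".DS_Store", "Thumbs.db")


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that large files need not fit into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files below root that have identical content (same SHA-256)."""
    by_hash = defaultdict(list)
    for p in sorted(root.rglob("*")):
        if p.is_file():
            by_hash[sha256_of(p)].append(p)
    return [group for group in by_hash.values() if len(group) > 1]


def find_temp_files(root: Path) -> list[Path]:
    """List files below root whose name matches one of TEMP_PATTERNS."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and any(p.match(pattern) for pattern in TEMP_PATTERNS)
    )
```

Hashing content rather than comparing file names catches copies that were renamed or stored in different subfolders.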
Ingestion into the archive system (technical ingest):
- Generating or supplementing technical metadata: structural metadata, copy protection checks, validation checks, checksums, etc.
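Checksums are among the simplest technical metadata, and they are also the basis for later bitstream preservation checks. The sketch below writes a manifest with one `<sha256>  <relative path>` line per file (a layout similar to what `sha256sum` prints) and verifies it later. The manifest layout and the function names are assumptions; the checksum algorithm and format actually used during ingest are defined by the receiving data center and its archive system.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that large files need not fit into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(root: Path, manifest: Path) -> int:
    """Write '<sha256>  <relative path>' lines for every file below root.

    Returns the number of files recorded. Keep the manifest outside root,
    otherwise it would end up checksumming itself.
    """
    lines = [
        f"{sha256_of(p)}  {p.relative_to(root).as_posix()}"
        for p in sorted(root.rglob("*"))
        if p.is_file()
    ]
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


def verify_manifest(root: Path, manifest: Path) -> list[str]:
    """Return the relative paths that are missing or whose checksum changed."""
    problems = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel = line.split("  ", 1)
        path = root / rel
        if not path.is_file() or sha256_of(path) != expected:
            problems.append(rel)
    return problems
```

Running `verify_manifest` periodically on the archived copy is one way to detect silent bit-level changes long after ingest.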
Long-term preservation preparations and maintenance:
- Semantic preservation (keeping the content interpretable) vs. bitstream preservation (keeping the exact bits intact).
- Updating metadata (e.g., contact person, changes in rights).
- Format conversion (e.g., updating to the latest audio/video formats).
Findability and reusability of research data depend on describing the data as well as possible with metadata. In Text+, all research data should be annotated with at least the descriptive elements of the DataCite Metadata Schema. Ideally, your research data should be described using the following 20 fields:
No. | Field | Description | Obligation |
---|---|---|---|
1. | Identifier | Unique identifier | M |
2. | Creator | Author of the resource | M |
3. | Title | Title of the resource | M |
4. | Publisher | Publisher of the resource | M |
5. | PublicationYear | Year of publication | M |
6. | ResourceType | Type of resource | M |
7. | Subject | Subject area | R |
8. | Contributor | Contributors to the resource | R |
9. | Date | Date of creation | R |
10. | Language | Language of the resource | O |
11. | AlternateIdentifier | Alternative identifier, if available | O |
12. | RelatedIdentifier | Identifier of related resources | R |
13. | Size | Size description | O |
14. | Format | Formats in which the resource is available | O |
15. | Version | Versioning | O |
16. | Rights | Licensing and copyright information | O |
17. | Description | Plain-text description of the resource | R |
18. | GeoLocation | Geographical location of the resource | R |
19. | FundingReference | Funding reference | O |
20. | RelatedItem | Related resources | O |
The first six fields are mandatory (M), several are recommended (R), and the remaining fields are optional (O).
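A simple completeness check against the table can catch missing fields before submission. The Python sketch below represents a record as a flat dictionary keyed by the DataCite property names. This flat layout, the `check_record` function, and any example values are hypothetical simplifications, not the official DataCite XML or JSON representation, and they do not replace schema validation by the data center.

```python
# Property names as listed in the DataCite table; the flat dict layout used
# here is a hypothetical simplification, not the official DataCite format.
MANDATORY = ("Identifier", "Creator", "Title", "Publisher",
             "PublicationYear", "ResourceType")
RECOMMENDED = ("Subject", "Contributor", "Date", "RelatedIdentifier",
               "Description", "GeoLocation")


def _filled(value) -> bool:
    """True if a field carries content (non-blank string, non-empty collection, ...)."""
    if value is None:
        return False
    if isinstance(value, str):
        return bool(value.strip())
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) > 0
    return True  # e.g. an int such as a publication year


def check_record(record: dict) -> dict[str, list[str]]:
    """Report which mandatory and recommended properties are missing or empty."""
    return {
        "missing_mandatory": [f for f in MANDATORY if not _filled(record.get(f))],
        "missing_recommended": [f for f in RECOMMENDED if not _filled(record.get(f))],
    }
```

A record that fills all six mandatory properties but leaves `Description` blank would pass the mandatory check while still listing `Description` among the missing recommended properties.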
Many Text+ partners annotate research data with additional characteristics that are often specific to the type of resource. Within the Text+ data domains there is also a variety of description schemes, such as those used for annotating lexical resources. For this reason, the consortium has established a common core of metadata for each data domain (Lexical Resources, Collections, Editions). For guidance, consult the corresponding documentation:
- Collections: Konzept und prototypische Implementierung beispielhafter Ressourcen (Concept and prototypical implementation of exemplary resources)
- Editions: Datenmodell Editionenregistry (Text+) (Data model of the Text+ editions registry)
- Lexical Resources: (forthcoming)
- Wohin damit? So kommen Ihre Forschungsdaten in die Text+ Infrastruktur (Where to put it? How your research data get into the Text+ infrastructure). https://zenodo.org/doi/10.5281/zenodo.10036031
- How-to “Data Depositing in Text+”: Your research data’s way into the Text+ infrastructure. https://zenodo.org/doi/10.5281/zenodo.11618653
- Leitlinie für das Integrieren von Daten in Text+/NFDI (Guideline for integrating data into Text+/NFDI). https://zenodo.org/doi/10.5281/zenodo.12744055