Text+ Newsletter #2

Welcome!

We are pleased to present the second Text+ Newsletter and warmly invite you to take a look at our project activities. We welcome your feedback, questions, and suggestions; the easiest way to reach us is via the Office.

Text+ Plenary on October 10 and 11, 2024

The 3rd Text+ Plenary will take place on October 10 and 11, 2024, at Schloss Mannheim. The consortium, the community, representatives of other NFDI consortia, and interested parties are all invited. This year’s Plenary is themed “Large Language Models (LLMs) and their Use.” In lectures, panel discussions, and working group sessions, participants will discuss how they use these technologies in their own work and how their data and offerings can contribute to further development.

Those who are not yet familiar with the background of LLMs can attend a pre-conference tutorial on Wednesday, October 9, from 4 to 6 PM. The tutorial introduces the basics and background of language models. It will take place in the Great Lecture Hall of the Leibniz Institute for the German Language.

Participation in all events of the 3rd Text+ Plenary 2024 is free of charge. Further information and registration options can be found on the 3rd Text+ Plenary website.

Highlights from the Blog

Mapping Online Book Reception Across Cultures and Languages (January 25-27, 2024), ZiF Workshop (Bielefeld)

With cultural discourses increasingly shifting online, data on book reception is now available in large quantities and from authors worldwide. This data offers a completely new perspective on questions of reading experience, literary quality, and even world literature as such. Accounting for online book reception as a global and multidimensional phenomenon therefore requires new theoretical models. It also requires applying and evaluating established procedures so that dataset creation, data protection, and analysis methods are handled in a balanced way.

To address this challenge, 23 researchers from Europe, North America, and Asia gathered for the workshop “Mapping Online Book Reception Across Cultures and Languages” at the Center for Interdisciplinary Research (ZiF) at Bielefeld University. The workshop used a diverse, interdisciplinary set of methods to examine online book reviews as a cultural practice of non-professional readers in the digital realm.

The English-language blog post by Tina Ternes (University of Basel) and Xenia Bojarski (University of Zurich) can be found on the Text+ Blog.

Cite this blog post: Tina Ternes/Xenia Bojarski (2024, May 13). Mapping Online Book Reception Across Cultures and Languages (January 25-27, 2024), ZiF Workshop (Bielefeld). Text+ Blog. Retrieved on June 17, 2024, from https://doi.org/10.58079/11nxm

Fluffy Workflow: New Tools for Data Import into TextGridRep

In the spirit of John Cotton Dana’s quote “Preservation is Use,” the long-term archiving of research data requires that reuse be a guiding principle for the technical infrastructure from the outset. How can a research data infrastructure meet this principle? What should the workflows for data ingestion look like? How should research data be prepared and presented to encourage reuse? These questions are addressed by the team behind the new import workflow for the TextGrid Repository (TextGridRep).

The article reports on the second Text+ Code Sprint for Humanities Data, which took place on April 11 and 12 at the SUB Göttingen. Organized by Text+ partners SUB Göttingen, GWDG, and TU Dresden, the code sprint aimed to introduce researchers to new tools for importing and publishing their data in TextGridRep.

TextGridRep, an important repository for humanities research data, offers sustainable, permanent, and secure publication of research data, supported by detailed metadata. Various tools were presented during the code sprint, including:

tg-model: A tool for generating TextGrid metadata files from TEI data
tgadmin and tgclients: Python tools for data import into TextGridRep
TextGrid Import UI: A Jupyter Notebook-based web application for convenient workflow operation

The tools were made available on the Text+ JupyterHub so that they can be used directly in the browser.

The new import workflow overcomes previous obstacles and enables a convenient data import without the TextGridLab. The participants of the code sprint gained valuable experience and provided feedback for the further development of the tools. Another code sprint is planned for later this year.

Cite this blog post: Florian Barth, Stefan Buddenbohm, José Calvo Tello, George Dogaru (GWDG), Stefan E. Funk, Mathias Göbel, Ralf Klammer (TU Dresden), Ubbo Veentjer (all authors without affiliation information: SUB Göttingen) (2024, May 8). Fluffy Workflow: New Tools for Data Import into TextGridRep. Text+ Blog. Retrieved on June 17, 2024, from https://doi.org/10.58079/11nmp

Workshop Reports

Text+ Forum on Digitalization – Discussion Forum on the Further Development of the DFG Practical Rules “Digitalization”

FAIR research data doesn’t fall from the sky. Especially in the context of retro-digitizing works, the criteria to ensure FAIRness in the provision of research data are very demanding. Since it is anything but easy to keep all the requirements for each project in mind, one of the main tasks of NFDI and Text+ is to provide advice on research data. For many years, a key guideline for this has been the practical rules “Digitalization” of the German Research Foundation (DFG). However, given the dynamics of the digital transformation, the document, last updated in 2022, inevitably remains a work in progress. This blog post series aims to address material-specific desiderata of the guidelines and initiate a process in which appropriate recommendations are developed.

The first part dealt with the contextualization of the topic “Guidelines for Digitalization.” Read more.

The second post addresses audiovisual (AV) media and specific format recommendations. Read more.

The DFG has deliberately invited experts from the respective communities to participate in further shaping the practical rules “Digitalization.” Text+ aims to contribute and invites you to participate. We look forward to your comments and suggestions, including ideas for further material-specific recommendations, preferably via the comment function of the Text+ blog or through our Helpdesk.

RDMO Offering

To support the creation of data management plans, Text+ offers its own questionnaire, which is based on the standard catalog of the RDMO community and will be gradually revised and expanded. The questionnaire helps researchers organize their own research data management and raises awareness of sustainable and FAIR handling of research data. The catalog is also integrated into GRO.plan, the RDMO instance of the eResearch Alliance of SUB Göttingen, and linked with the consulting services of Text+. Researchers can answer the questions independently or with accompanying advice from the Text+ Helpdesk, clarifying any questions that arise together with a helpdesk agent.

1st Joint Meeting of the Text+ Coordination Committees on June 3, 2024

The first joint meeting of the Text+ Coordination Committees took place on June 3, 2024. Against the backdrop of submitting the interim report and the start of the follow-up application phase, this meeting provided an important opportunity to critically review the current status of the project. The overarching theme of the meeting was the research data management services provided by Text+. These were presented by Philipp Wieder (Operations Speaker/Lead TA IO/OCC) and Andreas Witt (Speaker of the NFDI Consortium Text+ and from October 2024, Scientific Speaker Text+).

In the subsequent joint discussion, the chairs of the individual Coordination Committees shared their expectations and feedback on the current status of the Text+ portal. They also reported on the latest developments in their communities and the resulting requirements for the further development of Text+. Based on this feedback and discussion, the project will analyze where action is needed and then develop targeted solutions to the challenges identified.

The proposal to hold this format twice a year in the future – once virtually and once in person as part of the Plenary – was widely approved. We thank all participants of the first joint meeting of the Text+ Coordination Committees for their active participation and valuable feedback. Special thanks also go to the members who provided their feedback in writing prior to the meeting.

Publications, Services & Information Offers

Text+ maintains its bibliography on Zotero and provides a structured view on its portal.

Documentation of the Editions Registry

With “Towards a Registry for Digital Resources – The Text+ Registry for Editions” in the journal “Datenbank-Spektrum. Zeitschrift für Datenbanktechnologien und Information Retrieval,” Text+ publishes the official documentation of the registry for editions, which is currently under development.

Gradl, T., Kudella, C., Lordick, H., Schulz, D. Towards a Registry for Digital Resources – The Text+ Registry for Editions. Datenbank Spektrum (2024). https://doi.org/10.1007/s13222-024-00479-0.

The editions registry is part of the Text+ Registry, which serves as a central system for describing and cataloging various types of resources. For the area of editions, the Task Area Editions in Text+ pursues an open approach to include as many digital, hybrid, and printed editions as possible. Using examples of editions and edition projects, the publication outlines desiderata and challenges regarding their discoverability and describes the data model that accommodates the wide range of edition types. With this comprehensive and flexible approach, the Text+ Registry is designed as an adaptable technical component that allows for future extensions and broad connectivity.

Searching and Finding – Introduction to Research with the Integrated Authority File

The publication “Searching and Finding – Introduction to Research with the Integrated Authority File” by M. Strickert and B. Fischer has been released.

Strickert, M.; Fischer, B. (2024). Searching and Finding: Introduction to Research with the Integrated Authority File. Leipzig/Frankfurt am Main: Deutsche Nationalbibliothek (https://d-nb.info/1325174785).

The publication explains how authority data improve the search for literature and data. This accessible introduction describes what authority data are and what role they play in describing publications, research data, and collection items. The focus is on the authority data of the Integrated Authority File (GND), the central controlled vocabulary for culture and research in the German-speaking world. Various examples show how to find suitable search terms in the GND and how authority data enhance the discoverability of one’s own texts and data. The publication is available online.
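The principle behind authority data can be illustrated with a small sketch. It is a toy model only, not the GND data model or any GND interface: the class, the functions, the identifiers (`example-0001`, `example-0002`), and the catalog titles are all hypothetical placeholders. The sketch shows how linking different name forms to one stable identifier lets a single query find every item described with that identifier.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class AuthorityRecord:
    """Toy authority record: one stable identifier, several name forms."""
    identifier: str  # hypothetical placeholder, not a real GND number
    preferred_name: str
    variant_names: List[str] = field(default_factory=list)

    def all_names(self) -> Set[str]:
        # Compare names case-insensitively.
        names = [self.preferred_name, *self.variant_names]
        return {name.casefold() for name in names}


def resolve(query: str, records: List[AuthorityRecord]) -> Optional[str]:
    """Return the identifier of the record that lists the query as a name form."""
    needle = query.casefold()
    for record in records:
        if needle in record.all_names():
            return record.identifier
    return None


def find_items(query: str, records: List[AuthorityRecord],
               catalog: Dict[str, Set[str]]) -> List[str]:
    """Find catalog items via the identifier instead of the literal name string."""
    identifier = resolve(query, records)
    if identifier is None:
        return []
    return [title for title, ids in catalog.items() if identifier in ids]


# Toy data: identifiers and titles are invented for illustration.
RECORDS = [
    AuthorityRecord(
        identifier="example-0001",
        preferred_name="Goethe, Johann Wolfgang von",
        variant_names=["Johann Wolfgang von Goethe", "J. W. v. Goethe"],
    ),
]

CATALOG = {
    "Faust (toy edition)": {"example-0001"},
    "Letters (toy corpus)": {"example-0001"},
    "Unrelated dataset": {"example-0002"},
}

if __name__ == "__main__":
    print(find_items("J. W. v. Goethe", RECORDS, CATALOG))
```

Because the catalog stores identifiers rather than name strings, each of the three name forms leads to the same two items in this toy catalog; a plain string search would only match the spelling that happens to appear in an item's description.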

Events & Recaps

BiblioCon

Four days, up to 26 parallel events, and nearly 5,000 participants: that was the 112th BiblioCon, held from June 4 to 7, 2024, at the Congress Center Hamburg. The diverse program also offered interesting insights for research, university, and special libraries. A dedicated thematic track focused on “Research-Related Services and Open Science” – a forum that many people involved in Text+ also used to interest more libraries in the topics and issues of language- and text-based research data. This article summarizes the contributions from Text+ to the BiblioCon program.

Authority Data as a Common Task of FID and NFDI (Susanne Al-Eryani, SUB Göttingen & Volker Adam, University and State Library Saxony-Anhalt, Halle)
What Metadata Does Literary Studies Need? (José Calvo-Tello, SUB Göttingen)
Frameworks and Digital Editions: Open Source Tools for Basic Editions (Kevin Wunsch & Kevin Kuck, both ULB Darmstadt)
The Integrated Authority File (GND) (Barbara Fischer, DNB)
Connecting Collections and Making Them Usable for Research – a Task (Also) for Libraries (Philippe Genet & Peter Leinen, both DNB)

Edit-a-thon for Describing Resources in the SSH Open Marketplace

On June 24, an edit-a-thon for describing resources in the SSH Open Marketplace was held at the invitation of the Association for Humanities and Cultural Studies Research Infrastructures e.V. (GKFI) and Text+. Many thanks to the Max Weber Foundation for hosting us at their premises in Bonn-Bad Godesberg, which provided the perfect setting for a productive meeting.

The event aimed to create or curate descriptions of resources for the humanities and cultural studies community, such as services, tutorials, software, or entire workflows, in the SSH Open Marketplace. The edit-a-thon participants were recruited from Text+, the GKFI, NFDI4Culture, and NFDI4Memory.

The Association for Humanities and Cultural Studies Research Infrastructures and Text+ are two use cases that rely on the SSH Open Marketplace to showcase their offerings. Instead of developing proprietary solutions, GKFI and Text+ use the Marketplace as an existing solution that:

is available and will remain so at least in the medium term,
is intuitive to operate by hand,
can be well integrated into other infrastructures via harvesting,
and is equipped with an active editorial board acting as a bridge to the community.

The edit-a-thon opened with an introduction to the SSH Open Marketplace by Michael Kurzmeier (Austrian Centre for Digital Humanities and Cultural Heritage). Kurzmeier, a member of the Editorial Board of the SSH Open Marketplace, provided insights into the structure of the Marketplace and used various application examples to show what is possible and how easily researchers and projects can use the service.

Using the examples of GKFI and Text+, Nanette Rißler-Pipka and Stefan Buddenbohm demonstrated how the SSH Open Marketplace can serve as a tool registry or offer catalog in other contexts. These can include any humanities and cultural studies initiative, particularly in the FID or NFDI context.

Subsequently, resources in the Marketplace were curated individually and in groups in a relaxed atmosphere. As a result, new resources were added and many existing ones were improved or corrected. Another important outcome of the meeting was the diverse feedback on both the SSH Open Marketplace and the presentation of offerings by Text+.

Understanding Speech: AI and Spoken Language

On June 27 and 28, 2024, a Text+ event on “Understanding Speech: AI and Spoken Language” was held at the Lyrik Kabinett Foundation in Munich. The event explored how the rapid development of artificial intelligence is changing the generation, analysis, and transcription of spoken language. Keynote lectures introduced the topic, accompanied by many practical short contributions presenting current projects, tools, and methods for processing spoken language.

Johann Prenninger (BMW Group) opened the first day with a presentation on current challenges in implementing machine learning and AI at BMW, with examples from the automotive industry. The afternoon featured transcription tools, a detailed evaluation of well-known speech recognition models, and assessments of the efficiency of automated transcription, with a focus on the problems of recognizing children's speech.

The second day opened with a keynote by Barbara Plank (Munich AI and Natural Language Processing) on the processing of dialects, which are often underrepresented or absent in the training data of language models. Insights into the creation and investigation of language corpora from the Hispano-American and English-speaking West African regions followed. Other contributions addressed the automatic transcription of podcast recordings and the optimization of workflows involving automatic transcription. The recognition of dialect features and a live demonstration of an optopalatograph rounded out the lecture segment before Christoph Draxler concluded with a short workshop on the web services, corpora, and current developments of the Bavarian Archive for Speech Signals (BAS).

More information can be found on the event’s website.

Dates

All events, both upcoming and past, can also be found in our event calendar on the Text+ portal.