Universität Hamburg (UniHH)

Text+ center: University of Hamburg: Hamburg Centre for Language Corpora

Type of center: data center/competence center

Based on the experience of the Hamburg Centre for Language Corpora (HZSK), an association of members of various faculties and institutes of the University of Hamburg, the University of Hamburg (UHH) supports the consistency and coordination of computer-aided empirical research and teaching in linguistics and related disciplines affiliated with the UHH. Together with the HZSK community, it pursues the goal of ensuring the sustainable usability of linguistic primary research data beyond temporary research projects.

The UHH research data repository (FDR), which is operated by the Centre for Sustainable Research Data Management (ZFDM), houses more than 50 corpora in the HZSK community, all of which are available for subsequent use. Most of these corpora concern multilingual oral and written communication or contain data from less common or endangered languages. A large number document children’s language acquisition; others cover a range of aspects of multilingualism in everyday life, including interpreting in hospitals (DiK corpus).

Highlights of provided data and services

Corpus Services: Software that offers various functions for reviewing and preparing speech corpora.
Kiezdeutschkorpus (KiDKo): A multi-modal digital corpus of spontaneous discourse data from informal peer groups.
Dolmetschen im Krankenhaus (DiK): Transcriptions of various kinds of doctor-patient communication in hospitals (monolingual conversations in German, Portuguese and Turkish and interpreted conversations).
Reference Corpus Middle Low German/Low Rhenish (1200–1650) (ReN): The ReN offers manuscripts, prints, and inscriptions. It is part of the “Corpus of Historical German Texts”.
The Hamburg MapTask Corpus (HAMATAC): A spoken language corpus documenting the performance of 24 L2 learners of German in a map task.
Phonologie-Erwerb Deutsch-Spanisch als Erste Sprachen (PEDSES): A phonetically and orthographically transcribed corpus of German/Spanish simultaneous bilingual children (longitudinal, from age one to three and a half or four years).

Third-party data reception

New data (corpora of spoken language) can be added to the HZSK community within the FDR upon request to Prof. Dr. Kristin Bührig. The data must already be fully curated before they can be integrated into the FDR; we provide advisory support for this curation.

Contact

Contact for Text+: corpora@uni-hamburg.de