Ludwig-Maximilians-Universität (LMU)

Text+ center: Bayerisches Archiv für Sprachsignale (Bavarian Archive for Speech Signals, BAS)

Type of center: data center/competence center

LMU is one of the strongest research universities in Europe. Its broad and diverse range of subjects gives it outstanding potential for pioneering research. The Bavarian Archive for Speech Signals (BAS) of the Department of Phonetics at LMU collects, standardizes, maintains and distributes digital language resources with a focus on spoken German. These resources include technically motivated as well as scientific language data collections, and they are made available primarily in the form of public web services.

The resources that BAS provides for Text+ include a repository of language databases containing more than 40 collections of language data in several languages (German, English, Japanese, Italian and others), a range of web-based language processing services, and various stand-alone tools for data collection and analysis.

For researchers and projects that want to store their data long-term, BAS offers support with data preparation and data import. This support includes advice on the collection, processing and management of data during a research project. The archive also offers online and on-site workshops on the available language technology web services and tools.

Highlights of the data and services provided

WebMAUS: automatic phonetic segmentation and labeling for multiple languages, based on the speech signal and a phonological transcript
TextAlign: mapping of an orthographic text string to the corresponding phonological transcript
SpeechRecorder: platform-independent audio recording software tailored to the requirements of speech recordings
EMU Speech Database Management System: collection of software tools for creating, editing and analyzing speech databases
BAS Corpora: large collection of spoken language recordings and transcriptions

Acceptance of third-party data

The BAS Repository offers individual researchers and projects the opportunity to store research data long-term (for at least 10 years). The prerequisites are that the necessary declarations of consent are available, that the media and annotation data are provided in the supported formats, and that the data are described by metadata in CMDI (Component MetaData Infrastructure) format.

BAS preferentially accepts phonetic tools and services, language statistics and pronunciation lexicons, as well as German speech recordings and transcriptions and multimodal data in the formats supported by BAS. Please send inquiries to BAS via the Text+ Helpdesk or directly by e-mail to bas@phonetik.uni-muenchen.de.

Contact

Contact for Text+: bas@bas.uni-muenchen.de