Virtual DH Colloquium at the BBAW
Sabine Schulte im Walde (Universität Stuttgart) speaks on “Collecting and Investigating Features of Human Semantic Ratings and Resources”.
Collecting and Investigating Features of Human Semantic Ratings and Resources
Developing computational models to predict meaning components of words and multiword expressions typically goes hand in hand with creating or using reliable lexical resources as gold standards for formative intrinsic evaluation. However, little work examines whether, and to what extent, both the gold standards and the prediction models vary with the properties of the targets in the lexical resources. Potential skewness in these properties hinders a generalised assessment of the models.
In this talk, I suggest a novel route to assessing the interactions between properties of targets in lexical-semantic datasets and human ratings on target-semantic variables. It is based on a study of English and German noun compounds, such as “climate change” and “crocodile tears” in English, and “Ahornblatt” (maple leaf) and “Fliegenpilz” (toadstool) in German. I will first introduce (1) a novel collection of compositionality ratings for 1,099 German noun compounds, for which we asked the human judges to provide compound and constituent properties (such as paraphrases, meaning contributions, hypernymy relations, and concreteness) before judging compositionality; and (2) a series of analyses of rating distributions and their interactions with compound and constituent properties, both for our novel collection and for previous gold standard resources in English (Reddy et al., 2011; Cordeiro et al., 2019) and German (Schulte im Walde et al., 2013; 2016). Following these analyses, I will discuss at a meta level to what extent one should aim for an even distribution of human ratings across the pre-specified scale, and to what extent one should take the properties of targets into account when creating a novel resource and when using a resource for evaluation.
References:
Cordeiro, Silvio, Aline Villavicencio, Marco Idiart & Carlos Ramisch. 2019. Unsupervised Compositionality Prediction of Nominal Compounds. Computational Linguistics 45(1). 1–57.
Reddy, Siva, Diana McCarthy & Suresh Manandhar. 2011. An Empirical Study on Compositionality in Compound Nouns. In Proceedings of the 5th international joint conference on natural language processing, 210–218. Chiang Mai, Thailand.
Schulte im Walde, Sabine, Stefan Müller & Stephen Roller. 2013. Exploring Vector Space Models to Predict the Compositionality of German Noun-Noun Compounds. In Proceedings of the 2nd joint conference on lexical and computational semantics, 255–265. Atlanta, GA, USA.
Schulte im Walde, Sabine, Anna Hätty, Stefan Bott & Nana Khvtisavrishvili. 2016. GhoSt-NN: A Representative Gold Standard of German Noun-Noun Compounds. In Proceedings of the 10th international conference on language resources and evaluation, 2285–2292. Portorož, Slovenia.
***
The event takes place online, and no registration is required. At the scheduled time, the virtual conference room can be accessed via the link https://meet.gwdg.de/b/lou-eyn-nm6-t6b. Please switch off your microphone and camera when you enter the room. Once the discussion has begun, you can indicate that you wish to speak by switching on your camera.
The colloquium focuses both on practice-oriented topics and concrete application examples and on critical reflection about research in the digital humanities. Further information is available on the BBAW website.
Further information: https://dhd-blog.org/?p=19075
Last updated: 28 September 2023