Extracting an Expectation-based Lexicon for UD Treebanks
Matteo Gay (Writing – Original Draft Preparation); Cristiano Chesi (Writing – Review & Editing)
2023-01-01
Abstract
In this paper, we present a refinement of the deterministic lexicon extraction algorithm discussed in Chesi (2023). The new version infers categorial expectations from any Universal Dependencies (UD) treebank (Nivre et al. 2017) and recasts non-trivial backward dependencies in terms of movement operations (Chomsky 1995, Stabler 2012). With respect to the preliminary version of the algorithm, the improvements consist of a more granular definition of the functional categories and a significant reduction in lexical ambiguity. These results might constitute a substantial baseline for assessing the descriptive adequacy of (very) large language models.