Structural sensitivity does not entail grammaticality: assessing LLMs against the Universal Functional Hierarchy
Tommaso Sgrizzi (Writing – Original Draft Preparation); Asya Zanollo (Writing – Original Draft Preparation); Cristiano Chesi (Supervision)
2025-01-01
Abstract
This paper investigates whether large language models (LLMs) generalize core syntactic properties associated with restructuring verbs in Italian, a domain tied to the universal hierarchy of functional heads proposed by Cinque [1, 2]. Specifically, we examine whether LLMs distinguish between restructuring and control verbs based on canonical syntactic diagnostics: verb ordering, clitic climbing, and auxiliary selection. We also probe how models interpret novel infinitive-selecting pseudoverbs, testing whether they default to restructuring- or control-like behavior. Using controlled minimal pairs, we evaluate five models of different sizes: Minerva-7B-base-v1.0 [3], GPT2-medium-italian-embeddings [4], Bert-base-italian-xxl-uncased [5], GPT2-small-italian [4], and GePpeTto [6]. Our findings reveal that none of the models internalizes the functional hierarchy, systematically blocks clitic climbing with control verbs, or is sensitive to the auxiliary selection variability that distinguishes the restructuring and control classes. These results highlight fundamental limitations in the syntactic generalization abilities of current LLMs, particularly in domains where structural contrasts are not overtly marked in the input.
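As an illustration of the minimal-pair methodology the abstract describes, the sketch below shows one standard way such pairs are scored with a causal LM: summing per-token log-probabilities and comparing the two sentence scores. The HuggingFace checkpoint identifier, the helper function, and the clitic-climbing example sentences are our own illustrative assumptions, not the paper's released code or actual stimuli.

```python
# Minimal sketch of minimal-pair scoring with a causal LM.
# Checkpoint ID and sentences are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "GroNLP/gpt2-small-italian"  # assumed checkpoint for GPT2-small-italian
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
model.eval()

def sentence_logprob(sentence: str) -> float:
    """Sum of log P(token_i | tokens_<i) over the whole sentence."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Logits at position i predict token i+1, so shift by one.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    return log_probs.gather(1, targets.unsqueeze(1)).sum().item()

# Illustrative clitic-climbing pair: the restructuring verb 'volere'
# licenses climbing; the control verb 'detestare' should block it.
pair = ("Lo voglio comprare.",   # clitic climbed over a restructuring verb
        "Lo detesto comprare.")  # climbing over a control verb (ungrammatical)
for s in pair:
    print(f"{sentence_logprob(s):8.2f}  {s}")
```

A model sensitive to the restructuring/control contrast should assign the first sentence a higher (less negative) log-probability than the second; the paper's finding is that the evaluated models do not do so systematically.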


