Language models assessment through linguistically motivated contrasts

Neri, Sofia (Validation); Rossi, Sarah (Data Curation); Chesi, Cristiano (Writing – Review & Editing)
2026-01-01

Abstract

We present BLiMP-IT, a linguistically informed benchmark for assessing the performance of Italian Language Models (LMs). Inspired by state-of-the-art tools for LM evaluation and informed by both generative theorizing and psycholinguistic metrics, the benchmark tests a rich variety of structures using minimal pair contrasts, i.e., a grammatical sentence and an ungrammatical one that differ minimally with respect to a single morphosyntactic property. By prompting the model to assign a probability value to each sentence in a pair, BLiMP-IT tests LMs' accuracy as well as their ability to reach linguistically meaningful generalizations, ultimately offering insights into human-machine comparability and the validity of the Poverty of Stimulus hypothesis.
2026
Keywords: LM evaluation, minimal pairs, morphosyntax, Poverty of Stimulus
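
The minimal-pair evaluation described in the abstract can be illustrated with a short sketch. It is not the authors' implementation. The toy bigram scorer, the tiny corpus, the two example pairs, and all function names are assumptions made for illustration, and none of the sentences are taken from BLiMP-IT. The sketch also assumes that a pair counts as correct when the grammatical sentence receives a strictly higher score than the ungrammatical one. Any function that maps a sentence to a log-probability, such as a wrapper around a real LM, could replace the toy scorer.

```python
import math
from collections import Counter
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (grammatical sentence, ungrammatical sentence)


def train_bigram_scorer(corpus: List[str]) -> Callable[[str], float]:
    """Build a toy add-one-smoothed bigram scorer (a stand-in for a real LM)."""
    context_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    vocab = set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens)
        context_counts.update(tokens[:-1])  # every token that precedes another
        bigram_counts.update(zip(tokens, tokens[1:]))
    vocab_size = len(vocab) + 1  # one extra slot for unseen words

    def log_prob(sentence: str) -> float:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            numerator = bigram_counts[(prev, cur)] + 1
            denominator = context_counts[prev] + vocab_size
            total += math.log(numerator / denominator)
        return total

    return log_prob


def minimal_pair_accuracy(pairs: Iterable[Pair],
                          score: Callable[[str], float]) -> float:
    """Fraction of pairs where the grammatical sentence scores strictly higher."""
    pairs = list(pairs)
    correct = sum(1 for good, bad in pairs if score(good) > score(bad))
    return correct / len(pairs)


# Hypothetical training data and subject-verb agreement pairs.
TOY_CORPUS = [
    "il gatto dorme",
    "i gatti dormono",
    "la ragazza canta",
    "le ragazze cantano",
]
TOY_PAIRS: List[Pair] = [
    ("il gatto dorme", "il gatto dormono"),
    ("le ragazze cantano", "le ragazze canta"),
]

toy_score = train_bigram_scorer(TOY_CORPUS)
toy_accuracy = minimal_pair_accuracy(TOY_PAIRS, toy_score)
print(f"accuracy on toy pairs: {toy_accuracy:.2f}")
```

Because `minimal_pair_accuracy` only depends on a sentence-to-score callable, the scoring model can be swapped without changing the accuracy computation.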

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12076/25177