Surprisal and Crossword Clues difficulty: Evaluating Linguistic Processing between LLMs and Humans




Asya Zanollo (Writing – Original Draft Preparation); Achille Fusco (Writing – Original Draft Preparation); Kamyar Zeinalipour (Data Curation); Cristiano Chesi (Conceptualization)

2025-01-01

Abstract

Crossword clue difficulty is traditionally judged by human setters, leaving automated puzzle generators without an objective yardstick. We model difficulty as the Surprisal of the answer given the clue, estimating it with token probabilities from large language models. Comparing three causal LLMs (Llama-3-8B, Llama-2-7B, and Ita-GPT-2-121M) with 60 human solvers on 160 hand-balanced clues, we find that Surprisal correlates negatively with accuracy (r = -0.62 for nominal clues). These results show that language-model Surprisal captures some of the cognitive load humans experience and that language-specific training and model scale both matter. The metric therefore enables adaptive crossword generation and provides a new testbed for probing the alignment between human and model linguistic processing.
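
As a rough illustration of the metric described in the abstract, the sketch below computes the Surprisal of an answer as the negative sum of its token log-probabilities (the standard definition, expressed here in bits) and then correlates per-clue Surprisal with solver accuracy using Pearson's r. It is a minimal sketch under stated assumptions, not the authors' pipeline: the abstract does not specify the prompt format, tokenization, or any length normalization, the function names `surprisal_bits` and `pearson_r` are hypothetical, and all numbers in the usage section are made up for demonstration.

```python
import math


def surprisal_bits(token_logprobs):
    """Surprisal of an answer given a clue, in bits.

    token_logprobs: natural-log probabilities of each answer token,
    conditioned on the clue and the preceding answer tokens. These are
    assumed to come from some causal language model (not shown here).
    """
    return -sum(token_logprobs) / math.log(2)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up token log-probabilities for four hypothetical clues.
example_logprobs = [
    [math.log(0.6), math.log(0.8)],
    [math.log(0.3), math.log(0.5)],
    [math.log(0.1), math.log(0.4)],
    [math.log(0.05), math.log(0.2)],
]
# Made-up proportion of human solvers who answered each clue correctly.
example_accuracy = [0.9, 0.7, 0.5, 0.2]

example_surprisals = [surprisal_bits(lp) for lp in example_logprobs]
print(pearson_r(example_surprisals, example_accuracy))
```

With real data, `token_logprobs` would be read from a model's output distribution at each answer position, and a negative coefficient would indicate that higher-Surprisal answers are solved less often.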


Year: 2025

Keywords: surprisal, llm, gpt, crossword, education, linguistic games, puzzle, Crossword difficulty

Appears in types: 4.1 Conference proceedings contribution


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12076/22481
