From recursion to incrementality: return to recurrent neural networks
Cristiano Chesi
Writing – Original Draft Preparation
Matilde Barbini; Veronica Bressan; Achille Fusco; Sofia Neri; Maria Letizia Piccini Bianchessi; Sarah Rossi; Tommaso Sgrizzi
2026-01-01
Abstract
Large language models built on attention-based architectures have outperformed recurrent neural networks (RNNs), such as those based on long short-term memory (LSTM) gating systems, in various natural language processing tasks. However, despite optimization, these models (i) require unreasonably large amounts of training data compared with what children need, (ii) exhibit an inverse relationship between their performance and their linguistic explanatory value, and, crucially, (iii) do not perform plausible (word-by-word) incremental sentence processing, raising doubts about their cognitive plausibility. In this paper, we address these issues starting from the intuition that RNNs directly model incrementality, a key factor in human language processing. Specifically, we discuss the performance of an RNN architecture, eMG-RNN, both during training and in minimal-pair forced-choice tasks in English and Italian. We observe that: (i) ecological training regimens lead to a decrease in cross-entropy loss, although performance on linguistic minimal pairs does not improve; (ii) the specific gating system adopted induces relevant structural biases; and (iii) while these networks outperform standard LSTM and gated recurrent unit (GRU) networks, as well as transformer models, under the same ecological training regimens, their performance on linguistic tasks remains low compared to that of adults (in English) and 7-year-old children (in Italian).
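The minimal-pair forced-choice evaluation mentioned in the abstract can be sketched as follows. This is the standard setup in which a language model is credited with a correct choice when it assigns higher probability to the grammatical member of a minimal pair; the paper's own eMG-RNN scoring code is not shown here, so this sketch uses a toy unigram model, and the names `TOY_LM`, `log_prob`, and `forced_choice` are illustrative assumptions, not identifiers from the paper.

```python
import math

# Hypothetical toy "language model": a unigram table mapping words to
# probabilities. A real evaluation would use the trained network's
# word-by-word (incremental) conditional probabilities instead.
TOY_LM = {
    "the": 0.3, "dog": 0.2, "dogs": 0.1,
    "barks": 0.15, "bark": 0.05, "<unk>": 0.2,
}

def log_prob(sentence: str) -> float:
    """Sum of per-word log-probabilities, accumulated incrementally."""
    return sum(math.log(TOY_LM.get(w, TOY_LM["<unk>"]))
               for w in sentence.split())

def forced_choice(grammatical: str, ungrammatical: str) -> bool:
    """The model 'chooses' correctly if it assigns the grammatical
    variant a log-probability at least as high as the ungrammatical one."""
    return log_prob(grammatical) >= log_prob(ungrammatical)

# A minimal pair differing only in subject-verb agreement:
print(forced_choice("the dog barks", "the dog bark"))  # True, since the
# toy table gives "barks" higher probability than "bark"
```

Accuracy on a benchmark is then simply the fraction of minimal pairs for which `forced_choice` returns `True`, which is how model scores can be compared with human (adult or child) accuracy on the same items.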


