From recursion to incrementality: return to recurrent neural networks
Cristiano Chesi
Writing – Original Draft Preparation
Matilde Barbini; Veronica Bressan; Achille Fusco; Sofia Neri; Maria Letizia Piccini Bianchessi; Sarah Rossi; Tommaso Sgrizzi
2026-01-01
Abstract
Large language models built on attention-based architectures have outperformed recurrent neural networks (RNNs), such as those based on long short-term memory (LSTM) gating systems, in various natural language processing tasks. However, despite optimization, these models (i) require unreasonably large amounts of training data compared with what children need, (ii) exhibit an inverse relationship between their performance and their linguistic explanatory value, and, crucially, (iii) do not perform plausible (word-by-word) incremental sentence processing, raising doubts about their cognitive plausibility. In this paper, we address these issues starting from the intuition that RNNs directly model incrementality, a key factor in human language processing. Specifically, we discuss the performance of an RNN architecture, eMG-RNN, both during training and in minimal-pair forced-choice tasks in English and Italian. We observe that: (i) ecological training regimens lead to a decrease in cross-entropy loss, although performance on linguistic minimal pairs does not improve; (ii) the specific gating system adopted induces relevant structural biases; and (iii) while these networks outperform standard LSTM and gated recurrent unit (GRU) networks, as well as transformer models, under the same ecological training regimens, their performance on linguistic tasks remains low compared to that of adults (in English) and 7-year-old children (in Italian).
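The minimal-pair forced-choice evaluation mentioned in the abstract can be sketched as follows. This is the standard setup in which a language model is credited with a correct choice when it assigns higher probability to the grammatical member of a minimal pair; the paper's own eMG-RNN scoring code is not shown here, so this sketch uses a toy unigram model, and the names `TOY_LM`, `log_prob`, and `forced_choice` are illustrative assumptions, not identifiers from the paper.

```python
import math

# Hypothetical toy "language model": a unigram table mapping words to
# probabilities. A real evaluation would use the trained network's
# word-by-word (incremental) conditional probabilities instead.
TOY_LM = {
    "the": 0.3, "dog": 0.2, "dogs": 0.1,
    "barks": 0.15, "bark": 0.05, "<unk>": 0.2,
}

def log_prob(sentence: str) -> float:
    """Sum of per-word log-probabilities, accumulated incrementally."""
    return sum(math.log(TOY_LM.get(w, TOY_LM["<unk>"]))
               for w in sentence.split())

def forced_choice(grammatical: str, ungrammatical: str) -> bool:
    """The model 'chooses' correctly if it assigns the grammatical
    variant a log-probability at least as high as the ungrammatical one."""
    return log_prob(grammatical) >= log_prob(ungrammatical)

# A minimal pair differing only in subject-verb agreement:
print(forced_choice("the dog barks", "the dog bark"))  # True, since the
# toy table gives "barks" higher probability than "bark"
```

Accuracy on a benchmark is then simply the fraction of minimal pairs for which `forced_choice` returns `True`, which is how model scores can be compared with human (adult or child) accuracy on the same items.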


