Different Ways to Forget: Linguistic Gates in Recurrent Neural Networks
Cristiano Chesi; Veronica Bressan; Matilde Barbini; Achille Fusco; Maria Letizia Piccini Bianchessi; Sofia Neri; Sarah Rossi; Tommaso Sgrizzi
(All authors: Writing – Original Draft Preparation)
2024-01-01
Abstract
This work explores alternative gating systems in simple Recurrent Neural Networks (RNNs) with the aim of inducing linguistically motivated biases during training, ultimately affecting the models' performance on the BLiMP task. Here we focus only on the BabyLM 10M training corpus (Strict-Small Track). Our experiments reveal that: (i) standard RNN variants (LSTMs and GRUs) are insufficient for properly learning the relevant set of linguistic constraints; (ii) the quality and size of the training corpus have little impact on these networks, since we observed comparable performance for LSTMs trained exclusively on the child-directed speech portion of the corpus; (iii) increasing the size of the embedding and hidden layers does not significantly improve performance. In contrast, specifically gated RNNs (eMG-RNNs), inspired by certain Minimalist Grammar intuitions, exhibit advantages in both training loss and BLiMP accuracy, although their performance is not yet comparable to that of humans.
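For readers less familiar with gating in recurrent networks, the sketch below illustrates the standard GRU-style gating that serves as one of the baselines discussed here. The eMG-RNN gate itself is not specified in this abstract, so nothing below should be read as the authors' architecture; the function and parameter names (gru_cell, params) are purely illustrative.

# Minimal sketch of standard GRU gating (Cho et al., 2014), assuming nothing
# about the eMG-RNN internals, which are not described in this abstract.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h_prev, params):
    """One GRU step: the update gate z and reset gate r decide how much of
    the previous hidden state is kept ("remembered") versus overwritten."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)               # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev + br)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)   # candidate state
    return (1.0 - z) * h_prev + z * h_tilde              # gated interpolation

# Toy usage: random parameters, one step on a 4-dimensional input.
rng = np.random.default_rng(0)
d_in, d_h = 4, 8
params = [rng.standard_normal(s)
          for s in [(d_h, d_in), (d_h, d_h), (d_h,)] * 3]
h = gru_cell(rng.standard_normal(d_in), np.zeros(d_h), params)
print(h.shape)  # (8,)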