Warming up recurrent neural networks to maximise reachable multistability greatly improves learning
Recurrent neural networks (RNNs) are a type of artificial neural network that can process sequences, such as time series or sentences, through an internal state that serves as a memory. However, training RNNs is known to be difficult, especially on long sequences. Indeed, when gradients are backpropagated through a large number of timesteps, they are prone to either vanish or explode, making it difficult to learn long-term dependencies. Previous work (Vecoven et al., 2021) introduced RNNs with multistable dynamics and showed that multistability can improve the learning of such dependencies. In this new paper (Lambrechts et al., 2023), we expand on this idea. First, we derive a measure of multistability, called the VAA. This metric is used to reveal the correlation between the reachable multistability of an RNN and its ability to learn long-term dependencies, in both supervised and reinforcement learning settings. Second, we establish a differentiable approximation of this measure. Gradient-ascent steps can then be performed on a standard RNN using batches of sequences, in order to maximise that approximation. This promotes multistability in the RNN's internal dynamics, and it works for any RNN architecture, including classical GRU and LSTM networks. Finally, we evaluate this new pretraining method, called the warmup, on both supervised and reinforcement learning benchmarks. RNNs pretrained with the warmup are shown to learn long-term dependencies faster and better than their non-pretrained counterparts.
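To make the warmup idea concrete, here is a minimal sketch of what such a pretraining loop could look like in PyTorch. The `soft_vaa` proxy below (a soft count of distinct final hidden states based on pairwise distances), as well as the zero-input rollout, thresholds and step counts, are illustrative assumptions and not the exact formulation used in the paper.

```python
import torch
import torch.nn as nn


def soft_vaa(final_states, sharpness=500.0, threshold=0.01):
    """Differentiable proxy for the fraction of distinct attractors reached.

    final_states: tensor of shape (N, H) holding the hidden states obtained
    after a long autonomous rollout from N different initial states.
    Pairs of states whose squared distance falls below `threshold` are softly
    counted as having converged to the same attractor.
    """
    diff = final_states.unsqueeze(0) - final_states.unsqueeze(1)  # (N, N, H)
    sq_dist = diff.pow(2).sum(dim=-1)                             # squared pairwise distances
    same = torch.sigmoid(sharpness * (threshold - sq_dist))       # ~1 if two states coincide
    cluster_size = same.sum(dim=1)                                # soft size of each state's cluster
    return (1.0 / cluster_size).sum() / final_states.size(0)      # ~1 if all states are distinct


def warmup(rnn, hidden_size, n_states=32, rollout=200, steps=100, lr=1e-3):
    """Gradient-ascent warmup: push the RNN towards multistable dynamics."""
    opt = torch.optim.Adam(rnn.parameters(), lr=lr)
    for _ in range(steps):
        h0 = torch.randn(1, n_states, hidden_size)                # random initial hidden states
        x = torch.zeros(rollout, n_states, rnn.input_size)        # autonomous rollout (zero input)
        _, h_final = rnn(x, h0)                                   # h_final: (1, n_states, hidden_size)
        loss = -soft_vaa(h_final.squeeze(0))                      # ascend the multistability proxy
        opt.zero_grad()
        loss.backward()
        opt.step()


rnn = nn.GRU(input_size=4, hidden_size=64)  # any recurrent architecture can be warmed up this way
warmup(rnn, hidden_size=64)
```

After this warmup phase, the network would simply be trained as usual on the downstream supervised or reinforcement learning task, starting from the multistable weights found by the pretraining.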