The Future of Artificial Intelligence

[2303.06349] Resurrecting Recurrent Neural Networks for Long Sequences - Orvieto et al. (ETH & DeepMind)


"Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are hard to optimize and slow to train. Deep state-space models (SSMs) have recently been shown to perform remarkably well on long sequence modeling tasks, and have the added benefits of fast parallelizable training and RNN-like fast inference. [...] Our results provide new insights on the origins of the impressive performance of deep SSMs, while also introducing an RNN block called the Linear Recurrent Unit that matches both their performance on the Long Range Arena benchmark and their computational efficiency."

Juliette Decugis's insight:

All of the recent generative-AI breakthroughs rely on Transformers: OpenAI's GPTs, Meta's Llama, Mistral, Google's Gemini... However, attention scales quadratically with sequence length, which makes training on long sequences costly. RNNs, in contrast, scale linearly with sequence length but suffer from exploding and vanishing gradients, while deep SSMs have shown strong results on modeling long-range dependencies. Orvieto et al. show that the performance and efficiency of deep continuous-time SSMs can be matched by a simpler architecture: deep linear RNNs. Using linear hidden-state transitions, complex diagonal recurrent matrices, and a stable exponential parametrization, the authors address the classic failures of RNNs on long sequences. Their Linear Recurrent Unit matches S4 results on the Long Range Arena tasks Pathfinder (sequence length 1,024) and Path-X (sequence length 16K), and its reliance on linear recurrences makes training parallelizable and much faster.
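For intuition, here is a minimal NumPy sketch of an LRU-style recurrence. Shapes, initialization, and variable names are illustrative assumptions rather than the paper's exact implementation; the key ingredients it shows are the complex diagonal state transition and the exponential parametrization that keeps the eigenvalue magnitudes below one.

```python
import numpy as np

# Sketch of a Linear Recurrent Unit (LRU)-style layer (assumed shapes/init).
# State update: x_t = diag(lam) * x_{t-1} + B u_t,  output: y_t = Re(C x_t) + D u_t,
# with lam = exp(-exp(nu) + i*theta) so that |lam| < 1 (stable recurrence).

rng = np.random.default_rng(0)
d_state, d_in, d_out, T = 16, 4, 4, 100

nu = rng.normal(size=d_state)                 # controls eigenvalue magnitude
theta = rng.uniform(0, 2 * np.pi, d_state)    # controls eigenvalue phase
lam = np.exp(-np.exp(nu) + 1j * theta)        # complex diagonal recurrence

B = (rng.normal(size=(d_state, d_in)) + 1j * rng.normal(size=(d_state, d_in))) / np.sqrt(2 * d_in)
C = (rng.normal(size=(d_out, d_state)) + 1j * rng.normal(size=(d_out, d_state))) / np.sqrt(d_state)
D = rng.normal(size=(d_out, d_in)) / np.sqrt(d_in)

u = rng.normal(size=(T, d_in))                # input sequence
x = np.zeros(d_state, dtype=complex)
ys = []
for t in range(T):
    # Linear, elementwise (diagonal) state transition: no nonlinearity inside the recurrence.
    x = lam * x + B @ u[t]
    ys.append((C @ x).real + D @ u[t])
y = np.stack(ys)
print(y.shape)  # (100, 4)
```

The sequential loop above is only for clarity: because the recurrence is linear and diagonal, it can be unrolled with a parallel (associative) scan, which is what gives the LRU its SSM-like training speed.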

 

Overall, this paper motivates the search for RNN-based architectures and challenges Transformer supremacy; SSMs are also emerging as an alternative.


Enhancing Backpropagation via Local Loss Optimization


"Posted by Ehsan Amid, Research Scientist, and Rohan Anil, Principal Engineer, Google Research, Brain Team"

Juliette Decugis's insight:
Many recent ML papers try to address a central problem of deep learning models: their computational and memory cost. While much of that work has focused on sparsifying models, LocoProp instead rethinks backpropagation, the most expensive step of neural-network training.

LocoProp decomposes a model's objective into layer-wise losses: each layer minimizes a squared loss between its output and a per-layer target, plus a regularizer term (an L2 penalty keeping the updated weights close to their previous values). Breaking the loss down across layers permits parallelization of training, smaller per-layer subproblems, and more flexibility in the choice of per-layer optimizer. Furthermore, the paper demonstrates that "the overall behavior of the combined updates closely resembles higher-order updates."
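The sketch below illustrates a layer-wise objective in the spirit of LocoProp; the function and variable names (local_layer_loss, W_prev, target, lam) are illustrative assumptions, not the paper's API, and the per-layer target is treated as given (in the paper it is derived from backprop signals).

```python
import numpy as np

def local_layer_loss(W, W_prev, a_in, target, lam=0.1):
    """Local objective for one layer: fit a per-layer target while staying near W_prev."""
    layer_out = np.tanh(a_in @ W)                  # this layer's forward pass
    fit = 0.5 * np.sum((layer_out - target) ** 2)  # squared loss against the layer target
    reg = 0.5 * lam * np.sum((W - W_prev) ** 2)    # L2 proximity regularizer
    return fit + reg

# Each layer's loss depends only on its own inputs, target, and weights,
# so the per-layer minimizations can be run independently (and in parallel).
rng = np.random.default_rng(0)
a_in = rng.normal(size=(32, 8))      # batch of inputs to this layer
W_prev = rng.normal(size=(8, 4))     # weights before the update
target = rng.normal(size=(32, 4))    # illustrative per-layer target
print(local_layer_loss(W_prev.copy(), W_prev, a_in, target))
```

Because each subproblem is small and local, several optimization steps can be taken per layer before synchronizing, which is how the combined update can mimic more expensive higher-order methods.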

Potential limitations: the experiments use relatively "small" networks, and, as the authors note, it "still remains to be seen how well the method generally works across tasks."
