The Future of Artificial Intelligence

[2303.06349] Resurrecting Recurrent Neural Networks for Long Sequences - Orvieto et al. (ETH & DeepMind)


"Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are hard to optimize and slow to train. Deep state-space models (SSMs) have recently been shown to perform remarkably well on long sequence modeling tasks, and have the added benefits of fast parallelizable training and RNN-like fast inference. [...] Our results provide new insights on the origins of the impressive performance of deep SSMs, while also introducing an RNN block called the Linear Recurrent Unit that matches both their performance on the Long Range Arena benchmark and their computational efficiency."

Juliette Decugis's insight:

All of the recent generative-AI breakthroughs rely on Transformers: OpenAI's GPTs, Meta's Llama, Mistral, Google's Gemini... However, attention scales quadratically with sequence length, which makes training on long sequences costly. RNNs, in contrast, scale linearly with sequence length but suffer from exploding and vanishing gradients, while deep SSMs have shown strong results on modeling long-range dependencies. Orvieto et al. show that the performance and efficiency of deep continuous-time SSMs can be matched by a simpler architecture: deep linear RNNs. Using linear hidden-state transitions, complex diagonal recurrent matrices, and a stable exponential parametrization, the authors address the classic failures of RNNs on long sequences. Their Linear Recurrent Unit matches S4 results on the Long Range Arena tasks Pathfinder (sequence length 1,024) and Path-X (sequence length 16K), and its reliance on linear recurrences makes training parallelizable and much faster.
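For intuition, here is a minimal NumPy sketch of an LRU-style recurrence. Shapes, initialization, and variable names are illustrative assumptions rather than the paper's exact implementation; the key ingredients it shows are the complex diagonal state transition and the exponential parametrization that keeps the eigenvalue magnitudes below one.

```python
import numpy as np

# Sketch of a Linear Recurrent Unit (LRU)-style layer (assumed shapes/init).
# State update: x_t = diag(lam) * x_{t-1} + B u_t,  output: y_t = Re(C x_t) + D u_t,
# with lam = exp(-exp(nu) + i*theta) so that |lam| < 1 (stable recurrence).

rng = np.random.default_rng(0)
d_state, d_in, d_out, T = 16, 4, 4, 100

nu = rng.normal(size=d_state)                 # controls eigenvalue magnitude
theta = rng.uniform(0, 2 * np.pi, d_state)    # controls eigenvalue phase
lam = np.exp(-np.exp(nu) + 1j * theta)        # complex diagonal recurrence

B = (rng.normal(size=(d_state, d_in)) + 1j * rng.normal(size=(d_state, d_in))) / np.sqrt(2 * d_in)
C = (rng.normal(size=(d_out, d_state)) + 1j * rng.normal(size=(d_out, d_state))) / np.sqrt(d_state)
D = rng.normal(size=(d_out, d_in)) / np.sqrt(d_in)

u = rng.normal(size=(T, d_in))                # input sequence
x = np.zeros(d_state, dtype=complex)
ys = []
for t in range(T):
    # Linear, elementwise (diagonal) state transition: no nonlinearity inside the recurrence.
    x = lam * x + B @ u[t]
    ys.append((C @ x).real + D @ u[t])
y = np.stack(ys)
print(y.shape)  # (100, 4)
```

The sequential loop above is only for clarity: because the recurrence is linear and diagonal, it can be unrolled with a parallel (associative) scan, which is what gives the LRU its SSM-like training speed.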

 

Overall, this paper motivates the search for RNN-based architectures and challenges Transformer supremacy; SSMs are also emerging as an alternative.


Enhancing Backpropagation via Local Loss Optimization


"Posted by Ehsan Amid, Research Scientist, and Rohan Anil, Principal Engineer, Google Research, Brain Team"

Juliette Decugis's insight:
Many recent ML papers try to address a central problem of deep learning models: their computational and memory cost. While much of that work has focused on sparsifying models, LocoProp instead rethinks backpropagation, the most expensive step of neural-network training.

LocoProp decomposes a model's objective into layer-wise losses: each layer minimizes a squared loss between its output and a per-layer target, plus a regularizer term (an L2 penalty keeping the updated weights close to their previous values). Breaking the loss down across layers permits parallelization of training, smaller per-layer subproblems, and more flexibility in the choice of per-layer optimizer. Furthermore, the paper demonstrates that "the overall behavior of the combined updates closely resembles higher-order updates."
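The sketch below illustrates a layer-wise objective in the spirit of LocoProp; the function and variable names (local_layer_loss, W_prev, target, lam) are illustrative assumptions, not the paper's API, and the per-layer target is treated as given (in the paper it is derived from backprop signals).

```python
import numpy as np

def local_layer_loss(W, W_prev, a_in, target, lam=0.1):
    """Local objective for one layer: fit a per-layer target while staying near W_prev."""
    layer_out = np.tanh(a_in @ W)                  # this layer's forward pass
    fit = 0.5 * np.sum((layer_out - target) ** 2)  # squared loss against the layer target
    reg = 0.5 * lam * np.sum((W - W_prev) ** 2)    # L2 proximity regularizer
    return fit + reg

# Each layer's loss depends only on its own inputs, target, and weights,
# so the per-layer minimizations can be run independently (and in parallel).
rng = np.random.default_rng(0)
a_in = rng.normal(size=(32, 8))      # batch of inputs to this layer
W_prev = rng.normal(size=(8, 4))     # weights before the update
target = rng.normal(size=(32, 4))    # illustrative per-layer target
print(local_layer_loss(W_prev.copy(), W_prev, a_in, target))
```

Because each subproblem is small and local, several optimization steps can be taken per layer before synchronizing, which is how the combined update can mimic more expensive higher-order methods.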

Potential limitations: the experiments use relatively "small" networks, and, as the authors note, it "still remains to be seen how well the method generally works across tasks."
