Article by David Yastremsky detailing Frankle and Carbin's 2019 Lottery Ticket Hypothesis paper, a motivating work for deep learning network sparsification.
The "lottery ticket hypothesis" shows empirical evidence that the performance of large deep learning models can be reproduced by smaller sub-networks within their architecture. Network pruning is therefore not only possible but can exist at the beginning of training in rare "lottery ticket" cases. Finding these sub-networks could help speed up training, reduce model size and overall help us understand deep learning better.
However, we still have not found a way to train small, efficient models from scratch. One key approach to model compression today relies on teacher-student networks, where a smaller student model learns to mimic the outputs of a pre-trained larger teacher. Many startups and labs are racing toward scalable generative AI, including Mistral AI and MosaicML (recently acquired by Databricks), led by none other than Jonathan Frankle himself.
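As a rough illustration of the teacher-student idea, the sketch below computes a standard knowledge-distillation loss in the style of Hinton et al. (2015). The `student`, `teacher`, temperature, and loss weighting are assumed placeholders, not details from the article:

```python
# Minimal knowledge distillation step: the student matches the teacher's
# softened output distribution in addition to the true labels.
import torch
import torch.nn.functional as F

def distillation_loss(student, teacher, x, y, T=2.0, alpha=0.5):
    with torch.no_grad():
        teacher_logits = teacher(x)  # soft targets from the frozen large model
    student_logits = student(x)
    # KL divergence between temperature-softened distributions, scaled by T^2.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard_loss = F.cross_entropy(student_logits, y)  # ordinary supervised loss
    return alpha * soft_loss + (1 - alpha) * hard_loss
```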