Grokking: Generalization beyond Overfitting on small algorithmic datasets (Paper Explained)
#grokking #openai #deeplearning
Grokking is the phenomenon where a neural network, well after it has overfit its training data, abruptly jumps from chance-level generalization to perfect generalization. This paper demonstrates grokking on small algorithmic datasets where a network has to fill in the missing entries of binary operation tables. Interestingly, the learned latent spaces show the structure of the underlying binary operations that generated the data emerging.
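To make the "fill in binary tables" task concrete, here is a minimal sketch of how such a dataset could be built. The operation (addition modulo a prime), the modulus 97, the 50% training fraction, and the function names are all illustrative assumptions, not details taken from this description.

```python
import random

def make_binary_op_table(p=97, op=lambda a, b, p: (a + b) % p):
    """Enumerate every cell of the table as (a, b, a op b).

    Assumption: the operation is addition modulo p; any other
    binary operation on 0..p-1 could be passed in instead.
    """
    return [(a, b, op(a, b, p)) for a in range(p) for b in range(p)]

def split_table(equations, train_fraction=0.5, seed=0):
    """Randomly hide some cells: shown cells train, hidden cells test."""
    rng = random.Random(seed)
    shuffled = equations[:]          # copy so the input list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

table = make_binary_op_table(p=97)
train, held_out = split_table(table, train_fraction=0.5)
```

A network trained only on `train` is then scored on `held_out`, the cells it never saw. Grokking refers to accuracy on those hidden cells rising sharply long after training accuracy has saturated.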
OUTLINE:
0:00 - Intro & Overview
1:40 - The Grokking Phenomenon
3:50 - Related: Double Descent
7:50 - Binary Operations Datasets
11:45 - What quantities influence grokking?
15:40 - Learned Emerging Structure
17:35 - The role of smoothness
21:30 - Simple explanations win
24:30 - Why does weight decay encourage simplicity?
26:40 - Appendix
28:55 - Conclusion & Comments
Paper:
Abstract:
In this paper we propose to study generalization of neural networks on small algorithmically generated datasets. In this settin