Attention for Neural Networks, Clearly Explained!!!

Attention is one of the most important concepts behind Transformers and Large Language Models, like ChatGPT. However, it’s not that complicated. In this StatQuest, we add Attention to a basic Sequence-to-Sequence (Seq2Seq or Encoder-Decoder) model and walk through how it works and is calculated, one step at a time. BAM!!!

If you’d like to support StatQuest, please consider Patreon, a YouTube Membership, buying my book, a study guide, a t-shirt or hoodie, or a song from the StatQuest store, or just donating to StatQuest! Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on Twitter.

0:00 Awesome song and introduction
3:14 The Main Idea of Attention
5:34 A worked out example of Attention
10:18 The Dot Product Similarity
11:52 Using similarity scores to calculate Attention values
13:27 Using Attention values to predict an output word
14:22 Summary of Attention

#StatQuest #neuralnetwork #attention
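The chapters on dot product similarity and Attention values can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions, not the exact worked example from the video: the function names (`dot`, `softmax`, `attention`) and the toy 2-dimensional encoder and decoder vectors are made up for this sketch, and the final step of using the Attention values to predict an output word is left out.

```python
import math

def dot(u, v):
    # Dot product similarity between two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    # Turn raw similarity scores into weights that sum to 1.
    # Subtracting the max first keeps exp() numerically stable.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(decoder_state, encoder_states):
    # 1) Score each encoder state against the current decoder state.
    scores = [dot(decoder_state, h) for h in encoder_states]
    # 2) Convert the scores to weights with softmax.
    weights = softmax(scores)
    # 3) Attention values = weighted sum of the encoder states.
    dim = len(encoder_states[0])
    values = [
        sum(w * h[i] for w, h in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, values

# Hypothetical toy vectors: one encoder state per input word.
encoder_states = [[1.0, 0.0], [0.0, 1.0]]
decoder_state = [2.0, 0.0]

weights, values = attention(decoder_state, encoder_states)
print(weights)  # the first input word gets the larger weight
print(values)
```

In this toy setup the decoder state is most similar to the first encoder state, so that input word receives most of the weight and dominates the resulting Attention values.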