What Do Neural Networks Really Learn? Exploring the Brain of an AI Model
Neural networks have become increasingly impressive in recent years, but there’s a big catch: we don’t really know what they are doing. We give them data and ways to get feedback, and somehow, they learn all kinds of tasks. It would be really useful, especially for safety purposes, to understand what they have learned and how they work after they’ve been trained. The ultimate goal is not only to understand in broad strokes what they’re doing but to precisely reverse engineer the algorithms encoded in their parameters. This is the ambitious goal of mechanistic interpretability. As an introduction to this field, we show how researchers have been able to partly reverse-engineer how InceptionV1, a convolutional neural network, recognizes images.
▀▀▀▀▀▀▀▀▀SOURCES & READINGS▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
This topic is truly a rabbit hole. If you want to learn more about this important research and even contribute to it, check out this list of sources we’ve compiled on mechanistic interpretability and interpretability in general:
On Interpreting InceptionV1:
Feature visualization:
Zoom in: An Introduction to Circuits:
The Distill journal contains several articles that try to make sense of how exactly InceptionV1 does what it does:
OpenAI’s Microscope tool lets us visualize the neurons and channels of a number of vision models in great detail:
Here’s OpenAI’s Microscope tool pointed at layer Mixed3b in InceptionV1:
Activation atlases:
Transformer Circuits Thread, the spiritual successor to the circuits thread on InceptionV1, this time on transformers:
In the video, we cite “Toy Models of Superposition”:
We also cite “Towards Monosemanticity: Decomposing Language Models With Dictionary Learning”:
More recent progress:
Mapping the Mind of a Large Language Model:
Press:
Paper in the Transformer Circuits Thread:
Extracting Concepts from GPT-4:
Press:
Paper:
Browse features:
Language models can explain neurons in language models (cited in the video):
Press:
Paper:
View neurons:
Neel Nanda on how to get started with Mechanistic Interpretability:
Concrete Steps to Get Started in Transformer Mechanistic Interpretability:
Mechanistic Interpretability Quickstart Guide:
200 Concrete Open Problems in Mechanistic Interpretability:
More work mentioned in the video:
Progress measures for grokking via mechanistic interpretability:
Discovering Latent Knowledge in Language Models Without Supervision:
Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning:
▀▀▀▀▀▀▀▀▀PATREON, MEMBERSHIP, MERCH▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🟠 Patreon:
🔵 Channel membership:
🟢 Merch:
🟤 Ko-fi, for one-time and recurring donations:
▀▀▀▀▀▀▀▀▀SOCIAL & DISCORD▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
Discord:
Reddit:
X/Twitter:
▀▀▀▀▀▀▀▀▀PATRONS & MEMBERS▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
AAAA There are too many of you, and you don’t fit in the description this time! But we thank you from the bottom of our hearts. All of you, in this Google Doc:
▀▀▀▀▀▀▀CREDITS▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
Here are all the good doggos who worked on this video: