How ChatGPT is Trained

This short tutorial explains the training objectives used to develop ChatGPT, the new chatbot language model from OpenAI.

Timestamps:
0:00 - Non-intro
0:24 - Training overview
1:33 - Generative pretraining (the raw language model)
4:18 - The alignment problem
6:26 - Supervised fine-tuning
7:19 - Limitations of supervision: distributional shift
8:50 - Reward learning based on preferences
10:39 - Reinforcement learning from human feedback
13:02 - Room for improvement

Relevant papers for learning more:
InstructGPT: Ouyang et al., 2022
GPT-3: Brown et al., 2020
PaLM: Chowdhery et al., 2022
Efficient reductions for imitation learning: Ross & Bagnell, 2010
Deep reinforcement learning from human preferences: Christiano et al., 2017
Learning to summarize from human feedback: Stiennon et al., 2020
Scaling laws for reward model overoptimization: Gao et al., 2022
Proximal policy optimization algorithms: Schulman et al., 2017

Special thanks to Elmira Amirloo for feedback on this video.