ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning (Paper Explained)

#ext5 #transferlearning #exmix The T5 model has been a staple for NLP research for the last years. Both its size and its approach to formulate all NLP tasks as prompt-based language modeling make it a convenient choice to tackle new challenges and provides a strong baseline for most current datasets. ExT5 pushes T5 to its limits by pre-training not only on self-supervised mask filling, but also at the same time on 107 different supervised NLP tasks, which is their new ExMix dataset. The resulting model compares very favorably to T5 when fine-tuned to downstream tasks. OUTLINE: 0:00 - Intro & Overview 2:15 - Recap: The T5 model 3:55 - The ExT5 model and task formulations 8:10 - ExMix dataset 9:35 - Do different tasks help each other? 16:50 - Which tasks should we include? 20:30 - Pre-Training vs Pre-Finetuning 23:00 - A few hypotheses about what’s going on 27:20 - How much self-supervised data to use? 34:15 - More experimental results 38:40 - Conclusion & Summary Paper: Abs
Back to Top