Let’s pretrain a 3B LLM from scratch: on 16+ H100 GPUs, no detail skipped.
We pretrain a 3B-parameter LLM from scratch across multiple H100 machines, skipping no details. You will learn how to handle OOM errors and how to develop on cheap GPUs before scaling to multi-GPU. Finally, we run multi-node training with FSDP and explain how to take the model beyond 3B parameters.
This is a full lecture with no edits and no details skipped. By the end you will have the skills and intuition needed to pretrain and scale LLMs beyond a simple demo.
We start tuning and developing on cheap A10G GPUs, then move to 8 H100 GPUs, and finally scale to 2 machines for a total of 16 H100 GPUs. This workflow saves a lot in cloud costs.
I start at 1B parameters and scale it to 3B. To go beyond 3B, simply use the same process but with more machines.
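For a flavor of the multi-node FSDP step covered at the end of the lecture, here is a minimal sketch, not the exact code from the lecture. The build_model() helper is a placeholder for your own Llama-style model, and the torchrun command assumes 2 nodes with 8 GPUs each.

# Launch on each node with torchrun, e.g.:
#   torchrun --nnodes=2 --nproc_per_node=8 --rdzv_backend=c10d \
#            --rdzv_endpoint=<head_node_ip>:29500 pretrain.py
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import MixedPrecision

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
torch.cuda.set_device(local_rank)

model = build_model()  # placeholder: your 1B-3B Llama-style model
model = FSDP(
    model,
    device_id=local_rank,
    mixed_precision=MixedPrecision(param_dtype=torch.bfloat16),
)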
01:40 Run the Llama template
02:19 Llama template overview
05:00 Run the template on 1 GPU (A10G)
06:20 Monitor GPU memory usage
06:40 Code walkthrough
10:30 How to handle OOM (out of memory) errors (sketch below)
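As a rough illustration of the memory monitoring at 06:20 and the OOM handling at 10:30, here is a small sketch. The variable names and values are illustrative assumptions, not the lecture's exact configuration.

import torch

def log_gpu_memory(step):
    # torch.cuda memory counters, reported in GB
    alloc = torch.cuda.memory_allocated() / 1e9
    peak = torch.cuda.max_memory_allocated() / 1e9
    print(f"step {step}: allocated {alloc:.1f} GB, peak {peak:.1f} GB")

# A common first fix for OOM: shrink the micro-batch and keep the
# effective batch size constant via gradient accumulation.
micro_batch_size = 4   # lower this when you hit OOM
grad_accum_steps = 8   # raise this to keep 4 * 8 = 32 samples per optimizer step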