NVIDIA H100 Sets World Record - Trains GPT-3 in 11 MINUTES!

H100 GPUs set new records on all eight tests in the latest MLPerf training benchmarks released today, excelling on a new MLPerf test for generative AI. That excellence is delivered both per-accelerator and at-scale in massive servers. For example, on a commercially available cluster of 3,584 H100 GPUs co-developed by startup Inflection AI and operated by CoreWeave, a cloud service provider specializing in GPU-accelerated workloads, the system completed the massive GPT-3-based training benchmark in less than eleven minutes. Let us know what you think!
---
So NVIDIA just broke an incredible record and proved again that they have some of the best GPUs for AI in the entire world. To set this benchmark, NVIDIA combined 3,584 H100 GPUs using a cloud provider called CoreWeave, and they tried to see how quickly they could train all of GPT-3, which is the predecessor of the model behind ChatGPT from OpenAI. And we’ll talk about the third-party org that actually benchmarked this in just a bit.
They also compared what this massive number of H100 GPUs could do against a similar number of NVIDIA A100 80GB cards and V100 GPUs, which were the state of the art about three and about six years ago, respectively. And what was wild is the H100s were able to train all of GPT-3 in just 46 hours, which is just completely insane. The A100s, by comparison, took about 36 days. So with the same number of physical GPUs, and actually consuming less power, the H100s were almost 20 times faster than the previous state of the art. The V100s took 51 days.
The difference here is that even if AMD’s GPUs are fast, the industry standard is NVIDIA, the best software is NVIDIA, and the best developers are using that. So if you go out and pick an alternative, it is a bit of a risk, and it is different. And even if the performance is the same, this kind of benchmarking has not even been shown to work on AMD yet.
So, interesting there. The other curious thing with AMD is that now that they’re trying to be more serious about this, it actually strengthens NVIDIA’s position, because it makes NVIDIA look like less of a monopoly in the AI space.
And what’s curious about this is that Moore’s law has technically slowed down. However, the important thing is not necessarily compute density or how much energy we’re consuming, but how well it scales. If you take 100 H100s and add on another 100, the 200 H100s will be twice as fast as the 100. And for those of us who remember back when NVIDIA SLI was a thing, you could have up to four [...] NVIDIA has shown, yeah, we can get 3,500 of these together and achieve near-linear scaling. So if you have 6,000, the limitation really lands on how much energy you have, how much space you have in the building, and the breaking points of NVIDIA’s InfiniBand interconnect. [...] They made huge investments in Mellanox, which was a company that only made networking infrastructure. The other cool thing is this was not trained on an HPC cluster.
Another thing that I think is important to mention is that the benchmark used here comes from an independent organization called MLPerf. And what’s interesting is that there are some companies that focus on bespoke accelerators, kind of like what Tesla has done, and there are some other small companies in the [...] we’re going to focus on ReLUs, or we’re going to focus on some specific industry segment. And it’s interesting, because for those of you who know what ASICs are, which are application-specific integrated circuits, like the ones used for mining, they can really only do one thing very well. [...] And technically speaking, an NVIDIA GPU is a kind of ASIC, because it just does parallel computation very well. However, GPUs are slightly more general, and that’s before you’re looking at something called an FPGA, which is much slower. So NVIDIA has found a balance of these things in their silicon.
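To make the scaling point above concrete, here is a minimal Python sketch of what "near-linear scaling" means. The function names and the example timings are hypothetical, made up purely for illustration; they are not MLPerf results.

```python
def ideal_time(base_time, base_gpus, gpus):
    """Training time if scaling were perfectly linear:
    doubling the GPU count halves the time."""
    return base_time * base_gpus / gpus


def scaling_efficiency(base_time, base_gpus, time, gpus):
    """Fraction of the ideal speedup actually achieved.
    1.0 means perfectly linear scaling."""
    actual_speedup = base_time / time
    ideal_speedup = gpus / base_gpus
    return actual_speedup / ideal_speedup


# Hypothetical numbers: a job that takes 10 hours on 100 GPUs.
print(ideal_time(10.0, 100, 200))               # 5.0 hours if scaling is linear
print(scaling_efficiency(10.0, 100, 5.0, 200))  # 1.0, perfectly linear
print(scaling_efficiency(10.0, 100, 5.5, 200))  # below 1.0, some overhead
```

In practice the efficiency tends to drop as the cluster grows, because more time goes into communication between GPUs, which is why the interconnect matters so much at this scale.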
And what’s really cool is they actually score the best in every category of the MLPerf benchmark. This benchmark round specifically is called MLPerf Training 3.0. And if you go to the MLCommons website, you can actually view the 3.0 results, where the top spots are all held by NVIDIA. There is actually a mixture of systems here. There are the cloud instances, and there are also the on-prem instances, which are broken down by who actually built the server. So this is also kind of cool if you’re someone who’s making the decision to spend tens of millions or hundreds of millions of dollars on equipment. You can look at what’s the best, and you can even compare it to NVIDIA’s own hardware. They have the DGX H100s with a number of different CPUs. And NVIDIA did not only destroy the competition in terms of the speed of training GPT-3, which I think is the most relevant here and the most understandable for our audience, but yeah, they nailed it in every single category.
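As a quick sanity check on the training times quoted earlier (46 hours on H100s, about 36 days on A100s, 51 days on V100s), this small Python sketch works out the ratios. The figures are simply the ones stated in the video, and the `speedup` helper is a name made up for this example.

```python
HOURS_PER_DAY = 24

# Training times as quoted in the video, converted to hours.
h100_hours = 46
a100_hours = 36 * HOURS_PER_DAY  # about 36 days
v100_hours = 51 * HOURS_PER_DAY  # 51 days


def speedup(slow_hours, fast_hours):
    """How many times faster the quicker system finished."""
    return slow_hours / fast_hours


print(round(speedup(a100_hours, h100_hours), 1))  # 18.8
print(round(speedup(v100_hours, h100_hours), 1))  # 26.6
```

So the "almost 20 times faster" claim matches the quoted A100 and H100 numbers, which work out to roughly 18.8x.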