Optimization Track. Denis Timonin: Fast training with AMP/TF32 using Tensor Cores on NVIDIA GPUs

Increasing the size of a neural network typically improves accuracy, but it also increases the memory and compute requirements for training the model. At the same time, the amount of available data is constantly growing (exponentially in recent years). In this talk we will cover one of the most powerful current methodologies for speeding up training and inference. In my presentation, we will dive into the details of the research paper “Mixed Precision Training” by NVIDIA and Baidu Research and into the TensorFloat32 (TF32) precision format. We will discuss the algorithms used in mixed precision training, and also the hardware that provides high throughput for these data formats in neural networks. I will try to present all of this as simply as possible.
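To give a concrete flavor of the techniques the talk covers, below is a minimal PyTorch sketch of the loss-scaling training loop from the “Mixed Precision Training” paper, together with the switches that enable TF32 for matmuls and convolutions. This is an illustrative sketch, not material from the talk itself: the model, tensor sizes, and hyperparameters are placeholders, and it assumes a CUDA-capable GPU with Tensor Cores (Volta or newer for FP16 AMP, Ampere or newer for TF32).

```python
import torch
from torch import nn

# Opt in to TF32 for matmuls and cuDNN convolutions (Ampere+ GPUs).
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# Placeholder model and optimizer for illustration.
model = nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid FP16 gradient underflow

for _ in range(10):
    x = torch.randn(64, 1024, device="cuda")
    target = torch.randn(64, 1024, device="cuda")
    optimizer.zero_grad()
    # autocast runs Tensor Core-friendly ops (matmul, conv) in FP16
    # while keeping numerically sensitive ops in FP32.
    with torch.cuda.amp.autocast():
        loss = nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()  # backward pass on the scaled loss
    scaler.step(optimizer)         # unscales grads; skips the step on inf/NaN
    scaler.update()                # adjusts the loss scale dynamically
```

The key design point is that master weights and the optimizer step stay in FP32, while the forward and backward passes exploit the much higher FP16/TF32 throughput of Tensor Cores.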