A Block Minifloat Representation for Training Deep Neural Networks
Training Deep Neural Networks (DNNs) with high efficiency can be difficult to achieve with native floating point representations and commercially available hardware. Specialized arithmetic with custom acceleration offers perhaps the most promising alternative. Ongoing research is trending towards narrow floating point representations, called minifloats, that pack more operations into a given silicon area and consume less power. In this paper, we introduce Block Minifloat (BM), a new spectrum of minifloat formats capable of training DNNs end-to-end with only 4-8 bit weight, activation and gradient tensors. While standard floating point representations have two degrees of freedom, the exponent and the mantissa, BM exposes the exponent bias as an additional field for optimization. Crucially, this enables training with fewer exponent bits, yielding dense integer-like hardware for fused multiply-add (FMA) operations. For ResNet trained on ImageNet, 6-bit BM achieves almost no degradation in floating point accuracy with FMA units that are $4.1\times$ ($23.9\times$) smaller and consume $2.3\times$ ($16.1\times$) less energy than FP8 (FP32). Furthermore, our 8-bit BM format matches floating point accuracy while delivering higher computational density and faster expected training times.
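As a rough illustration of the idea (not the authors' implementation), the sketch below quantizes a tensor block to a minifloat grid with a shared exponent bias derived from the block's maximum magnitude. The function name `bm_quantize` and the parameters `e_bits` and `m_bits` are hypothetical; rounding mode and subnormal handling are simplifying assumptions.

```python
# Minimal sketch of block minifloat (BM) quantization, assuming:
#  - each block shares one exponent bias chosen from its maximum magnitude,
#  - each element is a (sign, exponent, mantissa) minifloat with `e_bits`
#    exponent bits and `m_bits` mantissa bits,
#  - round-to-nearest on the mantissa.
import numpy as np

def bm_quantize(block, e_bits=2, m_bits=3):
    """Quantize a block to a shared-bias minifloat grid and dequantize back."""
    max_exp = (1 << e_bits) - 1                      # largest biased exponent code
    # Shared exponent bias: align the block's largest value with the top of the range.
    bias = int(np.floor(np.log2(np.max(np.abs(block)) + 1e-30))) - max_exp
    sign = np.sign(block)
    mag = np.abs(block)
    # Per-element exponent, clamped to the representable (biased) range;
    # values below the bias fall into a subnormal-like region (exp code 0).
    exp = np.clip(np.floor(np.log2(mag + 1e-30)) - bias, 0, max_exp)
    scale = 2.0 ** (exp + bias)
    # Round the significand to m_bits fractional bits.
    mant = np.round(mag / scale * (1 << m_bits)) / (1 << m_bits)
    return sign * mant * scale

x = np.random.randn(64).astype(np.float32)
xq = bm_quantize(x, e_bits=2, m_bits=3)              # 6-bit BM-like layout (1+2+3)
print(np.max(np.abs(x - xq)))                        # worst-case quantization error
```

Because the bias is shared across the block, each element needs only a few exponent bits, which is what lets the FMA datapath collapse toward integer-like hardware as described above.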