
Layer-wise Conditioning Analysis in Exploring the Learning Dynamics of DNNs



Conditioning analysis uncovers the landscape of an optimization objective by exploring the spectrum of its curvature matrix. This has been well explored theoretically for linear models. We extend this analysis to deep neural networks (DNNs) in order to investigate their learning dynamics. To this end, we propose layer-wise conditioning analysis, which explores the optimization landscape with respect to each layer independently. Such an analysis is theoretically supported under mild assumptions that approximately hold in practice. Based on our analysis, we show that batch normalization (BN) can stabilize training, but sometimes results in the false impression of a local minimum, which has detrimental effects on learning. In addition, we experimentally observe that BN can improve the layer-wise conditioning of the optimization problem. Finally, we find that the last linear layer of a very deep residual network displays ill-conditioned behavior. We solve this problem by adding only one BN layer before the last linear layer, which achieves improved performance over the original and pre-activation residual networks.
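To make the layer-wise view concrete, below is a minimal sketch, not the paper's implementation, of one way to probe per-layer conditioning in PyTorch: if the layer-wise curvature is approximated in a Kronecker-factored style, the spectrum of each linear layer's input covariance already indicates how well- or ill-conditioned that layer's sub-problem is. The toy model and the `input_condition_number` helper are illustrative assumptions.

```python
import torch
import torch.nn as nn

def input_condition_number(x: torch.Tensor, eps: float = 1e-12) -> float:
    """x: (batch, features) activations feeding a linear layer.
    Returns the condition number of their empirical covariance."""
    x = x - x.mean(dim=0, keepdim=True)       # center the activations
    cov = x.T @ x / x.shape[0]                # empirical covariance matrix
    eigvals = torch.linalg.eigvalsh(cov)      # symmetric matrix -> real spectrum
    return (eigvals.max() / eigvals.min().clamp_min(eps)).item()

# Hook the input of every linear layer and report its conditioning.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))
conds = {}

def make_hook(name):
    def hook(module, inputs, output):
        conds[name] = input_condition_number(inputs[0].detach())
    return hook

for name, m in model.named_modules():
    if isinstance(m, nn.Linear):
        m.register_forward_hook(make_hook(name))

model(torch.randn(256, 64))
print(conds)  # large values suggest an ill-conditioned layer-wise landscape
```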

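The paper's remedy for the ill-conditioned last linear layer is a single added BN layer before it. A minimal sketch of that change, assuming a torchvision-style ResNet whose classifier is exposed as `model.fc` (the paper's experiments use its own residual-network implementations):

```python
import torch.nn as nn
from torchvision.models import resnet50

model = resnet50()
# Insert one BatchNorm1d before the final linear (classifier) layer,
# normalizing the flattened features that feed it.
model.fc = nn.Sequential(
    nn.BatchNorm1d(model.fc.in_features),  # the single added BN layer
    model.fc,                              # the original last linear layer
)
```

Because the pooled features entering `fc` are 2-D `(batch, features)`, `BatchNorm1d` is the appropriate variant here; no other part of the network changes.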
