Residual Distillation: Towards Portable Deep Neural Networks without Shortcuts
By transferring both features and gradients between different layers, the shortcut connections explored by ResNets allow us to effectively train very deep neural networks of up to hundreds of layers. However, the additional computation costs induced by these shortcuts are often overlooked. For example, during online inference, the shortcuts in ResNet-50 account for about 40 percent of the total memory usage on feature maps, because features from preceding layers cannot be released until the subsequent computation is completed. In this work, for the first time, we consider training CNN models with shortcuts and deploying them without. In particular, we propose a novel joint-training framework that trains a plain CNN by leveraging the gradients of its ResNet counterpart. During the forward pass, the feature maps from the early stages of the plain CNN are passed through the later stages of both the plain CNN itself and its ResNet counterpart to compute the loss. During backpropagation, gradients computed from a mixture of these two parts are used to update the plain-CNN network, alleviating the vanishing gradient problem. Extensive experiments on ImageNet/CIFAR10/CIFAR100 demonstrate that the plain-CNN network without shortcuts generated by our approach achieves the same level of accuracy as the ResNet baseline while delivering about $1.4\times$ speed-up and $1.25\times$ memory reduction. We also verified the feature transferability of our ImageNet-pretrained plain-CNN network by fine-tuning it on MIT 67 and Caltech 101. Our results show that the performance of the plain CNN is slightly higher than that of its baseline ResNet-50 on these two datasets. Code is available at: \href{https://github.com/leoozy/JointRD_Neurips2020}{https://github.com/leoozy/JointRD\_Neurips2020}
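To make the joint-training scheme described above concrete, the following is a minimal PyTorch sketch, not the authors' released implementation (see the repository linked above). Names such as `plain_stages`, `res_stages`, `head_plain`, `head_res`, and the mixing weight `alpha` are hypothetical placeholders, and it assumes each plain-CNN stage produces features shape-compatible with the corresponding ResNet stage. It shows how intermediate plain-CNN features could be routed through the remaining ResNet stages and how the two losses could be mixed so the early plain stages receive a blend of both gradients.

\begin{verbatim}
# Hypothetical sketch of the joint-training idea (not the authors' code).
import torch
import torch.nn as nn

class JointTrainingSketch(nn.Module):
    def __init__(self, plain_stages, res_stages, head_plain, head_res):
        super().__init__()
        self.plain_stages = nn.ModuleList(plain_stages)  # shortcut-free stages
        self.res_stages = nn.ModuleList(res_stages)      # ResNet counterpart
        self.head_plain = head_plain                     # classifier for plain path
        self.head_res = head_res                         # classifier for ResNet path

    def forward(self, x):
        logits_res = []
        feat = x
        for i, stage in enumerate(self.plain_stages):
            feat = stage(feat)
            # Route the intermediate plain feature through the remaining
            # ResNet stages (assumes matching feature-map shapes).
            if i + 1 < len(self.res_stages):
                aux = feat
                for res_stage in self.res_stages[i + 1:]:
                    aux = res_stage(aux)
                logits_res.append(self.head_res(aux))
        logits_plain = self.head_plain(feat)
        return logits_plain, logits_res

def joint_loss(logits_plain, logits_res, target, alpha=0.5):
    # Mix the plain-path loss with the ResNet-path losses so that gradients
    # flowing back into the early plain stages blend both signals.
    loss = nn.functional.cross_entropy(logits_plain, target)
    for lr in logits_res:
        loss = loss + alpha * nn.functional.cross_entropy(lr, target)
    return loss
\end{verbatim}

In this sketch, calling \texttt{joint\_loss(...).backward()} would update the plain CNN with the mixed gradients; the ResNet counterpart is only needed at training time and can be discarded at deployment.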