
Deep Layers as Stochastic Solvers


We provide a novel perspective on the forward pass through a block of layers in a deep network. In particular, we show that a forward pass through a standard dropout layer followed by a linear layer and a non-linear activation is equivalent to optimizing a convex optimization objective with a single iteration of a $\tau$-nice Proximal Stochastic Gradient method. We further show that replacing standard Bernoulli dropout with additive dropout is equivalent to optimizing the same convex objective with a variance-reduced proximal method. By expressing both fully-connected and convolutional layers as special cases of a high-order tensor product, we unify the underlying convex optimization problem in the tensor setting and derive a formula for the Lipschitz constant $L$ used to determine the optimal step size of the above proximal methods. We conduct experiments with standard convolutional networks applied to the CIFAR-10 and CIFAR-100 datasets, and show that replacing a block of layers with multiple iterations of the corresponding solver, with step size set via $L$, consistently improves classification accuracy.
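To make the stated equivalence concrete, here is a minimal numerical sketch under simplifying assumptions that are not the paper's construction: take the convex surrogate $f(y) = \tfrac{1}{2}\|y\|^2 - \langle Wx, y\rangle$ with $g$ the indicator of the non-negative orthant, so that ReLU is exactly $\mathrm{prox}_g$ and the Lipschitz constant of $\nabla f$ is $L = 1$. Using an inverted-dropout estimate of $Wx$ as the stochastic gradient, a single proximal stochastic gradient step from $y = 0$ with step size $1/L$ reproduces the dropout → linear → ReLU forward pass. All variable names below (`W`, `x`, `p_keep`) are illustrative.

```python
import numpy as np

# Sketch: one proximal stochastic gradient step on the simplified convex
# surrogate f(y) = 0.5*||y||^2 - <W x, y>, with g(y) the indicator of the
# non-negative orthant (so prox_g is ReLU and L = 1).  Dropout supplies an
# unbiased (inverted-dropout) estimate of W x, i.e. a stochastic gradient.

rng = np.random.default_rng(0)
d_in, d_out, p_keep = 8, 4, 0.5

W = rng.standard_normal((d_out, d_in))
x = rng.standard_normal(d_in)

# --- standard forward pass: dropout -> linear -> ReLU ----------------------
mask = rng.binomial(1, p_keep, size=d_in)      # Bernoulli dropout mask
x_drop = mask * x / p_keep                     # inverted-dropout scaling
forward = np.maximum(W @ x_drop, 0.0)          # ReLU activation

# --- one proximal stochastic gradient step on the surrogate objective ------
L = 1.0                                        # Lipschitz constant of grad f
eta = 1.0 / L                                  # step size 1/L
y0 = np.zeros(d_out)                           # initial point
stoch_grad = y0 - W @ x_drop                   # unbiased estimate of grad f(y0)
y1 = np.maximum(y0 - eta * stoch_grad, 0.0)    # prox_g = projection onto y >= 0

assert np.allclose(forward, y1)                # the two computations coincide
```

With this toy surrogate $L = 1$, so the step size $1/L$ prescribed above reduces to the unit step implicit in an ordinary forward pass; the paper's actual objective, $\tau$-nice sampling analysis, and Lipschitz-constant formula for tensor (fully-connected and convolutional) layers are more general.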
