Mogujie image algorithm engineer Huang Wenbo: Optimizing Deep Learning on Mobile

2020-02-27

  • 2.Optimizing Deep Learning on Mobile. Huang Wenbo (Guigu), Meili United Group
  • 3.About the group: Meili United Group is a fashion-consumption platform focused on serving women, founded on June 15, 2016. Its products and services include Mogujie, Meilishuo, uni, 锐鲨, and MOGU STATION, covering every area of fashion consumption and meeting the daily fashion-content and shopping needs of female users across age groups, spending power, and tastes.
  • 4.Overall numbers: 120,000+ fashion influencers; 10,000,000+ daily active users; 200,000,000+ registered users; 95%+ female users; ¥20,000,000,000+ transaction volume; 95%+ mobile users
  • 5.Outline: background and current state; model compression and design; mobile-side practice; summary
  • 6.01 Background and current state
  • 7.Deep learning: from the cloud to edge computing
  • 8.Why does Mogujie optimize deep learning? On the server: reduce training and inference time; save GPU resources and power. On mobile: respond in real time; run locally to reduce server load; protect user privacy.
  • 9.CNN basics
  • 10.CNN basics
  • 11.Challenge. Deep learning: networks keep getting deeper and more accurate, but models grow larger, demand more storage and compute, and consume more energy. Mobile devices: limited memory, limited compute performance, limited power budget.
  • 12.02 Model compression and design
  • 13.Model compression: pruning, quantization, Huffman encoding
  • 14.Pruning (weight level): remove individual low-magnitude weights to obtain sparse connections. Han et al., "Learning both weights and connections for efficient neural networks", NIPS 2015
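As a rough illustration of weight-level pruning (a minimal NumPy sketch, not the authors' code; the function name `magnitude_prune` and the threshold choice are my own), weights whose magnitude falls below a quantile threshold are zeroed out, and in practice the network is then retrained with those weights held at zero:

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold   # True for surviving connections
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)   # a toy weight matrix
pruned, mask = magnitude_prune(w, sparsity=0.7)
print(f"sparsity achieved: {1 - mask.mean():.2f}")
```

The surviving sparse weights can then be stored in a compressed sparse format, which is where the storage savings come from.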
  • 15.Pruning (channel level): prune filters and retrain iteratively. Li et al., "Pruning filters for efficient ConvNets", ICLR 2017
  • 16.Pruning (channel level) with L1 regularization on the batch-norm scale factors. Liu et al., "Learning efficient convolutional networks through network slimming", ICCV 2017
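The channel-selection step of network slimming can be sketched as follows (a hedged illustration; `select_channels` is a hypothetical helper, and the training-time L1 penalty on gamma is assumed to have already pushed unimportant channels toward zero):

```python
import numpy as np

def select_channels(gammas, prune_ratio=0.5):
    """Per-layer keep-masks: prune the globally smallest |gamma| channels."""
    all_g = np.concatenate([np.abs(g) for g in gammas])
    threshold = np.quantile(all_g, prune_ratio)   # one global threshold
    return [np.abs(g) > threshold for g in gammas]

# Toy BN scale factors for two layers after L1-regularized training:
gammas = [np.array([0.9, 0.01, 0.5, 0.02]), np.array([0.03, 0.8, 0.7, 0.04])]
keep = select_channels(gammas, prune_ratio=0.5)
print([int(m.sum()) for m in keep])  # channels kept per layer -> [2, 2]
```

Channels whose mask is False are physically removed from both the conv layer and its BN, after which the slimmer network is fine-tuned.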
  • 17.Quantization. Han et al., "Deep Compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding", ICLR 2016
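The weight-sharing idea behind this quantization step can be sketched with a small k-means loop (an illustrative approximation, not the paper's implementation; `kmeans_quantize` and the linear centroid initialization are my own choices): weights are clustered, each weight is replaced by its cluster centroid, and only the small codebook plus low-bit per-weight indices need to be stored.

```python
import numpy as np

def kmeans_quantize(weights, n_clusters=4, iters=20):
    """Cluster weights; return the codebook and per-weight cluster indices."""
    flat = weights.ravel()
    centroids = np.linspace(flat.min(), flat.max(), n_clusters)
    for _ in range(iters):
        # assign each weight to its nearest centroid, then recompute means
        idx = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        for k in range(n_clusters):
            if np.any(idx == k):
                centroids[k] = flat[idx == k].mean()
    return centroids, idx.reshape(weights.shape)

rng = np.random.default_rng(1)
w = rng.normal(size=(8, 8)).astype(np.float32)
codebook, idx = kmeans_quantize(w, n_clusters=4)
w_q = codebook[idx]   # reconstructed weights share only 4 distinct values
print("distinct weight values:", np.unique(w_q).size)
```

With 4 clusters each index needs only 2 bits, which is where the compression over 32-bit floats comes from.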
  • 18.Huffman encoding. Han et al., "Deep Compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding", ICLR 2016
  • 19.Summary of model compression. Pruning reduces the number of channels: channel-level pruning with iterative retraining, or channel-level pruning with L1 regularization.
  • 20.Designing smaller CNN architectures: SqueezeNet, MobileNet, ShuffleNet
  • 21.SqueezeNet Fire module: a 1x1 squeeze convolution (64 → 16 channels) feeds two parallel expand convolutions, a 1x1 and a 3x3 (16 → 64 channels each), whose outputs are concatenated into 128 channels. Iandola et al., "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size", arXiv 2016
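Counting parameters shows why the Fire module is cheap. Using the channel sizes from the slide (squeeze 64→16, expand 16→64 twice, concatenated to 128) versus a plain 3x3 convolution mapping 64→128 channels:

```python
# Parameter counts (weights only, biases ignored for simplicity).
squeeze = 64 * 16 * 1 * 1      # 1x1 squeeze conv
expand1 = 16 * 64 * 1 * 1      # 1x1 expand conv
expand3 = 16 * 64 * 3 * 3      # 3x3 expand conv
fire = squeeze + expand1 + expand3
plain = 64 * 128 * 3 * 3       # standard 3x3 conv, 64 -> 128 channels
print(fire, plain, round(plain / fire, 1))  # 11264 73728 6.5
```

Roughly 6.5x fewer parameters for the same output channel count, because the squeeze layer shrinks the input to the expensive 3x3 convolution.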
  • 22.MobileNets: factorize standard convolutions into depthwise separable convolutions. Howard et al., "MobileNets: Efficient convolutional neural networks for mobile vision applications", arXiv 2017
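The saving from depthwise separable convolution is easy to compute (a sketch with hypothetical helper names; the cost model counts multiply-accumulates only): a standard KxK convolution is replaced by a per-channel depthwise KxK convolution plus a 1x1 pointwise convolution, cutting cost by roughly a factor of 1/N + 1/K² for N output channels.

```python
def conv_cost(cin, cout, k, h, w):
    """Multiply-adds of a standard KxK convolution."""
    return cin * cout * k * k * h * w

def depthwise_separable_cost(cin, cout, k, h, w):
    depthwise = cin * k * k * h * w   # one KxK filter per input channel
    pointwise = cin * cout * h * w    # 1x1 conv mixes the channels
    return depthwise + pointwise

std = conv_cost(64, 128, 3, 56, 56)
sep = depthwise_separable_cost(64, 128, 3, 56, 56)
print(round(std / sep, 1))  # -> 8.4, close to the K^2 = 9x upper bound
```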
  • 23.ShuffleNet: pointwise group convolutions with channel shuffle. Zhang et al., "ShuffleNet: An extremely efficient convolutional neural network for mobile devices", arXiv 2017
  • 24.Our practice: overall performance of pruning ResNet-50 on ImageNet
      Model        Original   Pruned-50   Pruned-Q-50
      Strategy     -          Pruning     Pruning + Quantization
      Top-1        75%        72.5%       72.4%
      Top-5        92.27%     90.9%       90.6%
      Model size   98M        49M         15M
  • 25.Our practice: performance of pruning ResNet-34 on our dataset (2,319 categories, 12,000,000 samples)
      Model           Original   Pruned-64
      Top-1           48.92%     48.27%
      Top-5           82.2%      81.5%
      Inference time  96ms       45ms
      Model size      86M        31M
  • 26.Our practice: ParseNet with 18 classes (base network: MobileNet)
      mIoU: 56%   Pixel-level accuracy: 93.5%   Model size: 13M
  • 27.03 Mobile engineering practice
  • 28.Division of work between mobile and server: training happens on the server, inference on the device.
  • 29.DL frameworks. Training: Caffe, Caffe2, MXNet, TensorFlow, Torch, ... Mobile inference: NCNN, MDL, Core ML, TensorFlow Lite
  • 30.From training to inference: fold the Convolution → BN → ReLU sequence into a single convolution for deployment.
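BN folding works because batch norm at inference is just a per-channel affine map, so it can be absorbed into the preceding convolution's weights and bias. A minimal NumPy sketch (the helper name `fold_bn` is mine; shapes assume NCHW-style conv weights):

```python
import numpy as np

def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BN(x) = gamma*(x-mean)/sqrt(var+eps) + beta into conv w, b.

    w: (out_ch, in_ch, kh, kw) conv weights; b: (out_ch,) conv bias.
    """
    scale = gamma / np.sqrt(var + eps)            # one factor per out channel
    w_folded = w * scale[:, None, None, None]
    b_folded = (b - mean) * scale + beta
    return w_folded, b_folded

rng = np.random.default_rng(2)
w = rng.normal(size=(4, 3, 3, 3)); b = rng.normal(size=4)
gamma, beta = rng.normal(size=4), rng.normal(size=4)
mean, var = rng.normal(size=4), rng.uniform(0.5, 2.0, size=4)

# Sanity check on one output "pixel" y (the raw conv result without bias):
y = rng.normal(size=4)
bn_out = gamma * (y + b - mean) / np.sqrt(var + 1e-5) + beta
w_f, b_f = fold_bn(w, b, gamma, beta, mean, var)
scale = gamma / np.sqrt(var + 1e-5)
assert np.allclose(bn_out, y * scale + b_f)   # folded conv == conv + BN
```

After folding, the BN layer disappears entirely at inference time, saving one memory pass per activation map.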
  • 31.Optimizing convolution: direct convolution vs. im2col-based convolution, which unrolls input patches into a matrix (e.g. 25x9) multiplied by the flattened kernel (9x1).
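A small sketch of im2col matching the shapes on the slide (the `im2col` helper here is illustrative, single-channel, stride 1, padding 1): every 3x3 patch of a padded 5x5 input becomes one row of a 25x9 matrix, and convolution with one 3x3 kernel collapses into a single matrix-vector product that a tuned GEMM routine can execute.

```python
import numpy as np

def im2col(x, k=3, pad=1):
    """Unroll all KxK patches of a 2-D input into rows of a matrix."""
    xp = np.pad(x, pad)
    h, w = x.shape
    cols = np.empty((h * w, k * k))
    for i in range(h):
        for j in range(w):
            cols[i * w + j] = xp[i:i + k, j:j + k].ravel()
    return cols

x = np.arange(25, dtype=np.float64).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0                 # a simple box filter
cols = im2col(x)                               # shape (25, 9)
out = (cols @ kernel.ravel()).reshape(5, 5)    # one GEMV replaces the loops
print(cols.shape)  # (25, 9)
```

The trade-off, which MEC (next slide) attacks, is that the unrolled matrix duplicates overlapping patch data and inflates memory use by roughly K².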
  • 32.Optimizing convolution. Cho et al., "MEC: Memory-efficient convolution for deep neural network", ICML 2017
  • 33.Fixed-point conversion of floating-point math: Input (float) → min/max → Quantize → 8-bit → QuantizedReLU → 8-bit → Dequantize → Output (float)
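The min/max scheme on the slide can be sketched as a linear map onto [0, 255] (an illustrative sketch; `quantize`/`dequantize` are hypothetical names, and real frameworks also fuse the requantization between 8-bit ops):

```python
import numpy as np

def quantize(x):
    """Map floats linearly onto uint8 using the tensor's min and max."""
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / 255.0
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, lo, scale

def dequantize(q, lo, scale):
    return q.astype(np.float32) * scale + lo

x = np.linspace(-3.0, 3.0, 11, dtype=np.float32)
q, lo, scale = quantize(x)
x_hat = dequantize(q, lo, scale)
# Round-trip error is bounded by half a quantization step (scale / 2):
print("max reconstruction error:", float(np.abs(x - x_hat).max()))
```

Keeping the intermediate ops (like QuantizedReLU) in 8-bit means the expensive inner loops run on integers, with a single dequantize at the end of the quantized subgraph.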
  • 34.How else can convolution evolve? No software optimization, however clever, is as direct as a hardware implementation: general-purpose convolution vs. convolution specialized in hardware.
  • 35.Deep learning frameworks on Android: NCNN vs. MDL (MobileNet on a Huawei P9)
      Framework       NCNN    MDL
      Single thread   370ms   360ms
      Four threads    200ms   190ms
      Memory          25M     30M
      TensorFlow Lite: quantized MobileNet 85ms, float MobileNet 400ms
  • 36.DL on iOS: Core ML. Limited extensibility, so it is awkward for deploying new algorithms; requires iOS 11+
  • 37.MPSCNN: makes full use of the GPU without competing for the CPU; developing new layers with Metal is convenient. Tips: computation is half-precision; weights are stored in NHWC format
  • 38.MPSCNN MPSImage layout: a 9-channel CNN image of width 3 and height 2 is stored as three texture slices (Slice0, Slice1, Slice2), four channels per slice.
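The slice count follows from Metal textures packing four channels (RGBA) per slice, which is why the 9-channel example needs three slices with three unused channels in the last one. A tiny sketch (the helper name `mps_slices` is mine):

```python
import math

def mps_slices(channels):
    """Texture slices needed for an MPSImage: 4 packed channels per slice."""
    return math.ceil(channels / 4)

print(mps_slices(9))   # 3, matching the slide's Slice0..Slice2
print(mps_slices(64))  # 16
```

This padding to multiples of 4 is also why custom Metal kernels (next slide) index feature maps by slice rather than by raw channel.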
  • 39.Metal Performance Shaders: a custom element-wise sum kernel written in the Metal Shading Language (truncated excerpt): kernel void eltwiseSum_array(texture2d_array<half, access::sample>