Mogujie image algorithm engineer Huang Wenbo: Deep Learning Optimization Practice on Mobile
2020-02-27
- 1.
- 2. Deep Learning Optimization Practice on Mobile. Huang Wenbo (Guigu), Meili United Group
- 3. About the company: Meili United Group is a fashion consumption platform dedicated to serving women, founded on June 15, 2016. Its products and services include Mogujie, Meilishuo, uni, 锐鲨, and MOGU STATION, covering every area of fashion consumption and meeting the daily fashion-content and shopping needs of female users across age groups, spending power, and aesthetic tastes.
- 4. Overall numbers: 120,000+ fashion influencers; 10,000,000+ daily active users; 200,000,000+ registered users; 95%+ female users; ¥20,000,000,000+ in transaction volume; 95%+ mobile users
- 5. Outline: background and current state; model compression and design; mobile engineering practice; summary
- 6. 01 Background and Current State
- 7. Deep learning: from the cloud to edge computing
- 8. Why does Mogujie optimize deep learning? Server side: reduce training and inference time; save GPU resources and power. Mobile side: real-time response; run locally to reduce server load; protect user privacy.
- 9. CNN basics
- 10. CNN basics
- 11. Challenge. Deep learning: networks keep getting deeper and more accurate, so models keep getting bigger, demanding more storage, more computation, and more energy. Mobile devices: limited memory, limited compute, limited power budget.
- 12. 02 Model Compression and Design
- 13.Model Compression Pruning Quantization Huffman Encoding
- 14. Pruning, weight-level: prune the sparse connections. Han et al., "Learning both weights and connections for efficient neural networks", NIPS 2015
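The weight-level pruning above can be sketched as a magnitude threshold that zeroes out the smallest weights, leaving a sparse layer. A minimal numpy sketch, not the exact procedure from the talk (the sparsity level and the tie-handling are illustrative):

```python
import numpy as np

def prune_weights(w, sparsity=0.5):
    """Zero out the smallest-magnitude weights until roughly
    `sparsity` fraction of entries are exactly zero."""
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    # Threshold = k-th smallest absolute value in the tensor.
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    mask = np.abs(w) > thresh
    return w * mask

w = np.array([[0.9, -0.05, 0.4],
              [-0.01, 0.7, 0.02]])
pruned = prune_weights(w, sparsity=0.5)  # the three smallest weights become 0
```

In the paper this is followed by retraining with the mask fixed, so the surviving weights can recover the lost accuracy.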
- 15. Pruning, channel-level: prune and retrain iteratively. Li et al., "Pruning filters for efficient ConvNets", ICLR 2017
- 16. Pruning, channel-level: pruning with L1 regularization. Liu et al., "Learning efficient convolutional networks through network slimming", ICCV 2017
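Channel-level pruning in the style of Li et al. ranks whole filters by their L1 norm and drops the weakest ones, which shrinks the layer without introducing sparse kernels. A minimal numpy sketch under that assumption (the keep ratio is illustrative):

```python
import numpy as np

def prune_filters(conv_w, keep_ratio=0.5):
    """Keep the output filters with the largest L1 norms.
    conv_w has shape (out_channels, in_channels, kh, kw)."""
    n_keep = max(1, int(conv_w.shape[0] * keep_ratio))
    l1 = np.abs(conv_w).sum(axis=(1, 2, 3))   # one score per filter
    keep = np.sort(np.argsort(l1)[-n_keep:])  # indices of survivors, in order
    return conv_w[keep], keep

w = np.random.randn(8, 3, 3, 3)               # 8 filters over 3 input channels
pruned_w, kept = prune_filters(w, keep_ratio=0.5)
```

The `kept` indices are also needed to slice the matching input channels of the next layer, which is why the talk stresses pruning and retraining iteratively.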
- 17. Quantization. Han et al., "Deep Compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding", ICLR 2016
- 18. Huffman Encoding. Han et al., "Deep Compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding", ICLR 2016
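In Deep Compression the quantized weights become a stream of cluster indices with a very skewed distribution, so Huffman coding squeezes out further bits by giving the common indices short codes. A minimal sketch of Huffman code construction with the standard library (the index stream is made up for illustration):

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman code (symbol -> bitstring) from a sequence."""
    freq = Counter(symbols)
    if len(freq) == 1:                    # degenerate single-symbol case
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, tiebreaker, {symbol: code-so-far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

# Skewed index stream: index 0 dominates, so it gets the shortest code.
indices = [0] * 50 + [1] * 20 + [2] * 5 + [3] * 2
code = huffman_code(indices)
```

With this distribution the coded stream takes 111 bits versus 154 bits for a fixed 2-bit code, which is the kind of saving the paper reports on real weight histograms.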
- 19. Summary of model compression. Pruning: fewer channels; channel-level pruning and retraining iteratively; channel-level pruning with L1 regularization
- 20. Smaller CNN architecture design: SqueezeNet, MobileNet, ShuffleNet
- 21. SqueezeNet fire module: input 64 channels -> 1x1 conv squeeze to 16 -> 1x1 conv expand to 64 and 3x3 conv expand to 64 -> concat/eltwise -> 128-channel output. Iandola et al., "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size", arXiv 2016
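A quick arithmetic check of why the fire module saves parameters, using the 64-in/128-out sizes from the slide (bias terms ignored for simplicity):

```python
def conv_params(c_in, c_out, k):
    """Weight count of a k x k convolution, bias ignored."""
    return c_in * c_out * k * k

# Fire module from the slide:
# 64 -> squeeze 16 (1x1) -> expand 64 (1x1) + expand 64 (3x3) -> concat -> 128
squeeze = conv_params(64, 16, 1)          # 1,024
expand1 = conv_params(16, 64, 1)          # 1,024
expand3 = conv_params(16, 64, 3)          # 9,216
fire_total = squeeze + expand1 + expand3  # 11,264

# A plain 3x3 convolution with the same 64 -> 128 shape:
plain = conv_params(64, 128, 3)           # 73,728
```

The squeeze layer cuts the channel count seen by the expensive 3x3 filters, so this module uses roughly 6.5x fewer weights than the plain convolution.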
- 22. MobileNets. Howard et al., "MobileNets: Efficient convolutional neural networks for mobile vision applications", arXiv 2017
- 23. ShuffleNet. Zhang et al., "ShuffleNet: An extremely efficient convolutional neural network for mobile devices", arXiv 2017
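MobileNet's key idea is replacing a standard convolution with a depthwise 3x3 convolution plus a pointwise 1x1 convolution, cutting the multiply count by roughly a factor of 1/c_out + 1/9. A sketch of that cost comparison (the layer shape is an illustrative example, not one from the talk):

```python
def standard_conv_mults(h, w, c_in, c_out, k=3):
    """Multiplications of a standard k x k convolution, stride 1."""
    return h * w * c_in * c_out * k * k

def separable_conv_mults(h, w, c_in, c_out, k=3):
    """Depthwise k x k (one filter per input channel) + pointwise 1x1."""
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise

std = standard_conv_mults(56, 56, 128, 128)
sep = separable_conv_mults(56, 56, 128, 128)
ratio = sep / std    # = 1/128 + 1/9, so roughly 12% of the original cost
```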
- 24. Our practice: overall performance of pruning ResNet-50 on ImageNet

  | Model | Original | Pruned-50 | Pruned-Q-50 |
  |---|---|---|---|
  | Strategy | - | Pruning | Pruning + Quantization |
  | Top-1 | 75% | 72.5% | 72.4% |
  | Top-5 | 92.27% | 90.9% | 90.6% |
  | Model size | 98M | 49M | 15M |
- 25. Our practice: performance of pruning ResNet-34 on our dataset (2,319 categories, 12 million samples)

  | Model | Original | Pruned-64 |
  |---|---|---|
  | Top-1 | 48.92% | 48.27% |
  | Top-5 | 82.2% | 81.5% |
  | Inference time | 96ms | 45ms |
  | Model size | 86M | 31M |

- 26. Our practice: ParseNet, 18 classes (backbone: MobileNet)

  | mIoU | Pixel-level accuracy | Model size |
  |---|---|---|
  | 56% | 93.5% | 13M |
- 27. 03 Mobile Engineering Practice
- 28. Division of labor between mobile and server: training on the server, inference on the device
- 29. DL frameworks. Training: Caffe, Caffe2, MXNet, TensorFlow, Torch, ... Mobile inference: NCNN, MDL, Core ML, TensorFlow Lite
- 30. From training to inference: fold Convolution + BN + ReLU into a single Convolution
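Folding BatchNorm into the preceding convolution works because, at inference time, BN is just a per-channel affine transform with frozen statistics. A minimal numpy sketch of the folding arithmetic, plus a single-pixel check that the folded layer matches Conv followed by BN:

```python
import numpy as np

def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma * (conv(x) + b - mean) / sqrt(var + eps) + beta
    into new conv weights and bias. w: (c_out, c_in, kh, kw)."""
    scale = gamma / np.sqrt(var + eps)      # one scale per output channel
    w_fold = w * scale[:, None, None, None]
    b_fold = (b - mean) * scale + beta
    return w_fold, b_fold

# Check on a 1x1 conv at a single spatial location (conv == matmul there).
rng = np.random.default_rng(0)
w = rng.standard_normal((2, 3, 1, 1)); b = rng.standard_normal(2)
gamma = rng.standard_normal(2); beta = rng.standard_normal(2)
mean = rng.standard_normal(2); var = rng.random(2) + 0.5
x = rng.standard_normal(3)

y_conv = w[:, :, 0, 0] @ x + b
y_bn = gamma * (y_conv - mean) / np.sqrt(var + 1e-5) + beta

wf, bf = fold_bn(w, b, gamma, beta, mean, var)
y_folded = wf[:, :, 0, 0] @ x + bf          # equals y_bn
```

The ReLU stays as-is after folding; what disappears is the separate BN pass and its memory traffic.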
- 31. Optimizing the convolution computation: direct convolution vs. im2col-based convolution (e.g., the input unfolds into a 25x9 patch matrix multiplied by the 9x1 flattened kernel)
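im2col lowers convolution to one big matrix multiply: every kernel-sized patch of the input becomes a row, and the kernel becomes a column vector. With a 7x7 input and a 3x3 kernel (stride 1, no padding) this reproduces the 25x9 times 9x1 shapes on the slide. A minimal single-channel numpy sketch:

```python
import numpy as np

def im2col(x, k):
    """Unfold k x k patches (stride 1, no padding) into rows.
    x: (H, W) -> (out_h * out_w, k * k)."""
    h, w = x.shape
    out_h, out_w = h - k + 1, w - k + 1
    cols = np.empty((out_h * out_w, k * k))
    for i in range(out_h):
        for j in range(out_w):
            cols[i * out_w + j] = x[i:i + k, j:j + k].ravel()
    return cols

# 7x7 input, 3x3 kernel -> 25x9 patch matrix times 9x1 kernel = 25 outputs.
x = np.arange(49, dtype=float).reshape(7, 7)
kernel = np.ones((3, 3)) / 9.0              # simple box filter
out = im2col(x, 3) @ kernel.ravel()         # shape (25,)
```

The win is that the multiply maps onto highly tuned GEMM routines; the cost, which MEC (next slide) attacks, is the memory blow-up from duplicating overlapping patches.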
- 32. Optimizing the convolution computation. Cho et al., "MEC: Memory-efficient convolution for deep neural network", ICML 2017
- 33. Converting floating-point ops to fixed point: Input (float) -> Quantize with (min, max) -> 8-bit -> QuantizedReLU with (min, max) -> 8-bit -> Dequantize -> Output (float)
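The quantize/dequantize stages in that pipeline map the float range [min, max] linearly onto the 256 levels of a uint8, run the op in integers, then map back. A minimal numpy sketch of that min/max scheme (details like zero-point alignment are simplified away):

```python
import numpy as np

def quantize(x):
    """Linearly map floats in [x.min(), x.max()] onto uint8 0..255."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, lo, scale

def dequantize(q, lo, scale):
    """Recover approximate floats from the uint8 codes."""
    return q.astype(np.float32) * scale + lo

x = np.array([-1.0, -0.2, 0.0, 0.7, 1.0], dtype=np.float32)
q, lo, scale = quantize(x)
x_hat = dequantize(q, lo, scale)   # close to x; rounding error <= scale / 2
```

Passing the (min, max) pair alongside each 8-bit tensor is what lets consecutive quantized ops, like the QuantizedReLU in the diagram, stay in integers without dequantizing in between.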
- 34. How else can convolution evolve? No matter how clever the software optimization, a hardware implementation is more direct. General convolution vs. specialized convolution
- 35. Deep learning frameworks on Android: NCNN vs. MDL (MobileNet on Huawei P9)

  | Framework | NCNN | MDL |
  |---|---|---|
  | Single thread | 370ms | 360ms |
  | Four threads | 200ms | 190ms |
  | Memory | 25M | 30M |

  TensorFlow Lite: quantized MobileNet 85ms, float MobileNet 400ms
- 36. DL on iOS: Core ML has limited extensibility and is not well suited to deploying new algorithms; requires iOS 11+
- 37. MPSCNN: makes full use of the GPU without contending for the CPU; Metal makes it easy to develop new layers. Tips: computation is half precision; weights are stored in NHWC format
- 38. MPSCNN. MPSImage layout: a 9-channel CNN image with width 3 and height 2 is stored as three texture slices (Slice0, Slice1, Slice2), four channels per slice
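Because each MPSImage texture slice holds four channels (one per RGBA component), the 9-channel image on the slide needs ceil(9/4) = 3 slices, with the last slice only partially used. A small sketch of that index arithmetic (helper names are ours, not Apple API):

```python
def mps_channel_location(c):
    """Slice index and RGBA component (0=R..3=A) holding channel c."""
    return c // 4, c % 4

def num_slices(channels):
    """Texture slices needed for a channel count: ceil(channels / 4)."""
    return (channels + 3) // 4

# 9 channels -> 3 slices; channel 8 lands in the R component of slice 2.
slices = num_slices(9)
loc = mps_channel_location(8)
```

This packing is why custom Metal kernels for MPSCNN, like the element-wise sum on the next slide, read from a texture2d_array rather than a flat buffer.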
- 39. Metal Performance Shaders: a custom element-wise sum kernel (the slide's code is truncated in the source): kernel void eltwiseSum_array( texture2d_array<half, access::sample>