ArchSummit Shenzhen 2018: "Applications of Deep Learning in Information Fusion and Fraud Risk Identification", Li Chao
2020-02-27
- 1.Applications of Deep Learning in Information Fusion and Fraud Risk Identification. Li Chao, Tencent Cloud · Tianyu (天御)
- 2.Tencent Cloud Business Security Center • Core businesses: • Financial anti-fraud • Anti-abuse (防刷) • Content security • … • About me: • USTC: Bachelor • NUS: PhD • Mozat: Head of Data Intelligence • PropertyGuru: Head of Data Science • Tencent Cloud Business Security Center: Lead Data Scientist • We graduated from: • NUS • University of South Carolina • Georgia Tech • University of Washington • Imperial College London • University of Warwick • … confidential material from Tencent Cloud
- 3.Security basics
- 4.1. The fintech wave in China
- 5.Journey to “inclusive finance” (普惠金融) • 2010: Consumer finance emerged • 2015: Monthly P2P lending exceeds 100 billion • 2017: Paipai Dai's 2017 H1 net profit exceeds 1 billion • Nov. 2017: 《关于立即暂停批设网络小额贷款公司的通知》 (Notice on Immediately Suspending the Approval of Online Microloan Companies) • Dec. 2017: 《关于规范整顿“现金贷”业务的通知》 (Notice on Regulating and Rectifying the “Cash Loan” Business) • EIR must not be higher than 36% • Specific purpose: the money must be used for … • 2018: Banks and internet companies actively enter the online lending market • Apr. 2018: DiDi launches 滴水贷 • May 2018: Huaxia launches 龙商贷
- 6.2. The underground industry (黑产)
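- 7.The growth of internet finance has fueled a booming underground industry • “Cosmetic-surgery loan fraud frenzy: brokers and hospitals collude to extract 1.5 billion” • 1.5 billion from collusion • Bank card theft and fraudulent charges are rampant • This online lending product receives more than 100,000 applications per day and disburses more than 100 million per day; without risk control, daily losses would exceed 20 million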
- 8.Underground industry: data trading
- 9.Underground industry: account-farming bots
- 10.Underground industry: village sweeping (扫村)
- 11.3. The evolution of risk-control algorithms
- 12.Rule-based strategy systems • Pros • Definite • Fast deployment • No math • Cons • Rely on experience • Difficult to maintain
- 13.Scorecards • Logistic Regression • LR is widely used because it is • Interpretable • Good at generalizing • Fast • Simple
- 14.Advanced uses of logistic regression • Feature explosion • Feature crossing • Ensemble model • Feature evaluation • $\mathrm{IV} = \sum_i \left(p_{g,i} - p_{b,i}\right) \ln\frac{p_{g,i}}{p_{b,i}} = \sum_i \left(\frac{g_i}{G} - \frac{b_i}{B}\right) \ln\frac{g_i / G}{b_i / B}$, where $g_i$ and $b_i$ are the good and bad sample counts in bin $i$, and $G$ and $B$ are the totals • Feature selection • Avoid multicollinearity
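The Information Value (IV) used for feature evaluation on the slide above can be sketched in a few lines. This is a minimal illustration, assuming the feature has already been binned and that each bin holds at least one good and one bad sample; the function name and the counts are hypothetical, not taken from the talk.

```python
import math


def information_value(good_counts, bad_counts):
    """IV = sum_i (g_i/G - b_i/B) * ln((g_i/G) / (b_i/B)) over bins i."""
    total_good = sum(good_counts)
    total_bad = sum(bad_counts)
    iv = 0.0
    for g, b in zip(good_counts, bad_counts):
        if g == 0 or b == 0:
            # The logarithm is undefined for an empty cell; this sketch
            # simply refuses such bins instead of merging or smoothing them.
            raise ValueError("every bin needs at least one good and one bad sample")
        p_good = g / total_good
        p_bad = b / total_bad
        iv += (p_good - p_bad) * math.log(p_good / p_bad)
    return iv


# Hypothetical counts for one feature split into three bins.
good = [400, 350, 250]
bad = [20, 30, 50]
print(information_value(good, bad))
```

A feature whose good and bad distributions are identical across bins gets an IV of zero; the further the two distributions diverge, the larger the value.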
- 15.One-hot encoding • [“from Europe”, “from US”, “from Asia”] • -> [0,1,2] ? • -> [001,010,100] • [“male”, “female”] • -> [01,10] • [“uses Firefox”, “uses Chrome”, “uses Safari”, “uses Internet Explorer”] • -> [0001,0010,0100,1000] • [“male”, “from Asia”, “uses Chrome”] • -> [01, 100, 0010] • [ user id ] -> ?
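The encoding on the slide above can be reproduced with a short sketch. The helper names are hypothetical; the bit order follows the slide, where the first category sets the rightmost bit.

```python
REGIONS = ["from Europe", "from US", "from Asia"]
GENDERS = ["male", "female"]
BROWSERS = ["uses Firefox", "uses Chrome", "uses Safari", "uses Internet Explorer"]


def one_hot_bits(value, vocabulary):
    """Return a bit string with exactly one '1'; category 0 is the rightmost bit."""
    index = vocabulary.index(value)
    bits = ["0"] * len(vocabulary)
    bits[len(vocabulary) - 1 - index] = "1"
    return "".join(bits)


def encode(sample):
    """Encode a (gender, region, browser) sample field by field."""
    gender, region, browser = sample
    return [
        one_hot_bits(gender, GENDERS),
        one_hot_bits(region, REGIONS),
        one_hot_bits(browser, BROWSERS),
    ]


print(encode(["male", "from Asia", "uses Chrome"]))
```

Each code is as wide as its vocabulary, which is why the slide ends on a question mark for user IDs: a field with millions of distinct values would need millions of bits per sample.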
- 16.XGBoost: simple, easy to use, high performance (source: www.kdnuggets.com) • The “go-to” choice for machine learning projects • Ensemble model • Robust • Good balance between accuracy and generalization
- 17.4. Thinking in Deep learning
- 18.“A good architecture is an essential prerequisite for accommodating every good technique”
- 19.Every neuron is a logistic regression (LR)
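The claim on the slide above holds when the neuron uses a sigmoid activation: its forward pass is then the same computation as a logistic regression prediction. The sketch below makes that explicit; the weights and inputs are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lr_predict(weights, bias, x):
    """Logistic regression: P(y = 1 | x) = sigmoid(w . x + b)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def neuron_forward(weights, bias, x, activation=sigmoid):
    """A single neuron: weighted sum of the inputs plus bias, then an activation."""
    z = bias
    for w, xi in zip(weights, x):
        z += w * xi
    return activation(z)


# Hypothetical parameters and input, for illustration only.
weights = [0.8, -1.2, 0.3]
bias = 0.1
x = [1.0, 0.5, 2.0]

print(lr_predict(weights, bias, x), neuron_forward(weights, bias, x))
```

Swapping the activation for something other than the sigmoid breaks the equivalence, which is the point where a network stops being a stack of LR models.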
- 20.PCA dimensionality reduction = Autoencoder • Encoder • Decoder • $\mathrm{Loss} = \frac{1}{2}\sum_{i=1}^{n}\left(x_i - \hat{x}_i\right)^2$
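For the linear case, the correspondence on the slide above can be illustrated without training anything: the top principal directions serve as encoder weights, their transpose as decoder weights, and the reconstruction loss is the squared error from the slide. This is a sketch under those assumptions, with randomly generated, hypothetical data; it does not train an autoencoder by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical correlated data: 200 samples, 5 features.
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
X = X - X.mean(axis=0)  # PCA assumes centered inputs

k = 2  # size of the bottleneck
U, s, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt[:k].T  # shape (5, k): the top-k principal directions


def encoder(x):
    return x @ W  # compress to k dimensions


def decoder(z):
    return z @ W.T  # reconstruct with the transposed (tied) weights


X_hat = decoder(encoder(X))
# Squared reconstruction error, summed over all samples and features.
loss = 0.5 * np.sum((X - X_hat) ** 2)
print(loss)
```

The loss left over is exactly half the sum of the squared singular values that the bottleneck discards, which is the smallest error any rank-k linear encoder and decoder pair can reach.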
- 21.Encoders and lookup tables • (table: one-hot labels for chrome, firefox, safari, ie and their embedded values) • $y_j(x_k) = \sum_i w_{ij} x_i = w_{kj}$, where $x_k \neq 0$
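The identity on the slide above says that feeding a one-hot vector through a dense layer is the same as reading one row of the weight matrix. A minimal sketch follows; the embedding values are hypothetical and are not the ones from the slide's table.

```python
import numpy as np

browsers = ["chrome", "firefox", "safari", "ie"]

# Hypothetical 2-dimensional embedding per browser (one row each).
W = np.array([
    [0.5, 1.0],
    [0.0, 0.5],
    [1.0, 0.0],
    [1.0, 0.5],
])

k = browsers.index("safari")

# Dense-layer view: one-hot input times the weight matrix.
x = np.zeros(len(browsers))
x[k] = 1.0
dense_output = x @ W

# Lookup-table view: just read row k.
lookup_output = W[k]

print(dense_output, lookup_output)
```

The lookup form avoids building the one-hot vector at all, which is what makes embeddings practical for fields with very large vocabularies.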
- 22.Collaborative filtering = shallow neural network • User and Item: One-hot -> Lookup -> Embedding • Dot product: element-wise multiply, then sum • $\hat{y}_{uj} = \sum_i w_{ij} x_i = \sum_i w_{ij} p_{ui}$, where $w_{ij}$ is the item embedding and $x_i = p_{ui}$ the user embedding
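The network on the slide above can be sketched as two embedding lookups followed by a dot product. The table sizes, the random embeddings, and the function name are hypothetical; in practice the two tables would be learned from observed user and item interactions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, dim = 4, 6, 3

P = rng.normal(size=(n_users, dim))  # user embedding table (hypothetical values)
Q = rng.normal(size=(n_items, dim))  # item embedding table (hypothetical values)


def predict(u, j):
    """Score user u against item j: one-hot -> lookup -> multiply -> sum."""
    user_one_hot = np.eye(n_users)[u]
    item_one_hot = np.eye(n_items)[j]
    p_u = user_one_hot @ P  # same as P[u]
    q_j = item_one_hot @ Q  # same as Q[j]
    return float(np.sum(p_u * q_j))


print(predict(1, 2))
```

Computing every score at once is the matrix product of the two tables, which is the matrix factorization view of the same model.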
- 23.Ensemble learning = Dropout • Each person has a 2/3 chance to predict the game result correctly. What if they vote? • $C_3^2 \left(\frac{2}{3}\right)^2 \cdot \frac{1}{3} + \left(\frac{2}{3}\right)^3 = \frac{20}{27} > \frac{2}{3}$
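The voting arithmetic on the slide above can be checked exactly with rational numbers. The sketch assumes the voters are independent and that the group follows the majority; the function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def majority_correct(p, n):
    """Probability that more than half of n independent voters are right (n odd)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


p = Fraction(2, 3)
three_voters = majority_correct(p, 3)
five_voters = majority_correct(p, 5)
print(three_voters, five_voters)
```

Under the independence assumption, adding voters keeps raising the accuracy; correlated voters gain less, which is why ensembles try to make their members differ.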
- 24.Risk prediction workflow • Understand how the underground industry works • Collect all the data that might be related • Label your data • Design the network and train • Monitor performance and update the model
- 25.Hybrid neural networks • (omitted)
- 26.Q&A Thanks!