Ming Zhou, Microsoft Research Asia - Natural Language Understanding Forum
- 1.Progress of DNN-Based Natural Language Processing (NLP) • Dr. Ming Zhou (mingzhou@microsoft.com), Microsoft Research Asia • GAITC NLP Forum, May 22, 2017 @ Beijing
- 2.Natural Language Processing (NLP) • NLP is a branch of AI: the technology for analyzing, understanding and generating human language, to facilitate human-computer interaction and human-human exchange. • NLP Fundamentals: word, phrase, sentence and discourse expression and analysis • NLP Tech: machine translation, question answering, information retrieval, information extraction, chat and dialogue, knowledge engineering, language generation, recommendation • NLP+: search engine, customer support, business intelligence, speech assistant, user profiling • Supporting tech: big data, cloud computing, ML/DL, knowledge graph
- 3.NLP Evolution • 1940 ~ 1954: invention of the computer and theories of intelligence • Leaders: Chomsky, Backus, Weaver, Shannon • 1954 ~ 1970: formal rule systems, logic theory and the perceptron • Leaders: Minsky, Rosenblatt • 1970 ~ 1980: HMM-based ASR, semantic and discourse modelling • Leaders: Frederick Jelinek, Martin Kay • 1980 ~ 1991: rule bases and knowledge bases • Work: WordNet (1985), HPSG (1987), CYC (1984) • 1991 ~ 2008: statistical machine learning • Approach: SVM, MaxEnt, PCFG, PageRank • Application: SMT, QA and search engines • 2008 ~ 2017: big data and DL • Work: word embedding, NMT, chit-chat, dialogue systems, reading comprehension
- 4.Deep Neural Network • A deep neural network involves multiple levels of neural networks and is a non-linear learner. • Forward pass through the stacked layers: $h_0 = f(W_0 x)$, $h_1 = f(W_1 h_0)$, $h_2 = f(W_2 h_1)$, $y = f(W_3 h_2)$ • Activation functions: $y = f(x)$
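To make the stacked layers concrete, here is a minimal NumPy sketch of the forward pass above; the layer sizes, tanh activation and random weights are illustrative assumptions, not values from the talk.

```python
# Minimal sketch of the slide's stacked non-linear layers (assumed sizes).
import numpy as np

def f(z):
    # tanh chosen as the activation function f(.) for illustration
    return np.tanh(z)

rng = np.random.default_rng(0)
sizes = [8, 16, 16, 16, 4]  # input, h0, h1, h2, output (assumed)
W = [rng.standard_normal((m, n)) * 0.1 for n, m in zip(sizes[:-1], sizes[1:])]

x = rng.standard_normal(8)
h0 = f(W[0] @ x)   # h0 = f(W0 x)
h1 = f(W[1] @ h0)  # h1 = f(W1 h0)
h2 = f(W[2] @ h1)  # h2 = f(W2 h1)
y  = f(W[3] @ h2)  # y  = f(W3 h2)
print(y.shape)     # (4,)
```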
- 5.DNN4NLP Progress • Progress • Word expression with embedding • Sentence modelling via CNN or RNN (LSTM/GRU) for similarity estimation and sequential mapping • Successful applications such as NMT, chatbot, etc. • Still exploring • Learning from unlabeled data (GANs, dual learning) • Learning from knowledge • Learning from user/environment (RL) • Discourse and context modelling • Personalized systems (via user profiling)
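As a toy illustration of the first two progress items (word embedding and similarity estimation), the sketch below averages word vectors into sentence vectors and compares them with cosine similarity; the vocabulary and 4-dimensional values are invented for the example, and real systems learn the embeddings and use CNN/RNN encoders instead of averaging.

```python
# Toy sketch: sentence similarity from (invented) word embeddings.
import numpy as np

emb = {
    "economic": np.array([0.9, 0.1, 0.0, 0.2]),
    "growth":   np.array([0.8, 0.2, 0.1, 0.1]),
    "slowed":   np.array([0.1, 0.9, 0.0, 0.3]),
    "economy":  np.array([0.85, 0.15, 0.05, 0.2]),
}

def sentence_vec(words):
    # Average word vectors into a crude sentence representation.
    return np.mean([emb[w] for w in words], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

s1 = sentence_vec(["economic", "growth"])
s2 = sentence_vec(["economy", "slowed"])
print(round(cosine(s1, s2), 3))
```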
- 6.In this talk, I will focus on • NMT • Chatbot • Reading Comprehension
- 7.NMT
- 8.Encoder-Decoder for NMT with RNN • [Figure: an encoder RNN reads the source x = (Economic, growth, has, slowed, down, in, recent, years, .) into a single vector, e.g. (0.7, 0.0, 0.2, -0.2, 0.9, -0.1, 0.5), from which a decoder RNN generates the target y = (近, 几年, 经济, 发展, 变, 慢, 了, 。)]
- 9.Encoder-Decoder for NMT • [Figure: (1) source words of x = (Economic, growth, has, slowed, down, in, recent, years, .) are embedded and fed through an encoder recurrent neural network; (2) its final source state initializes the decoder; (3) a decoder recurrent neural network emits the target y = (近, 几年, 经济, 发展, 变, 慢, 了, 。) word by word] • Sutskever et al., NIPS, 2014
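A minimal PyTorch sketch of this encoder-decoder setup follows. The vocabulary sizes, dimensions and single-layer GRU cells are assumptions for illustration; Sutskever et al. actually used deep LSTMs with a reversed source sequence.

```python
# Sketch of an encoder-decoder (seq2seq) model; sizes are assumed.
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, DIM = 1000, 1000, 64

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(SRC_VOCAB, DIM)
        self.tgt_emb = nn.Embedding(TGT_VOCAB, DIM)
        self.encoder = nn.GRU(DIM, DIM, batch_first=True)
        self.decoder = nn.GRU(DIM, DIM, batch_first=True)
        self.out = nn.Linear(DIM, TGT_VOCAB)

    def forward(self, src, tgt):
        # Encode the whole source sentence into the final hidden state.
        _, h = self.encoder(self.src_emb(src))
        # Condition the decoder on that state and score each target word.
        dec_states, _ = self.decoder(self.tgt_emb(tgt), h)
        return self.out(dec_states)

model = Seq2Seq()
src = torch.randint(0, SRC_VOCAB, (2, 8))  # batch of 2 source sentences
tgt = torch.randint(0, TGT_VOCAB, (2, 9))
print(model(src, tgt).shape)               # torch.Size([2, 9, 1000])
```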
- 10.Attention-based Encoder-Decoder • [Figure: a bidirectional (left-to-right and right-to-left) encoder produces one source vector per word of x = (Economic, growth, has, slowed, down, in, recent, years, .); at each decoding step, attention weights (⊙) are combined into a weighted sum (⊕) of the source vectors, shown here as the decoder emits the first target words y = (近, 几年, ...)] • Bahdanau et al., ICLR, 2015
- 11.Attention-based Encoder-Decoder • [Figure: the same attention mechanism after decoding the full target y = (近, 几年, 经济, 发展, 变, 慢, 了, 。); at every step the decoder attends over all source vectors of x = (Economic, growth, has, slowed, down, in, recent, years, .) via a weighted sum, so the internal semantics is recomputed per target word] • Bahdanau et al., ICLR, 2015
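The attention step itself fits in a few lines. Below is a NumPy sketch of the additive scoring of Bahdanau et al., where the decoder state queries the encoder annotations, a softmax turns alignment scores into weights, and the context is their weighted sum; dimensions and random parameters are illustrative assumptions.

```python
# Sketch of additive (Bahdanau-style) attention; sizes are assumed.
import numpy as np

rng = np.random.default_rng(0)
T, d = 8, 16                      # source length, hidden size (assumed)
H = rng.standard_normal((T, d))   # encoder annotations h_1 .. h_T
s = rng.standard_normal(d)        # current decoder state

Wa = rng.standard_normal((d, d)) * 0.1
Ua = rng.standard_normal((d, d)) * 0.1
va = rng.standard_normal(d) * 0.1

# e_t = va^T tanh(Wa s + Ua h_t), one alignment score per source position
scores = np.tanh(s @ Wa.T + H @ Ua.T) @ va
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()              # softmax: attention weights sum to 1
context = alpha @ H               # weighted sum of the source vectors
print(alpha.round(2), context.shape)
```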
- 12.NMT Progress • 4+ BLEU points improvement over SMT • Mainstream research (dominating papers at ACL) • Productization (MS, Baidu, Google) • [Chart: BLEU on MT06 / MT08: SMT 33.3 / 25.1, NMT-2015 34.1 / 26.2, NMT-2016 39.2 / 31.6]
- 13.Fusing with Linguistic Knowledge • Decoding from structure to string (tree-to-sequence NMT), with a Tree LSTM encoding the source parse • Akiko Eriguchi, et al., Tree-to-Sequence Attentional Neural Machine Translation, ACL 2016
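One standard way to compose tree node representations bottom-up is a Tree-LSTM; the sketch below shows a single child-sum node update (Tai et al., 2015) under assumed dimensions. Eriguchi et al. actually use a binary-tree variant over the source parse; the child-sum form is shown here for simplicity.

```python
# Sketch of one child-sum Tree-LSTM node update; sizes are assumed.
import numpy as np

rng = np.random.default_rng(0)
d = 8
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
Wi, Wf, Wo, Wu = (rng.standard_normal((d, d)) * 0.1 for _ in range(4))
Ui, Uf, Uo, Uu = (rng.standard_normal((d, d)) * 0.1 for _ in range(4))

def tree_lstm_node(x, children):
    """x: node input vector; children: list of (h, c) pairs from children."""
    h_sum = sum(h for h, _ in children)
    i = sigmoid(Wi @ x + Ui @ h_sum)       # input gate
    o = sigmoid(Wo @ x + Uo @ h_sum)       # output gate
    u = np.tanh(Wu @ x + Uu @ h_sum)       # candidate cell value
    # One forget gate per child, so each subtree can be kept or dropped.
    c = i * u + sum(sigmoid(Wf @ x + Uf @ h) * c for h, c in children)
    return o * np.tanh(c), c

leaf = (rng.standard_normal(d), rng.standard_normal(d))
h, c = tree_lstm_node(rng.standard_normal(d), [leaf, leaf])
print(h.shape, c.shape)
```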
- 14.Fusing with Linguistic Knowledge • Decoding from string to structure (sequence-to-tree NMT) • [Figure: the decoder generates the target tree jointly with the target words y_1, y_2, ..., attending (⊕) over the encoder states of the source sequence x_1 .. x_n] • Shuangzhi Wu, et al., Sequence-to-Dependency Neural Machine Translation, ACL 2017
- 15.Fusing with Domain Knowledge • [Figure: KBSE (knowledge-based semantic embedding) replaces the implicit internal semantics with an explicit semantic space: source grounding maps x = (给 我 推荐 个 4G 手机 吧 , 最好 白的 , 屏幕 要 大 。) into explicit semantics, and target generation produces y = (I, want, a, white, 4G, cellphone, with, a, big, screen)] • Chen Shi, et al., Knowledge-based semantic embedding for machine translation, ACL 2016
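The grounding-then-generation pipeline can be caricatured with a toy rule-based sketch; the keyword table and template below are invented stand-ins for KBSE's learned neural source-grounding and target-generation components, shown only to make the two-stage structure explicit.

```python
# Toy sketch of grounding -> explicit semantics -> generation (invented rules).
CN_SLOTS = {"4G": ("network", "4G"), "手机": ("category", "cellphone"),
            "白的": ("color", "white"), "屏幕 要 大": ("screen", "big")}

def ground(source: str) -> dict:
    # Source grounding: map surface cues to an explicit semantic tuple.
    return {slot: val for cue, (slot, val) in CN_SLOTS.items() if cue in source}

def generate(sem: dict) -> str:
    # Target generation: realize the semantic tuple as an English sentence.
    parts = ["I want a", sem.get("color", ""), sem.get("network", ""),
             sem.get("category", "")]
    if sem.get("screen") == "big":
        parts.append("with a big screen")
    return " ".join(p for p in parts if p)

src = "给 我 推荐 个 4G 手机 吧 , 最好 白的 , 屏幕 要 大 。"
print(generate(ground(src)))  # I want a white 4G cellphone with a big screen
```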
- 16.Remaining Challenges • Use of monolingual data • OOV (out-of-vocabulary) words • Linguistic rules at the phrase and sentence levels • Discourse-level translation
- 17.Chatbot • IR-based • Generation-based
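A minimal sketch of the IR-based approach follows: index (message, response) pairs and return the response whose stored message best matches the user's query. Word overlap is used here as a stand-in for the learned matching and ranking models production chatbots actually use, and the repository entries are invented.

```python
# Toy IR-based chatbot: retrieve the best-matching stored response.
REPO = [
    ("how is the weather today", "It is sunny and warm."),
    ("recommend a good restaurant", "Try the noodle place downtown."),
    ("tell me a joke", "Why did the chicken cross the road?"),
]

def respond(query: str) -> str:
    # Naive tokenization; real systems use learned matching models.
    q = set(query.lower().split())
    # Rank candidate (message, response) pairs by word overlap with the query.
    best = max(REPO, key=lambda pair: len(q & set(pair[0].split())))
    return best[1]

print(respond("What is the weather like today?"))  # weather response
```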
- 18.An Example of Conversation • Clerk: