Data-Driven and Knowledge-Guided Artificial Intelligence Models and Methods (Wu Fei)


  • 1. 2017 National Deep Learning Technology Conference. Sequence Learning: Thoughts on AI Methods Combining Data-Driven Learning with Knowledge Guidance. Institute of Artificial Intelligence, College of Computer Science, Zhejiang University. Wu Fei. http://mypage.zju.edu.cn/wufei/ http://www.dcd.zju.edu.cn/ March 25, 2017
  • 2. Outline: (1) the concept of sequence learning: sequence-to-sequence (Seq2Seq) learning; (2) several sequence-learning methods; (3) the knowledge computing engine: from big data to knowledge
  • 3. The architecture of Seq2Seq learning: an Encoder reads the input sequence w1 w2 w3 w4 w5, and a Decoder generates the output sequence v1 v2 v3 v4 from the encoded representation.
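A minimal sketch of this encoder-decoder idea, assuming plain untrained RNN cells and hypothetical dimensions; real Seq2Seq systems use LSTM/GRU stacks trained end-to-end on large corpora:

```python
import numpy as np

# Toy Seq2Seq: fold the source sequence into one vector, then unroll a decoder.
rng = np.random.default_rng(0)
d = 8  # embedding / hidden size (hypothetical)

def rnn_step(x, h, Wx, Wh, b):
    """One recurrent step: h_new = tanh(Wx @ x + Wh @ h + b)."""
    return np.tanh(Wx @ x + Wh @ h + b)

enc = (rng.normal(0, 0.3, (d, d)), rng.normal(0, 0.3, (d, d)), np.zeros(d))
dec = (rng.normal(0, 0.3, (d, d)), rng.normal(0, 0.3, (d, d)), np.zeros(d))

# Encoder: compress the input sequence w1..w5 into a single state vector.
source = [rng.normal(size=d) for _ in range(5)]   # stand-ins for word embeddings
h = np.zeros(d)
for w in source:
    h = rnn_step(w, h, *enc)

# Decoder: start from the encoder state and feed back its own previous output
# to emit v1..v4 (a real model would project each state to vocabulary logits).
s, prev, outputs = h, np.zeros(d), []
for _ in range(4):
    s = rnn_step(prev, s, *dec)
    prev = s
    outputs.append(s)
```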
  • 4. seq2seq learning: Machine Translation. Example sentence: "Jordan likes playing basketball". Part-of-speech tags: Jordan/NNP likes/VBZ playing/VBG basketball/NN; syntactic parsing and semantic-role analysis (parse-tree figure omitted); target translation: 乔丹 喜欢 打篮球.
  • 5. seq2seq learning: Machine Translation. The Encoder reads each source word (Jordan, likes, playing, basketball) and the Decoder emits the target words (乔丹, 喜欢, 打篮球). Data-driven learning from large amounts of bilingual corpora (aligned source-target sentences).
  • 6. seq2seq learning: visual Q&A. Question: "What is the man doing?" A Convolutional Neural Network encodes the image, and the Decoder produces the answer: "Riding a bike".
  • 7. seq2seq learning: image captioning. Encoder (image) to Decoder (text), producing the caption "A man in a white helmet is riding a bike".
  • 8. seq2seq learning: video action classification. The Encoder reads the frame sequence and the Decoder labels each segment (e.g., NO ACTION, pitching).
  • 9. Seq2seq learning, putting it together: one input / one output (image classification); one input / many outputs (image captioning); many inputs / one output (sentiment analysis).
  • 10. Seq2seq learning, putting it together: many inputs / many outputs (machine translation, video storyline).
  • 11. Outline: (1) the concept of sequence learning: sequence-to-sequence (Seq2Seq) learning; (2) several sequence-learning methods; (3) some research questions and reflections
  • 12. Basic models: from the multilayer perceptron (MLP) to the Recurrent Neural Network to LSTM/GRU. A Multi-Layer Perceptron (MLP) is by nature a feedforward directed acyclic network. An MLP consists of multiple layers and maps input data to output data via a set of nonlinear activation functions; it is trained with a supervised learning technique called backpropagation. Input, mapping, output: non-linear, end-to-end, differentiable, sequential.
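A minimal forward-pass sketch of such an MLP, with hypothetical layer sizes and untrained weights:

```python
import numpy as np

# Minimal MLP forward pass: input -> nonlinear hidden layer -> linear output.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 0.5, (16, 4)), np.zeros(16)   # hidden layer (hypothetical sizes)
W2, b2 = rng.normal(0, 0.5, (1, 16)), np.zeros(1)    # output layer

def mlp(x):
    h = np.tanh(W1 @ x + b1)   # nonlinear activation gives the network its expressive power
    return W2 @ h + b2         # differentiable end-to-end, so it can be trained by backprop

y = mlp(np.array([0.1, -0.2, 0.3, 0.4]))
```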
  • 13. Basic models. The role of feedforward neural networks in characterizing data distributions: the universal approximation theorem. A feed-forward network with a single hidden layer containing a finite number of neurons (i.e., a multilayer perceptron) can approximate continuous functions on compact subsets of R^n, under mild assumptions on the activation function. The theorem thus states that simple neural networks can represent a wide variety of interesting functions when given appropriate parameters; however, it does not touch upon the algorithmic learnability of those parameters. One of the first versions of the theorem was proved by George Cybenko in 1989 for sigmoid activation functions. References: Balázs Csanád Csáji, Approximation with Artificial Neural Networks, Faculty of Sciences, Eötvös Loránd University, Hungary; Cybenko, G., Approximations by superpositions of sigmoidal functions, Mathematics of Control, Signals, and Systems, 2(4), 303-314, 1989; Kurt Hornik, Approximation Capabilities of Multilayer Feedforward Networks, Neural Networks, 4(2), 251-257, 1991.
  • 14. Basic models. Backpropagation of errors. Paul J. Werbos (born 1947) is a scientist best known for his 1974 Harvard University Ph.D. thesis, which first described the process of training artificial neural networks through backpropagation of errors. The thesis, and some supplementary information, can be found in his book, The Roots of Backpropagation (ISBN 0471-59897-6). He also was a pioneer of recurrent neural networks. Werbos was one of the original three two-year Presidents of the International Neural Network Society (INNS). He was awarded the IEEE Neural Network Pioneer Award for the discovery of backpropagation and other basic neural network learning frameworks such as Adaptive Dynamic Programming. Paul J. Werbos, Backpropagation Through Time: What It Does and How to Do It, Proceedings of the IEEE, 78(10): 1550-1560, 1990.
  • 15.Basic models Backpropagate errors (误差后向传播) Repeatedly adjusts the weights of the connections in the network so as to minimize the measure of the difference between the actual output vector of the net and the desired output vector. As a result of the weight adjustments, internal "hidden" units which are not part of the input or output come to represent important features of the task domain, and the regularities in the task are captured by the interactions of these units. Rumelhart, David E.; Hinton, Geoffrey E.; Williams, Ronald J., Learning representations by back-propagating errors, Nature, 323 (6088): 533–536,1986
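A hedged sketch of the weight-adjustment loop described above, reusing the toy two-layer MLP from the earlier snippet with hand-derived gradients for a squared-error loss (illustrative only, not the original 1986 setup):

```python
import numpy as np

# Backpropagation for a tiny 2-layer MLP, minimizing 0.5 * (y - target)^2.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 0.5, (16, 4)), np.zeros(16)
W2, b2 = rng.normal(0, 0.5, (1, 16)), np.zeros(1)
lr = 0.1

x = np.array([0.1, -0.2, 0.3, 0.4])   # one hypothetical training example
target = np.array([1.0])

for _ in range(100):
    # Forward pass.
    h = np.tanh(W1 @ x + b1)
    y = W2 @ h + b2
    # Backward pass: propagate the output error back to every weight.
    dy = y - target                     # d(loss)/dy
    dW2, db2 = np.outer(dy, h), dy
    dh = W2.T @ dy
    dpre = dh * (1 - h**2)              # gradient through tanh
    dW1, db1 = np.outer(dpre, x), dpre
    # Repeatedly adjust the weights to reduce the output error.
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1
```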
  • 16. Basic models: from multilayer perceptron (MLP) to Recurrent Neural Network to LSTM/GRU. Recurrent Neural Network: an RNN has recurrent connections (connections to previous time steps of the same layer). RNNs are powerful but can get extremely complicated: computations derived from earlier input are fed back into the network, which gives an RNN a kind of memory. Standard RNNs suffer from both exploding and vanishing gradients due to their iterative nature. Sequence input (x_0 ... x_t), mapping, embedding vector (h_t).
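A sketch of the recurrence, with hypothetical sizes and untrained weights, and a comment on where the gradient problem comes from:

```python
import numpy as np

# RNN recurrence over a toy sequence: h_t = tanh(Wx x_t + Wh h_{t-1} + b).
rng = np.random.default_rng(0)
d = 8
Wx, Wh, b = rng.normal(0, 0.3, (d, d)), rng.normal(0, 0.3, (d, d)), np.zeros(d)

h = np.zeros(d)
for x in [rng.normal(size=d) for _ in range(20)]:
    h = np.tanh(Wx @ x + Wh @ h + b)   # h carries information from all earlier steps

# Backpropagation through time multiplies one Jacobian Wh^T * diag(1 - h_t^2)
# per step, so over long sequences the gradient tends to vanish or explode,
# which is what motivates the LSTM/GRU gating on the next slides.
```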
  • 17. Basic models. Long Short-Term Memory (LSTM) model: LSTM is an RNN devised to deal with the exploding and vanishing gradient problems of RNNs. An LSTM hidden layer consists of a set of recurrently connected blocks, known as memory cells. Each memory cell is controlled by three multiplicative units: the input, output, and forget gates. The input to the cells is multiplied by the activation of the input gate, the output to the net is multiplied by the output gate, and the previous cell values are multiplied by the forget gate. Sepp Hochreiter & Jürgen Schmidhuber, Long short-term memory, Neural Computation, Vol. 9(8), pp. 1735-1780, MIT Press, 1997.
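The cell equations implied by this description, in one common formulation (notation varies across papers):

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i), \quad
f_t  = \sigma(W_f x_t + U_f h_{t-1} + b_f), \quad
o_t  = \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c), \qquad
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad
h_t = o_t \odot \tanh(c_t)
\end{aligned}
```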
  • 18. Basic models. Gated Recurrent Unit (GRU): gated recurrent units are a gating mechanism in recurrent neural networks; a GRU has fewer parameters than an LSTM, as it lacks an output gate.

$$z_t = \sigma(W_z x_t + U_z h_{t-1}), \qquad r_t = \sigma(W_r x_t + U_r h_{t-1})$$
$$\tilde{h}_t = \tanh(W x_t + U(r_t \odot h_{t-1})), \qquad h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$

Chung, Junyoung; Gulcehre, Caglar; Cho, KyungHyun; Bengio, Yoshua (2014), Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, arXiv:1412.3555
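A direct NumPy transcription of these update equations, with hypothetical sizes and untrained weights:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h_prev, P):
    """One GRU step implementing the update/reset-gate equations above."""
    z = sigmoid(P["Wz"] @ x + P["Uz"] @ h_prev)             # update gate
    r = sigmoid(P["Wr"] @ x + P["Ur"] @ h_prev)             # reset gate
    h_tilde = np.tanh(P["W"] @ x + P["U"] @ (r * h_prev))   # candidate state
    return (1 - z) * h_prev + z * h_tilde

d = 8  # hidden size (hypothetical)
rng = np.random.default_rng(0)
P = {k: rng.normal(0, 0.3, (d, d)) for k in ["Wz", "Uz", "Wr", "Ur", "W", "U"]}

h = np.zeros(d)
for x in [rng.normal(size=d) for _ in range(10)]:
    h = gru_step(x, h, P)
```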
  • 19. Learning with attention / internal memory. "The behavior of the computer at any moment is determined by the symbols which he is observing and his 'state of mind' at that moment." – Alan Turing. In the output sequence, the output at each time step depends on an encoding of all of the input data at that moment (attention over the input sequence).
  • 20.Learning with attention/ internal memory Neural Machine Translation by Jointly Learning to Align and Translate, ICLR 2015
  • 21. Learning with attention / internal memory. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation.
  • 22. Learning with attention / internal memory. Context vector z_t: deterministic "soft" attention learned end-to-end. Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Rich Zemel and Yoshua Bengio, Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, ICML 2015.
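A minimal sketch of soft attention as used in such models, assuming toy encoder annotations and a dot-product scoring function for brevity (the paper uses a small MLP to score each annotation):

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

# Soft attention: the context vector z_t is a weighted sum of the encoder
# annotations a_i, with weights that depend on the current decoder state h.
rng = np.random.default_rng(0)
d, L = 8, 6                              # feature size, number of annotations (hypothetical)
annotations = rng.normal(size=(L, d))    # e.g., CNN features of L image regions
h = rng.normal(size=d)                   # current decoder hidden state

scores = annotations @ h                 # relevance score of each region (toy choice)
alpha = softmax(scores)                  # attention weights, sum to 1
z_t = alpha @ annotations                # context vector fed to the decoder
```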
  • 23. Learning with attention / internal memory. Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Rich Zemel and Yoshua Bengio, Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, ICML 2015.
  • 24. Learning with external memory. The hippocampus in the human brain: a store of priors and knowledge.
  • 25. Learning with external memory. Neural Turing Machines: reading and writing. Graves A, Wayne G, Danihelka I, Neural Turing Machines, arXiv preprint arXiv:1410.5401, 2014, DeepMind. J. Weston, S. Chopra, A. Bordes, Memory Networks, ICLR 2015 (and arXiv:1410.3916, Facebook AI).
  • 26.Learning with external memory  Neural Turing Machines
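A hedged sketch of the content-based read operation that such memory-augmented models build on, simplified to cosine-similarity addressing only (no location-based addressing or write heads):

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

# Content-based read from an external memory matrix M (N slots x d features):
# the controller emits a key k and a sharpness beta; the read vector is a
# softmax-weighted blend of all memory rows, so the whole operation stays
# differentiable and can be trained end-to-end.
rng = np.random.default_rng(0)
N, d = 16, 8                              # memory slots, slot width (hypothetical)
M = rng.normal(size=(N, d))               # external memory
k = rng.normal(size=d)                    # read key from the controller
beta = 5.0                                # key strength

cos = (M @ k) / (np.linalg.norm(M, axis=1) * np.linalg.norm(k) + 1e-8)
w = softmax(beta * cos)                   # read weights over the N slots
r = w @ M                                 # read vector returned to the controller
```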
  • 27. Learning with external memory. The differentiable neural computer (DNC): an achievement with potential implications for the neural-symbolic integration problem (unifying neural computation with symbolic computation); deep neural reasoning and one-shot learning. Graves, Alex, et al., "Hybrid computing using a neural network with dynamic external memory," Nature 538.7626 (2016): 471-476.
  • 28. Learning with external memory. Deep neural reasoning: search and decision-making in which continuous-space models and discrete-space models work in coordination.
  • 29. Learning with external memory. Learning of basic algorithms using Reasoning, Attention, Memory (RAM). Methods include adding stacks and addressable memory to RNNs: "Neural Net Architectures for Temporal Sequence Processing," M. Mozer; "Neural Turing Machines," A. Graves, G. Wayne, I. Danihelka; "Inferring Algorithmic Patterns with Stack Augmented Recurrent Nets," A. Joulin, T. Mikolov; "Learning to Transduce with Unbounded Memory," E. Grefenstette et al.; "Neural Programmer-Interpreters," S. Reed, N. de Freitas; "Reinforcement Learning Turing Machine," W. Zaremba and I. Sutskever; "Learning Simple Algorithms from Examples," W. Zaremba, T. Mikolov, A. Joulin, R. Fergus; "The Neural GPU and the Neural RAM machine," I. Sutskever.
  • 30. Gives 'memory' to AI. DeepMind crafted an algorithm that lets a neural network 'remember' past knowledge and learn more effectively. The approach is similar to how your own mind works, and might even provide insights into the functioning of human minds. Much like real synapses, which tend to preserve connections between neurons when they have been useful in the past, the algorithm (known as Elastic Weight Consolidation) decides how important a given connection is to its associated task. James Kirkpatrick, Razvan Pascanu, et al., Overcoming catastrophic forgetting in neural networks, PNAS, http://www.pnas.org/cgi/doi/10.1073/pnas.1611835114
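A hedged sketch of the quadratic penalty that Elastic Weight Consolidation adds to the new task's loss, with a diagonal Fisher estimate standing in for "how important a given connection is" (toy arrays, not the DeepMind implementation):

```python
import numpy as np

# Elastic Weight Consolidation (EWC), schematically:
#   L(theta) = L_new_task(theta) + (lambda / 2) * sum_i F_i * (theta_i - theta_A_i)^2
# where theta_A are the parameters learned on the old task and F_i is the
# diagonal Fisher information, measuring how important parameter i was there.
rng = np.random.default_rng(0)
theta_A = rng.normal(size=100)            # parameters after learning task A
fisher = rng.uniform(0, 1, size=100)      # per-parameter importance (toy values)
lam = 10.0

def ewc_penalty(theta):
    return 0.5 * lam * np.sum(fisher * (theta - theta_A) ** 2)

def ewc_grad(theta):
    # Added to the gradient of the new task's loss while training on task B:
    # important parameters are pulled back toward their task-A values.
    return lam * fisher * (theta - theta_A)
```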
  • 31. Outline: (1) the concept of sequence learning: sequence-to-sequence (Seq2Seq) learning; (2) several sequence-learning methods; (3) the knowledge computing engine: from big data to knowledge
  • 32. Knowledge computing engine: KS-Studio. http://www.ksstudio.org/
  • 33. Knowledge computing engine: KS-Studio. The KS-Studio technical framework.
  • 34. Knowledge computing engine: KS-Studio. Data-driven machine learning; intuition and experience; textual entities; weakly labeled information from crowdsourced data. Introducing "crowdsourced data" and "knowledge rules" into data-driven machine learning extends purely data-driven concept recognition and leads to AI methods with strong interpretability.
  • 35.TAC Knowledge Base Population (KBP) 2016
  • 36. TAC Knowledge Base Population (KBP) 2016. Task 1: Cold Start KBP. The Cold Start KBP track builds a knowledge base from scratch using a given document collection and a predefined schema for the entities and relations that will comprise the KB. In addition to an end-to-end KB Construction task, Cold Start KBP includes a Slot Filling (SF) task to fill in values for predefined slots (attributes) for a given entity (Person or Organization).
  • 37.TAC Knowledge Base Population (KBP) 2016  Task 2: Entity Discovery and Linking (EDL) The Entity Discovery and Linking (EDL) track aims to extract entity mentions from a source collection of textual documents in multiple languages (English, Chinese, and Spanish), and link them to an existing Knowledge Base (KB); an EDL system is also required to cluster mentions for those entities that don't have corresponding KB entries.
  • 38.TAC Knowledge Base Population (KBP) 2016  Task 3: Event Track The goal of the Event track is to extract information about events such that the information would be suitable as input to a knowledge base. The track includes Event Nugget (EN) tasks to detect and link events, and Event Argument (EA) tasks to extract event arguments and link arguments that belong to the same event.
  • 39. KS-Studio in the international TAC KBP knowledge-graph evaluation. Participating teams came from 15 well-known universities and research institutions worldwide, including CMU, UIUC, IBM, UCL, iFLYTEK, Zhejiang University, and BUPT. (Table of participating teams.)
  • 40. KS-Studio in the international TAC KBP knowledge-graph evaluation. Results of the KBP 2016 Mention Detection task: Zhejiang University ranked first overall, placing first on two of the three metrics and tied for second on the third.
  • 41. KS-Studio in the international TAC KBP knowledge-graph evaluation. Results of the KBP 2016 Entity Linking task: the Zhejiang University key-technology group ranked first overall, placing first on four of the five metrics and second on the remaining one.
  • 42. Summary: heading toward Artificial Intelligence 2.0. Pan, Yunhe, 2016, Heading toward artificial intelligence 2.0, Engineering, 409-413.
  • 43. Summary: heading toward Artificial Intelligence 2.0. Yueting Zhuang, Fei Wu, Chun Chen, Yunhe Pan, Challenges and Opportunities: From Big Data to Knowledge in AI 2.0, Frontiers of Information Technology & Electronic Engineering, 2017, 18(1): 3-14.
  • 44. Thank you! Email: wufei@cs.zju.edu.cn