Exploring the world of AI with deep graph (张峥 / Zheng Zhang)
2020-03-01
- 1.Deep Graph Made Easy (and Faster) • Director, AWS Shanghai AI Lab (zhaz@amazon.com) • zz@nyu.edu (on leave)
- 3.Computer science = algorithms + data structures • DL algorithms need a sound data structure • It is not the tensor • It is the graph of tensors
- 4.Why Graph with deep learning? (1) • Something isn’t right here… • A sequence → another sequence • A collection of pixels → a class label
- 5.Why Graph with deep learning? (2) • A sentence is more than a sequence • But trees are not doing well, why? • The tasks may be too simple • Or, we have not got the structure right
- 6.Avaz FreeSpeech: teaching kids with autism to speak
- 7.Why Graph with deep learning? (2) • A sentence is at least a graph • Abstract Meaning Representation
- 8.Why Graph with deep learning? (3) • An image is neither a collection of pixels nor a collection of bounding boxes (Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction)
- 9.Why Graph with deep learning? (3) • An image sequence is an evolution of graphs (Everybody Dance Now)
- 11.Why Graph with deep learning? (4) • Reasoning is to distill a graph
- 12.Why Graph with deep learning? (5) • Learning new knowledge is to distill a graph, too
- 13.Why Graph with deep learning? (6) • Many data are already graphs Social network Drugs and new materials Custom database Knowledge graph
- 14.Why Graph with deep learning? (7) • [Chart: log(# graphs created/second) vs. log(graph size), with a region labeled “Strong AI”] • Small graphs are produced every second (e.g. sentences and images) • Several giant graphs change every second (e.g. knowledge graphs, social networks)
- 15.Two sides of the same coin • Leverage the graph if it exists • Discover the graph otherwise
- 16.Deep Graph Library: https://www.dgl.ai
- 17.Deep learning + Graph
- 18.Deep Graph Applications • A matrix of domains vs. tasks: RecSys, chemistry, knowledge graph, NLP, and CV against node prediction, edge prediction, graph prediction, and graph generation (checkmarks mark the supported combinations) • Example data: social networks, drugs and new materials, custom databases, knowledge graphs
- 19.Graph Definition • G(V, E) • V = {v_i}, where v_i is the node feature vector • E = {(e_k, s_k, r_k)}, where e_k is the feature vector of edge s_k → r_k (a minimal DGL sketch follows)
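A minimal sketch of this definition in DGL (current API; older releases used `dgl.DGLGraph` instead of `dgl.graph`, and the feature names `'v'` and `'e'` here are illustrative):

```python
import dgl
import torch

# A toy graph with 3 nodes and directed edges 0->1, 1->2, 2->0.
src = torch.tensor([0, 1, 2])
dst = torch.tensor([1, 2, 0])
g = dgl.graph((src, dst))

# V: a feature vector v_i on every node.
g.ndata['v'] = torch.randn(g.num_nodes(), 16)

# E: a feature vector e_k on every edge s_k -> r_k.
g.edata['e'] = torch.randn(g.num_edges(), 8)
```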
- 20.Deep Graph in a nutshell: “Who are you?” • Your DNA • … and your interactions with other DNAs
- 21.Graph DNNs in the message passing paradigm • Message function • Reduce function • Update function [Gilmer 2017, Wang 2017, Battaglia 2018]
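These three functions map directly onto DGL’s user-defined function (UDF) interface; a hedged sketch (the feature and message names are illustrative):

```python
import torch

def message_func(edges):
    # Edge-wise: what each edge u -> v sends (here, the source feature).
    return {'m': edges.src['h']}

def reduce_func(nodes):
    # Node-wise: aggregate the mailbox of incoming messages (here, a sum).
    return {'agg': nodes.mailbox['m'].sum(dim=1)}

def update_func(nodes):
    # Node-wise: combine the aggregate into a new node state.
    return {'h': torch.relu(nodes.data['agg'])}

# g.update_all(message_func, reduce_func, update_func) runs one full round.
```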
- 22.The message passing paradigm is very flexible • Message functions can be arbitrarily complex, e.g. graph convolutional networks (GCN) [Kipf & Welling, 2017] • Reducers can be neural networks: LSTM reducer [Hamilton, 2017], tensor contraction reducer [Kondor, 2018]
- 23.Messages can have flexible propagation order (see the sketch below) • Full propagation • Propagation by graph traversal • Topological order on a sentence parse tree • Belief propagation order • Propagation by random walk • Propagation by graph sampling
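For example, topological-order propagation can be written as a loop over node frontiers using DGL’s traversal utilities; a sketch assuming `tree` is a DAG-shaped DGLGraph with a node feature `'h'`:

```python
import dgl

def message_func(edges):
    return {'m': edges.src['h']}

def reduce_func(nodes):
    return {'h': nodes.mailbox['m'].sum(dim=1)}

# Each frontier contains nodes whose predecessors are all processed,
# so messages flow leaves -> root, as in TreeLSTM-style models.
for frontier in dgl.topological_nodes_generator(tree):
    tree.pull(frontier, message_func, reduce_func)
```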
- 24.Existing DL frameworks do not support graph DNNs well • Writing GNNs is hard in TensorFlow/PyTorch/MXNet • “(Our implementation) caused OOM … which is the reason for the lack of results on Pubmed.” (from the authors of Graph Attention Network [ICLR’18])
- 25.Existing DL frameworks do not support graph DNNs well • Parse tree [Tai et al. ACL’15]: one popular PyTorch implementation trains at only 23 trees/s • Molecule graph [Jin et al. ICML’18]: the authors’ PyTorch code runs at 8.25 graphs/s • Knowledge base [Schlichtkrull et al. arXiv’17]: cannot run on GPU due to OOM
- 26.Gap between message passing and tensor computation • [Figure: node v1 receiving messages from neighbors v2–v5 with features h2–h5] • msg_{u→v} = h_u, and h_v^new = f(Σ_{u∈N(v)} msg_{u→v}) • The message passing paradigm defines fine-grained computation. Edge-wise: how to send messages. Node-wise: how to use messages • Tensor-based interfaces are coarse-grained: how to transform tensors
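Concretely, the coarse-grained tensor view of this same computation is a sparse adjacency matmul; a sketch in plain PyTorch:

```python
import torch

# Same toy graph: edges 0->1, 1->2, 2->0.
src = torch.tensor([0, 1, 2])
dst = torch.tensor([1, 2, 0])
n, d = 3, 16

# A[v, u] = 1 iff u -> v, so (A @ H)[v] sums the neighbor features of v.
A = torch.sparse_coo_tensor(torch.stack([dst, src]), torch.ones(3), (n, n))
H = torch.randn(n, d)
H_new = torch.relu(torch.sparse.mm(A, H))   # h_v^new = f(sum_u msg_{u->v})
```

Everything edge- and node-wise has been flattened into one matrix operation, which is fast but awkward to customize; DGL keeps the fine-grained interface and performs this flattening internally.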
- 27.Different scenarios require different supports • Batching graphs: many moderate-sized graphs (sketch below) • Sampling: a single giant graph • Mutation: dynamic graphs • Heterogeneous attributes: heterogeneous graphs
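For the many-moderate-sized-graphs scenario, DGL batches graphs into one big disconnected graph; a minimal sketch:

```python
import dgl
import torch

g1 = dgl.graph((torch.tensor([0, 1]), torch.tensor([1, 0])))
g2 = dgl.graph((torch.tensor([0, 1, 2]), torch.tensor([1, 2, 0])))

# One message-passing pass over bg covers the whole minibatch;
# dgl.unbatch(bg) recovers the original graphs.
bg = dgl.batch([g1, g2])
print(bg.num_nodes(), bg.num_edges())   # 5 5
```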
- 28.Important meta-goal of DGL: forward and backward compatible • Forward: easy to develop new models • Backward: seamless integration with existing frameworks (MXNet/PyTorch/TensorFlow) • Fast and scalable
- 29.Our system: Deep Graph Library (DGL) • DGL vs. other tools for graph DNNs • Not only sum/max/min/mean (flexible reducers) • Important for applications on giant graphs • More communities; more users
- 30.Programming interface • Graph as the core abstraction: DGLGraph • Node data accessed as g.ndata[‘h’] • Simple but versatile message passing APIs • The active set specifies which nodes/edges to trigger the computation on • Message/reduce functions can be user-defined functions (UDFs) or built-in symbolic functions (see the sketch below)
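A hedged sketch of triggering computation on an active set, using built-in symbolic functions (the edge IDs and feature names are illustrative):

```python
import dgl
import dgl.function as fn
import torch

g = dgl.graph((torch.tensor([0, 1, 2]), torch.tensor([1, 2, 0])))
g.ndata['h'] = torch.randn(3, 4)

# Active set: only edges 0 and 1 send; only their destinations reduce.
g.send_and_recv(torch.tensor([0, 1]),
                fn.copy_u('h', 'm'),    # built-in message: copy source feature
                fn.sum('m', 'h_new'))   # built-in reduce: sum the mailbox
```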
- 31.Writing GNNs is intuitive in DGL • Average pooling (a sketch follows) • update_all is a shortcut for send(G.edges()) + recv(G.nodes())
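A minimal sketch of the average-pooling layer (mean aggregation followed by a linear transform; the layer and feature names are mine, not from the slides):

```python
import dgl.function as fn
import torch
import torch.nn as nn

class AvgPoolLayer(nn.Module):
    """h_v <- W * mean_{u in N(v)} h_u"""
    def __init__(self, in_feats, out_feats):
        super().__init__()
        self.linear = nn.Linear(in_feats, out_feats)

    def forward(self, g, h):
        with g.local_scope():
            g.ndata['h'] = h
            # update_all = send on all edges + recv on all nodes.
            g.update_all(fn.copy_u('h', 'm'), fn.mean('m', 'h_avg'))
            return self.linear(g.ndata['h_avg'])
```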
- 32.Writing GNNs is intuitive in DGL • Max pooling? Not possible in vanilla PyTorch & MXNet, and not memory-efficient in TensorFlow • In DGL, simply use a different reduce function (see below)
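The swap is one symbol relative to the average-pooling sketch above; a minimal version:

```python
import dgl
import dgl.function as fn
import torch

g = dgl.graph((torch.tensor([0, 1, 2]), torch.tensor([1, 2, 0])))
g.ndata['h'] = torch.randn(3, 4)

# Identical pipeline to the average-pooling sketch; only the reducer changes.
g.update_all(fn.copy_u('h', 'm'), fn.max('m', 'h_max'))
h_new = g.ndata['h_max']
```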
- 33.Writing GNNs is intuitive in DGL • Graph attention network • Specify more complex message/reduce functions
- 34.Writing GNNs is intuitive in DGL • Graph attention network (continued; a sketch follows)
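A single-head GAT sketch with user-defined message/reduce functions, following the structure of DGL’s public GAT tutorial (module and feature names are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Single-head graph attention layer as message passing."""
    def __init__(self, in_feats, out_feats):
        super().__init__()
        self.fc = nn.Linear(in_feats, out_feats, bias=False)
        self.attn = nn.Linear(2 * out_feats, 1, bias=False)

    def edge_attention(self, edges):
        # Unnormalized score e_uv = LeakyReLU(a^T [z_u || z_v]).
        z = torch.cat([edges.src['z'], edges.dst['z']], dim=1)
        return {'e': F.leaky_relu(self.attn(z))}

    def message_func(self, edges):
        return {'z': edges.src['z'], 'e': edges.data['e']}

    def reduce_func(self, nodes):
        # Softmax over each node's incoming edges, then weighted sum.
        alpha = F.softmax(nodes.mailbox['e'], dim=1)
        return {'h': torch.sum(alpha * nodes.mailbox['z'], dim=1)}

    def forward(self, g, h):
        with g.local_scope():
            g.ndata['z'] = self.fc(h)
            g.apply_edges(self.edge_attention)
            g.update_all(self.message_func, self.reduce_func)
            return g.ndata['h']
```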
- 35.Transformer is GAT (over a complete graph) • Transformer: O(n²), data hungry • Star-Transformer (NAACL’19): O(n), much less data hungry; leverages an n-gram prior but has issues with long-range dependency • SegTree-Transformer (ICLR’19 RLGM workshop): O(n log n), less data hungry; a good compromise in between
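To see the equivalence concretely: running an attention layer over a complete directed graph of n tokens gives O(n²) edges, i.e. every token attends to every other. A sketch reusing the hypothetical GATLayer from the previous slide:

```python
import dgl
import torch

n = 6
# Complete directed graph with self-loops: n*n edges, hence O(n^2).
src = torch.arange(n).repeat_interleave(n)   # 0,0,...,1,1,...
dst = torch.arange(n).repeat(n)              # 0,1,...,0,1,...
g = dgl.graph((src, dst))

tokens = torch.randn(n, 16)
layer = GATLayer(16, 16)     # sketch from the previous slide
out = layer(g, tokens)       # one attention step over all token pairs
```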
- 36.Supporting Heterogeneous Graph • Per-type message passing • Type-wise reduction • Modeling Relational Data with Graph Convolutional Networks, M. Schlichtkrull et al.
- 37.Supporting Heterogeneous Graph (continued; a sketch follows)
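A sketch of per-type message passing with a type-wise reduction on a toy heterograph (the node/edge type names are illustrative, loosely echoing the drug-discovery slides later):

```python
import dgl
import dgl.function as fn
import torch

g = dgl.heterograph({
    ('drug', 'interacts', 'protein'):
        (torch.tensor([0, 1]), torch.tensor([0, 1])),
    ('disease', 'associates', 'protein'):
        (torch.tensor([0, 0]), torch.tensor([0, 1])),
})
g.nodes['drug'].data['h'] = torch.randn(2, 8)
g.nodes['disease'].data['h'] = torch.randn(1, 8)

# Per-type message passing, then a type-wise reduction ('sum') across
# relations sharing the destination type, as in R-GCN.
g.multi_update_all(
    {'interacts':  (fn.copy_u('h', 'm'), fn.mean('m', 'h_agg')),
     'associates': (fn.copy_u('h', 'm'), fn.mean('m', 'h_agg'))},
    'sum')
h_protein = g.nodes['protein'].data['h_agg']
```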
- 38.Summary of DGL Programming Interface • User-defined functions • Send/recv • Active set • Batching APIs • Mutation APIs • Sampling APIs • Multi-type (heterogeneous graph) support
- 39.DGL: the history • Development started in 2018 • v0.1: first prototype • v0.2 (2019): sampling APIs • v0.3: fused message passing • v0.3.1 • v0.4: heterogeneous graph, DGL-KE • v0.5 (2020): NN modules, DGL-Chem, DGL-Rec, TF support • Next: distributed training, more model zoos, more NN modules, faster training, …
- 40.Evaluation: open source project • GitHub stats: https://github.com/dmlc/dgl • Fully-booked tutorial at KDD
- 41.Evaluation: community endorsement
- 42.Evaluation: efficiency and memory consumption • Baseline: PyG (pytorch-geometric) • Testbed: AWS EC2 p3.2xlarge instance, one V100 GPU (16 GB)
- 43.Evaluation: scalability • 7.5× and 3.4× speedups over PyG (pytorch-geometric) • Scalability with graph size and with graph density • Testbed: AWS EC2 p3.2xlarge instance, one V100 GPU (16 GB)
- 44.Evaluation: auto-batching • 10.6× speedup over DyNet for training TreeLSTM • Testbed: one V100 GPU (16 GB)
- 45.Deep Graph Applications (enabled by DGL)
- 46.Graph in Drug Discovery • Toxicity • Number of aromatic atoms in molecules • Properties for quantum chemistry • Pushing the Boundaries of Molecular Representation for Drug Discovery with the Graph Attention Mechanism, Journal of Medicinal Chemistry, 2019
- 47.Graph in Drug Discovery • Time/epoch: 6 s (reference) vs. 1.2 s (DGL), a 5× speedup • Pushing the Boundaries of Molecular Representation for Drug Discovery with the Graph Attention Mechanism, Journal of Medicinal Chemistry, 2019
- 48.Graph in Drug Discovery • DGL Model Zoo • Toxicity Prediction
- 49.Graph in Drug Discovery • Heterogeneous graphs • Nodes: drugs, proteins, diseases • Edges: side-effects, interactions • Zitnik et al., 2018
- 50.Graph in Drug Discovery • Molecular graph generation • Build the graph • Learn an embedding with a VAE • Optimize the embedding to get a better graph (a sketch of this last step follows)
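A hedged sketch of the “optimize the embedding” step; `encoder`, `decoder`, `property_predictor`, and `mol_graph` are hypothetical placeholders for a trained VAE and a differentiable property scorer, not names from the slides:

```python
import torch

# Start from the latent code of an existing molecule (encoder is assumed).
z = encoder(mol_graph).detach().requires_grad_(True)
opt = torch.optim.Adam([z], lr=0.01)

# Gradient ascent in latent space toward a higher property score.
for _ in range(100):
    score = property_predictor(z)   # assumed differentiable scorer
    loss = -score
    opt.zero_grad()
    loss.backward()
    opt.step()

better_mol = decoder(z)             # decode the optimized embedding
```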
- 51.Graph in Knowledge Graph • Relational Graph Convolutional Network
- 52.Graph in Knowledge Graph • Relational Graph Convolutional Network • Recovery of missing facts • Entity classification • Modeling Relational Data with Graph Convolutional Networks
- 53.Graph in Knowledge Graph • Network embedding • DGL performance on 16 CPU cores and one NVIDIA V100 GPU: • TransE: 20,000 max steps, 411 s • DistMult: 100,000 max steps, 690 s • ComplEx: 100,000 max steps, 806 s • RESCAL: 30,000 max steps, 1,800 s • TransR: 100,000 max steps, 7,627 s • RotatE: 100,000 max steps, 4,327 s • GraphVite (official implementation): training TransE with 4 GPUs takes ~14 min (~840 s)
- 54.Graph in Recommender System • Graph Convolutional Matrix Completion
- 55.Graph in Recommender System • MovieLens-100K: RMSE 0.9077 (DGL) vs. 0.910 (official); 0.0246 s/epoch vs. 0.1008 s/epoch (5× speedup) • MovieLens-1M: RMSE 0.8377 vs. 0.832; 0.0695 s/epoch vs. 1.538 s/epoch (22× speedup) • MovieLens-10M: RMSE 0.7875 vs. 0.777*; 0.6480 s/epoch vs. long* • *Official training on MovieLens-10M has to run in mini-batches and lasts over 24 hours
- 56.Graph in Natural Language Processing • Summarization Graph-based Neural Multi-Document Summarization
- 57.Graph in Natural Language Processing • Knowledge graph to sentence • Input: a title and a knowledge graph • Output: an abstract • Text Generation from Knowledge Graphs with Graph Transformers
- 58.Graph in Natural Language Processing • GraphWriter • Graph attention • Copy mechanism vs. vocabulary generation • Text Generation from Knowledge Graphs with Graph Transformers
- 59.Graph in Natural Language Processing • BLEU: 13.32 (DGL) vs. 14.3 (official) • Speed: 1,192 s/epoch (DGL) vs. 1,970 s/epoch (official) • Text Generation from Knowledge Graphs with Graph Transformers
- 60.Open source, the source of innovation
- 61.AWS Shanghai AI Lab (ASAIL): we are hiring! • Open source project: DGL • Incubate and enable graph-empowered models/services (leverage) • Deep graph research (discover)
- 64.Q&A