Exploring the world of AI with deep graph (张峥 / Zheng Zhang)
2020-03-01
- 1.Deep Graph Made Easy (and Faster) • Director, AWS Shanghai AI Lab (zhaz@amazon.com) • zz@nyu.edu (on leave)
- 3.Computer science = algorithms + data structures • DL algorithms need a sound data structure • It is not the tensor • It is the graph of tensors
- 4.Why Graph with deep learning? (1) • Something isn’t right here… • A sequence → another sequence • A collection of pixels → a class label
- 5.Why Graph with deep learning? (2) • A sentence is more than a sequence • But trees are not doing well, why? • The tasks may be too simple • Or, we have not got the structure right
- 6.Avaz FreeSpeech: teaching kids with autism to speak
- 7.Why Graph with deep learning? (2) • A sentence is at least a graph • Abstract Meaning Representation
- 8.Why Graph with deep learning? (3) • An image is neither a collection of pixels nor a collection of bounding boxes (Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction)
- 9.Why Graph with deep learning? (3) • An image sequence is an evolution of graphs (Everybody Dance Now)
- 11.Why Graph with deep learning? (4) • Reasoning is to distill a graph
- 12.Why Graph with deep learning? (5) • Learning new knowledge is to distill a graph, too
- 13.Why Graph with deep learning? (6) • Many data are already graphs Social network Drugs and new materials Custom database Knowledge graph
- 14.Why Graph with deep learning? (7) • [Chart: log(# graphs created/second) vs. log(graph size), with a region labeled “Strong AI”] • Small graphs are produced every second (e.g. sentences and images) • Several giant graphs change every second (e.g. knowledge graphs, social networks)
- 15.Two sides of the same coin • Leverage the graph if it exists • Discover the graph otherwise
- 16.Deep Graph Library: https://www.dgl.ai
- 17.Deep learning + Graph
- 18.Deep Graph Applications • A matrix of domains vs. tasks: RecSys, chemistry, knowledge graph, NLP, and CV against node prediction, edge prediction, graph prediction, and graph generation (checkmarks mark the supported combinations) • Example data: social networks, drugs and new materials, custom databases, knowledge graphs
- 19.Graph Definition • G(V, E) • V = {v_i}, where v_i is the node feature vector • E = {(e_k, s_k, r_k)}, where e_k is the feature vector of edge s_k → r_k (a minimal DGL sketch follows)
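A minimal sketch of this definition in DGL (current API; older releases used `dgl.DGLGraph` instead of `dgl.graph`, and the feature names `'v'` and `'e'` here are illustrative):

```python
import dgl
import torch

# A toy graph with 3 nodes and directed edges 0->1, 1->2, 2->0.
src = torch.tensor([0, 1, 2])
dst = torch.tensor([1, 2, 0])
g = dgl.graph((src, dst))

# V: a feature vector v_i on every node.
g.ndata['v'] = torch.randn(g.num_nodes(), 16)

# E: a feature vector e_k on every edge s_k -> r_k.
g.edata['e'] = torch.randn(g.num_edges(), 8)
```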
- 20.Deep Graph in a nutshell: “Who are you?” • Your DNA • … and your interactions with other DNAs
- 21.Graph DNNs in the message passing paradigm • Message function • Reduce function • Update function [Gilmer 2017, Wang 2017, Battaglia 2018]
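These three functions map directly onto DGL’s user-defined function (UDF) interface; a hedged sketch (the feature and message names are illustrative):

```python
import torch

def message_func(edges):
    # Edge-wise: what each edge u -> v sends (here, the source feature).
    return {'m': edges.src['h']}

def reduce_func(nodes):
    # Node-wise: aggregate the mailbox of incoming messages (here, a sum).
    return {'agg': nodes.mailbox['m'].sum(dim=1)}

def update_func(nodes):
    # Node-wise: combine the aggregate into a new node state.
    return {'h': torch.relu(nodes.data['agg'])}

# g.update_all(message_func, reduce_func, update_func) runs one full round.
```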
- 22.The message passing paradigm is very flexible • Message functions can be arbitrarily complex, e.g. graph convolutional networks (GCN) [Kipf & Welling, 2017] • Reducers can be neural networks: LSTM reducer [Hamilton, 2017], tensor contraction reducer [Kondor, 2018]
- 23.Messages can have flexible propagation order (see the sketch below) • Full propagation • Propagation by graph traversal • Topological order on a sentence parse tree • Belief propagation order • Propagation by random walk • Propagation by graph sampling
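For example, topological-order propagation can be written as a loop over node frontiers using DGL’s traversal utilities; a sketch assuming `tree` is a DAG-shaped DGLGraph with a node feature `'h'`:

```python
import dgl

def message_func(edges):
    return {'m': edges.src['h']}

def reduce_func(nodes):
    return {'h': nodes.mailbox['m'].sum(dim=1)}

# Each frontier contains nodes whose predecessors are all processed,
# so messages flow leaves -> root, as in TreeLSTM-style models.
for frontier in dgl.topological_nodes_generator(tree):
    tree.pull(frontier, message_func, reduce_func)
```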
- 24.Existing DL frameworks do not support graph DNNs well • Writing GNNs is hard in TensorFlow/PyTorch/MXNet • “(Our implementation) caused OOM … which is the reason for the lack of results on Pubmed.” (from the authors of Graph Attention Network [ICLR’18])
- 25.Existing DL frameworks do not support graph DNNs well • Parse tree [Tai et al. ACL’15]: one popular PyTorch implementation trains at only 23 trees/s • Molecule graph [Jin et al. ICML’18]: the authors’ PyTorch code runs at 8.25 graphs/s • Knowledge base [Schlichtkrull et al. arXiv’17]: cannot run on GPU due to OOM
- 26.Gap between message passing and tensor computation • [Figure: node v1 receiving messages from neighbors v2–v5 with features h2–h5] • msg_{u→v} = h_u, and h_v^new = f(Σ_{u∈N(v)} msg_{u→v}) • The message passing paradigm defines fine-grained computation. Edge-wise: how to send messages. Node-wise: how to use messages • Tensor-based interfaces are coarse-grained: how to transform tensors
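Concretely, the coarse-grained tensor view of this same computation is a sparse adjacency matmul; a sketch in plain PyTorch:

```python
import torch

# Same toy graph: edges 0->1, 1->2, 2->0.
src = torch.tensor([0, 1, 2])
dst = torch.tensor([1, 2, 0])
n, d = 3, 16

# A[v, u] = 1 iff u -> v, so (A @ H)[v] sums the neighbor features of v.
A = torch.sparse_coo_tensor(torch.stack([dst, src]), torch.ones(3), (n, n))
H = torch.randn(n, d)
H_new = torch.relu(torch.sparse.mm(A, H))   # h_v^new = f(sum_u msg_{u->v})
```

Everything edge- and node-wise has been flattened into one matrix operation, which is fast but awkward to customize; DGL keeps the fine-grained interface and performs this flattening internally.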
- 27.Different scenarios require different supports • Batching graphs: many moderate-sized graphs (sketch below) • Sampling: a single giant graph • Mutation: dynamic graphs • Heterogeneous attributes: heterogeneous graphs
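For the many-moderate-sized-graphs scenario, DGL batches graphs into one big disconnected graph; a minimal sketch:

```python
import dgl
import torch

g1 = dgl.graph((torch.tensor([0, 1]), torch.tensor([1, 0])))
g2 = dgl.graph((torch.tensor([0, 1, 2]), torch.tensor([1, 2, 0])))

# One message-passing pass over bg covers the whole minibatch;
# dgl.unbatch(bg) recovers the original graphs.
bg = dgl.batch([g1, g2])
print(bg.num_nodes(), bg.num_edges())   # 5 5
```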
- 28.Important meta-goal of DGL: forward and backward compatible • Forward: easy to develop new models • Backward: seamless integration with existing frameworks (MXNet/PyTorch/TensorFlow) • Fast and scalable
- 29.Our system: Deep Graph Library (DGL) • DGL vs. other tools for graph DNNs • Not only sum/max/min/mean (flexible reducers) • Important for applications on giant graphs • More communities; more users
- 30.Programming interface • Graph as the core abstraction: DGLGraph • Node data accessed as g.ndata[‘h’] • Simple but versatile message passing APIs • The active set specifies which nodes/edges to trigger the computation on • Message/reduce functions can be user-defined functions (UDFs) or built-in symbolic functions (see the sketch below)
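A hedged sketch of triggering computation on an active set, using built-in symbolic functions (the edge IDs and feature names are illustrative):

```python
import dgl
import dgl.function as fn
import torch

g = dgl.graph((torch.tensor([0, 1, 2]), torch.tensor([1, 2, 0])))
g.ndata['h'] = torch.randn(3, 4)

# Active set: only edges 0 and 1 send; only their destinations reduce.
g.send_and_recv(torch.tensor([0, 1]),
                fn.copy_u('h', 'm'),    # built-in message: copy source feature
                fn.sum('m', 'h_new'))   # built-in reduce: sum the mailbox
```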
- 31.Writing GNNs is intuitive in DGL • Average pooling (a sketch follows) • update_all is a shortcut for send(G.edges()) + recv(G.nodes())
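A minimal sketch of the average-pooling layer (mean aggregation followed by a linear transform; the layer and feature names are mine, not from the slides):

```python
import dgl.function as fn
import torch
import torch.nn as nn

class AvgPoolLayer(nn.Module):
    """h_v <- W * mean_{u in N(v)} h_u"""
    def __init__(self, in_feats, out_feats):
        super().__init__()
        self.linear = nn.Linear(in_feats, out_feats)

    def forward(self, g, h):
        with g.local_scope():
            g.ndata['h'] = h
            # update_all = send on all edges + recv on all nodes.
            g.update_all(fn.copy_u('h', 'm'), fn.mean('m', 'h_avg'))
            return self.linear(g.ndata['h_avg'])
```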
- 32.Writing GNNs is intuitive in DGL • Max pooling? Not possible in vanilla PyTorch & MXNet, and not memory-efficient in TensorFlow • In DGL, simply use a different reduce function (see below)
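The swap is one symbol relative to the average-pooling sketch above; a minimal version:

```python
import dgl
import dgl.function as fn
import torch

g = dgl.graph((torch.tensor([0, 1, 2]), torch.tensor([1, 2, 0])))
g.ndata['h'] = torch.randn(3, 4)

# Identical pipeline to the average-pooling sketch; only the reducer changes.
g.update_all(fn.copy_u('h', 'm'), fn.max('m', 'h_max'))
h_new = g.ndata['h_max']
```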
- 33.Writing GNNs is intuitive in DGL • Graph attention network • Specify more complex message/reduce functions
- 34.Writing GNNs is intuitive in DGL • Graph attention network (continued; a sketch follows)
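A single-head GAT sketch with user-defined message/reduce functions, following the structure of DGL’s public GAT tutorial (module and feature names are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Single-head graph attention layer as message passing."""
    def __init__(self, in_feats, out_feats):
        super().__init__()
        self.fc = nn.Linear(in_feats, out_feats, bias=False)
        self.attn = nn.Linear(2 * out_feats, 1, bias=False)

    def edge_attention(self, edges):
        # Unnormalized score e_uv = LeakyReLU(a^T [z_u || z_v]).
        z = torch.cat([edges.src['z'], edges.dst['z']], dim=1)
        return {'e': F.leaky_relu(self.attn(z))}

    def message_func(self, edges):
        return {'z': edges.src['z'], 'e': edges.data['e']}

    def reduce_func(self, nodes):
        # Softmax over each node's incoming edges, then weighted sum.
        alpha = F.softmax(nodes.mailbox['e'], dim=1)
        return {'h': torch.sum(alpha * nodes.mailbox['z'], dim=1)}

    def forward(self, g, h):
        with g.local_scope():
            g.ndata['z'] = self.fc(h)
            g.apply_edges(self.edge_attention)
            g.update_all(self.message_func, self.reduce_func)
            return g.ndata['h']
```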
- 35.Transformer is GAT (over a complete graph) • Transformer: O(n²), data hungry • Star-Transformer (NAACL’19): O(n), much less data hungry; leverages an n-gram prior but has issues with long-range dependency • SegTree-Transformer (ICLR’19 RLGM workshop): O(n log n), less data hungry; a good compromise in between
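To see the equivalence concretely: running an attention layer over a complete directed graph of n tokens gives O(n²) edges, i.e. every token attends to every other. A sketch reusing the hypothetical GATLayer from the previous slide:

```python
import dgl
import torch

n = 6
# Complete directed graph with self-loops: n*n edges, hence O(n^2).
src = torch.arange(n).repeat_interleave(n)   # 0,0,...,1,1,...
dst = torch.arange(n).repeat(n)              # 0,1,...,0,1,...
g = dgl.graph((src, dst))

tokens = torch.randn(n, 16)
layer = GATLayer(16, 16)     # sketch from the previous slide
out = layer(g, tokens)       # one attention step over all token pairs
```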
- 36.Supporting Heterogeneous Graph • Per-type message passing • Type-wise reduction • Modeling Relational Data with Graph Convolutional Networks, M. Schlichtkrull et al.
- 37.Supporting Heterogeneous Graph (continued; a sketch follows)
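A sketch of per-type message passing with a type-wise reduction on a toy heterograph (the node/edge type names are illustrative, loosely echoing the drug-discovery slides later):

```python
import dgl
import dgl.function as fn
import torch

g = dgl.heterograph({
    ('drug', 'interacts', 'protein'):
        (torch.tensor([0, 1]), torch.tensor([0, 1])),
    ('disease', 'associates', 'protein'):
        (torch.tensor([0, 0]), torch.tensor([0, 1])),
})
g.nodes['drug'].data['h'] = torch.randn(2, 8)
g.nodes['disease'].data['h'] = torch.randn(1, 8)

# Per-type message passing, then a type-wise reduction ('sum') across
# relations sharing the destination type, as in R-GCN.
g.multi_update_all(
    {'interacts':  (fn.copy_u('h', 'm'), fn.mean('m', 'h_agg')),
     'associates': (fn.copy_u('h', 'm'), fn.mean('m', 'h_agg'))},
    'sum')
h_protein = g.nodes['protein'].data['h_agg']
```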
- 38.Summary of DGL Programming Interface • User-defined functions • Send/recv • Active set • Batching APIs • Mutation APIs • Sampling APIs • Multi-type (heterogeneous graph) support
- 39.DGL: the history • Development started in 2018 • v0.1: first prototype • v0.2 (2019): sampling APIs • v0.3: fused message passing • v0.3.1 • v0.4: heterogeneous graph, DGL-KE • v0.5 (2020): NN modules, DGL-Chem, DGL-Rec, TF support • Next: distributed training, more model zoos, more NN modules, faster training, …
- 40.Evaluation: open source project • GitHub stats: https://github.com/dmlc/dgl • Fully-booked tutorial at KDD
- 41.Evaluation: community endorsement
- 42.Evaluation: efficiency and memory consumption • Baseline: PyG (pytorch-geometric) • Testbed: AWS EC2 p3.2xlarge instance, one V100 GPU (16 GB)
- 43.Evaluation: scalability • 7.5× and 3.4× speedups over PyG (pytorch-geometric) • Scalability with graph size and with graph density • Testbed: AWS EC2 p3.2xlarge instance, one V100 GPU (16 GB)
- 44.Evaluation: auto-batching • 10.6× speedup over DyNet for training TreeLSTM • Testbed: one V100 GPU (16 GB)
- 45.Deep Graph Applications (enabled by DGL)
- 46.Graph in Drug Discovery • Toxicity • Number of aromatic atoms in molecules • Properties for quantum chemistry • Pushing the Boundaries of Molecular Representation for Drug Discovery with the Graph Attention Mechanism, Journal of Medicinal Chemistry, 2019
- 47.Graph in Drug Discovery • Time/epoch: 6 s (reference) vs. 1.2 s (DGL), a 5× speedup • Pushing the Boundaries of Molecular Representation for Drug Discovery with the Graph Attention Mechanism, Journal of Medicinal Chemistry, 2019
- 48.Graph in Drug Discovery • DGL Model Zoo • Toxicity Prediction
- 49.Graph in Drug Discovery • Heterogeneous graphs • Nodes: drugs, proteins, diseases • Edges: side-effects, interactions • Zitnik et al., 2018
- 50.Graph in Drug Discovery • Molecular graph generation • Build the graph • Learn an embedding with a VAE • Optimize the embedding to get a better graph (a sketch of this last step follows)
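A hedged sketch of the “optimize the embedding” step; `encoder`, `decoder`, `property_predictor`, and `mol_graph` are hypothetical placeholders for a trained VAE and a differentiable property scorer, not names from the slides:

```python
import torch

# Start from the latent code of an existing molecule (encoder is assumed).
z = encoder(mol_graph).detach().requires_grad_(True)
opt = torch.optim.Adam([z], lr=0.01)

# Gradient ascent in latent space toward a higher property score.
for _ in range(100):
    score = property_predictor(z)   # assumed differentiable scorer
    loss = -score
    opt.zero_grad()
    loss.backward()
    opt.step()

better_mol = decoder(z)             # decode the optimized embedding
```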
- 51.Graph in Knowledge Graph • Relational Graph Convolutional Network
- 52.Graph in Knowledge Graph • Relational Graph Convolutional Network • Recovery of missing facts • Entity classification • Modeling Relational Data with Graph Convolutional Networks
- 53.Graph in Knowledge Graph • Network embedding • DGL performance on 16 CPU cores and one NVIDIA V100 GPU: • TransE: 20,000 max steps, 411 s • DistMult: 100,000 max steps, 690 s • ComplEx: 100,000 max steps, 806 s • RESCAL: 30,000 max steps, 1,800 s • TransR: 100,000 max steps, 7,627 s • RotatE: 100,000 max steps, 4,327 s • GraphVite (official implementation): training TransE with 4 GPUs takes ~14 min (~840 s)
- 54.Graph in Recommender System • Graph Convolutional Matrix Completion
- 55.Graph in Recommender System • MovieLens-100K: RMSE 0.9077 (DGL) vs. 0.910 (official); 0.0246 s/epoch vs. 0.1008 s/epoch (5× speedup) • MovieLens-1M: RMSE 0.8377 vs. 0.832; 0.0695 s/epoch vs. 1.538 s/epoch (22× speedup) • MovieLens-10M: RMSE 0.7875 vs. 0.777*; 0.6480 s/epoch vs. long* • *Official training on MovieLens-10M has to run in mini-batches and lasts over 24 hours
- 56.Graph in Natural Language Processing • Summarization Graph-based Neural Multi-Document Summarization
- 57.Graph in Natural Language Processing • Knowledge graph to sentence • Input: a title and a knowledge graph • Output: an abstract • Text Generation from Knowledge Graphs with Graph Transformers
- 58.Graph in Natural Language Processing • GraphWriter • Graph attention • Copy mechanism vs. vocabulary generation • Text Generation from Knowledge Graphs with Graph Transformers
- 59.Graph in Natural Language Processing • BLEU: 13.32 (DGL) vs. 14.3 (official) • Speed: 1,192 s/epoch (DGL) vs. 1,970 s/epoch (official) • Text Generation from Knowledge Graphs with Graph Transformers
- 60.Open source, the source of innovation
- 61.AWS Shanghai AI Lab (ASAIL): we are hiring! • Open source project: DGL • Incubate and enable graph-empowered models/services (leverage) • Deep graph research (discover)
- 64.Q&A