Cross-lingual KG Building 2017 Peng Zhang
2020-03-01
- 1.Cross-lingual Knowledge Graph Building. Peng Zhang, Knowledge Engineering Group, Department of Computer Science, Tsinghua University
- 2.Outline
  - Present Situation and Motivation
  - Key Technologies for Cross-lingual Knowledge Graph Building
  - XLORE: Bi-lingual Knowledge Graph
  - Future Work
- 3.Current Knowledge Graphs
  - Knowledge graph: entities and relationships
  - Examples: NELL, CN-DBPedia, OpenIE (Reverb, OLLIE), ZHiSHI.ME, ……
- 4.Motivation
  - Multilingual entities and relationships
    - Enhance the internationalization of linked data and the globalization of knowledge sharing among different languages on the Web
    - Facilitate cross-lingual language processing such as cross-lingual information retrieval, machine translation, and Q&A
- 5.Motivation
  - Problems
    - Imbalance of knowledge in different languages (statistics from Wikipedia)
    - Low integration of multilingual knowledge: fewer than 5% of English Wikipedia articles have inter-language links to Chinese articles
    - Lack of research on knowledge complementation across languages, i.e. making use of the knowledge in other languages
  - A balanced, integrated multilingual knowledge graph is essential for intelligent multilingual information processing and services
- 6.Our Objectives
  - Cross-lingual knowledge linking
  - Cross-lingual knowledge extraction
  - Cross-lingual taxonomy derivation
  - Cross-lingual knowledge graph
  - Figure label: Microsoft Concept Graph
- 7.Outline
  - Present Situation and Motivation
  - Key Technologies for Cross-lingual Knowledge Graph Building
  - XLORE: Bi-lingual Knowledge Graph
  - Future Work
- 8.Cross-lingual Knowledge Linking
  - Many cross-lingual links are missing
    - Only 716,452 cross-lingual links between Chinese and English
  - Difficulty in mining cross-lingual links inside Wikipedia
    - The great imbalance between multilingual Wikipedias
    - It is infeasible to create cross-lingual links with sufficient coverage using only Wikipedia data
  - Other wiki knowledge bases
    - Baidu Baike, 15 million articles
    - Hudong Baike, 16.9 million articles
  - Can we find more cross-lingual links across heterogeneous wiki knowledge bases?
  - Figure 1: Number of articles in different languages in Wikipedia. Figure 2: A
- 9.Cross-lingual Knowledge Linking
  - Figure labels: Titles, Links, Categories, Authors, Cross-lingual links
- 10.Cross-lingual Knowledge Linking
  - Linkage Factor Graph (LFG) model
  - Finding more links between English Wikipedia and Baidu Baike
  - Reference: Cross-lingual Knowledge Linking Across Wiki Knowledge Bases. WWW 2012.
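As a rough illustration of the kind of link-based evidence a linking model can draw on (slide 9 lists titles, links, categories, and authors), the sketch below scores a candidate English/Chinese article pair by mapping the English article's outlinks through already-known cross-lingual links and taking the Jaccard overlap with the Chinese article's outlinks. This is not the Linkage Factor Graph model itself; the function names, the toy article titles, and the seed links are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def outlink_similarity(en_outlinks, zh_outlinks, seed_links):
    """Compare outlinks of an English and a Chinese article.

    seed_links: dict from English title to Chinese title (hypothetical seed set
    of known cross-lingual links). Outlinks with no known counterpart are
    ignored on both sides, so only comparable evidence is scored.
    """
    mapped = {seed_links[t] for t in en_outlinks if t in seed_links}
    known_zh = set(seed_links.values())
    comparable = {t for t in zh_outlinks if t in known_zh}
    return jaccard(mapped, comparable)


# Toy data (hypothetical, for illustration only).
seed_links = {"China": "中国", "Beijing": "北京", "University": "大学"}
en_out = ["China", "Beijing", "University", "Haidian District"]
zh_out = ["中国", "北京", "清华园"]

score = outlink_similarity(en_out, zh_out, seed_links)
print(round(score, 3))
```

A score like this would be only one feature among several; the more seed links are known, the more outlinks become comparable, which is consistent with the next slide's remark that the method depends on the number of links.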
- 11.Boosting Knowledge Linking via Link Annotation
  - The LFG method relies on the number of links
  - The cross-lingual links discovered in one iteration are limited
  - Reference: Boosting Cross-lingual Knowledge Linking via Concept Annotation. IJCAI 2013.
- 12.Knowledge Linking via Heterogeneous Network Embedding
  - Network construction
    - Intra-wiki network
      - Textual network
      - Linkage network
      - Semantic network
    - Inter-wiki network
  - HNE model
- 13.Cross-lingual Knowledge Extraction
  - Motivation
    - Many infoboxes are missing
    - Infobox coverage is highly skewed across languages
  - Some solutions
    - Translation-based method
    - Mono-lingual extraction method
  - Is it possible to use the rich English knowledge to help extract Chinese knowledge?
- 14.Cross-lingual Knowledge Extraction
  - Transfer learning based method
    - Compared with the mono-lingual method
    - Compared with the translation-based method
  - Reference: Transfer Learning Based Cross-lingual Knowledge Extraction for Wikipedia. ACL 2013.
- 15.Missing Link Extraction in Infobox
  - English Wikipedia, Chinese Wikipedia
  - Entity linking based on regression learning
  - Reference: Discovering Missing Semantic Relations between Entities in Wikipedia. ISWC 2013.
- 16.Cross-lingual Taxonomy Derivation
  - Background
    - The Wikipedia category system is important
    - The user-generated subsumption relations are not exactly isA relations
  - Example (figure; it uses one marker for category/class and another for article/instance)
    - Raw facts in Wikipedia: "Politicians from Chicago, Illinois" subCategoryOf "Chicago, Illinois"; "Barack Obama" articleOf "Politicians from Chicago, Illinois"
    - Directly derived facts in taxonomy: "Politicians from Chicago, Illinois" subClassOf "Chicago, Illinois"; "Barack Obama" instanceOf "Politicians from Chicago, Illinois"
    - Mistaken result after reasoning: "Barack Obama" instanceOf "Chicago, Illinois"
  - Existing methods
    - Heuristic-based methods
      - Language-dependent features, e.g. singular/plural forms of head words
    - Corpus-based methods
      - Rely on a high-quality large-scale corpus
      - The corpora are usually domain dependent
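The mistaken inference on this slide can be reproduced in a few lines. The sketch below is a minimal illustration, assuming the naive reading in which every articleOf edge is taken as instanceOf and every subCategoryOf edge as subClassOf, with subClassOf then followed transitively. The function name and the dictionary encoding are hypothetical.

```python
# Raw facts from the slide's example (child -> parent).
sub_category_of = {"Politicians from Chicago, Illinois": "Chicago, Illinois"}
article_of = {"Barack Obama": "Politicians from Chicago, Illinois"}


def naive_instance_of(article, article_of, sub_category_of):
    """Return every class inferred for `article` under the naive reading:
    articleOf is treated as instanceOf, subCategoryOf as subClassOf,
    and subClassOf is followed transitively."""
    inferred = []
    seen = set()
    current = article_of.get(article)
    while current is not None and current not in seen:
        inferred.append(current)
        seen.add(current)  # guards against cycles in the category graph
        current = sub_category_of.get(current)
    return inferred


classes = naive_instance_of("Barack Obama", article_of, sub_category_of)
print(classes)
```

The second inferred class, "Chicago, Illinois", is the mistaken result: the category edge expresses topical relatedness rather than a true isA relation, which is why each derived edge needs validation.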
- 17.Cross-lingual Taxonomy Derivation
  - Framework (figure labels): Online Wiki W1, Online Wiki W2, Taxonomy Derivation, Taxonomy T1, Taxonomy T2, Cross-Lingual Links CL, Cross-Lingual Knowledge Validation
  - Dynamic Adaptive Boosting Model
  - Weak classifier
    - Linguistic heuristic features
      - English/Chinese features based on rdfs:label
      - Common features based on rdfs:comment
    - Structural features
      - Normalized Google Distance
      - Concept, Instance, Property
  - Boosting process (figure labels): Active Set, Unknown Data Pool, Expand, Training, Re-weight Hypothesis, Add Hypothesis, Classifier Output H(x)
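The slide names Normalized Google Distance (NGD) as a structural feature without giving the formula. Its standard definition, due to Cilibrasi and Vitányi, is NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f counts the items containing a term and N is the total number of items. The sketch below computes it from raw counts; how the counts are obtained for wiki concepts, instances, and properties is not stated on the slide, so the function name and the example counts are hypothetical.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Normalized Google Distance from occurrence counts.

    fx, fy: number of items containing x and y respectively
    fxy:    number of items containing both x and y
    n:      total number of items (must exceed min(fx, fy))
    Returns math.inf when x and y never co-occur.
    """
    if fx <= 0 or fy <= 0:
        raise ValueError("fx and fy must be positive")
    if n <= min(fx, fy):
        raise ValueError("n must exceed min(fx, fy)")
    if fxy == 0:
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Hypothetical counts: x appears in 100 items, y in 1000, both together in 100.
distance = normalized_google_distance(100, 1000, 100, 10_000)
print(distance)
```

Smaller values mean the two terms co-occur more than their individual frequencies would suggest; identical occurrence sets give a distance of 0.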
- 18.Cross-lingual Taxonomy Derivation
  - Experimental results
  - Boosting contribution analysis
  - Reference: Cross-lingual Knowledge Validation Based Taxonomy Derivation from Heterogeneous Online Wikis. AAAI 2014.
- 19.Fine-Grained Entity Typing
  - Framework
  - Model
    - Heterogeneous network construction
      - Word-word network, entity-word network
      - Entity-type network, type-word network
    - Joint entity and type representation
      - Heterogeneous network embedding
      - Modeling type correlation
    - Type-path inference
- 20.Outline
  - Present Situation and Motivation
  - Key Technologies for Cross-lingual Knowledge Graph Building
  - XLORE: Bi-lingual Knowledge Graph
  - Future Work
- 21.XLORE: English-Chinese Knowledge Graph
  - XLORE
    - Large-scale bilingual knowledge base
    - Based on four online wiki encyclopedias
      - Baidu Baike, Hudong Baike, and Chinese Wikipedia
      - English Wikipedia
  - Knowledge base
    - Machine-readable knowledge encyclopedia
    - Schema level: concept (class), property
    - Instance level: instance (individual, entity, …)
- 22.XLORE: English-Chinese Knowledge Graph
  - Framework
- 23.Semantifying Online Wikis
  - [1] Zhigang Wang, Juanzi Li, et al. Cross-Lingual Knowledge Validation Based Taxonomy Derivation from Heterogeneous Online Wikis. AAAI 2014.
- 24.Cross-lingual Knowledge Linking
  - [2] Zhichun Wang, Juanzi Li, Zhigang Wang, and Jie Tang. Cross-lingual Knowledge Linking across Wiki Knowledge Bases. WWW 2012.
  - [3] Zhichun Wang, Juanzi Li, and Jie Tang. Boosting Cross-lingual Knowledge Linking via Concept Annotation. IJCAI 2013.
- 25.Cross-lingual Structured Knowledge Extraction
  - [4] Zhigang Wang, Zhixing Li, Juanzi Li, Jie Tang, and Jeff Z. Pan. Transfer Learning Based Cross-lingual Knowledge Extraction for Wikipedia. ACL 2013.
- 26.XLORE: English-Chinese Knowledge Graph
- 27.XLORE: English-Chinese Knowledge Graph
  - Statistics (count and share of the total)

  | | Classes | Properties | Instances |
  |---|---|---|---|
  | English | 719,315 (85.46%) | 15,380 (26.86%) | 4,534,067 (39.04%) |
  | Chinese | 186,244 (22.13%) | 53,627 (93.64%) | 7,512,413 (64.68%) |
  | Cross-lingual | 63,895 (7.59%) | 11,739 (20.50%) | 432,598 (3.72%) |
  | Total | 841,664 | 57,268 | 11,613,882 |

  - Reference: XLore: A Large-scale English-Chinese Bilingual Knowledge Graph. ISWC 2013 Poster and Demo.
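The totals in these statistics are consistent with counting each cross-lingually linked item once, that is, Total = English + Chinese - Crosslingual, with each percentage taken relative to that total. This reading is inferred from the numbers and is not stated on the slide. The short check below recomputes the totals and shares from the slide's counts; the variable and function names are hypothetical.

```python
# Counts copied from the slide.
stats = {
    "Classes": {"English": 719_315, "Chinese": 186_244,
                "Crosslingual": 63_895, "Total": 841_664},
    "Properties": {"English": 15_380, "Chinese": 53_627,
                   "Crosslingual": 11_739, "Total": 57_268},
    "Instances": {"English": 4_534_067, "Chinese": 7_512_413,
                  "Crosslingual": 432_598, "Total": 11_613_882},
}


def union_size(english, chinese, crosslingual):
    """Inclusion-exclusion: items linked across languages are counted once."""
    return english + chinese - crosslingual


computed = {}
for kind, row in stats.items():
    total = union_size(row["English"], row["Chinese"], row["Crosslingual"])
    shares = {
        lang: round(100 * row[lang] / total, 2)
        for lang in ("English", "Chinese", "Crosslingual")
    }
    computed[kind] = {"total": total, "shares": shares}
    print(kind, total, shares)
```

The English and Chinese shares sum to more than 100% in each column because cross-lingually linked items are counted on both sides.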
- 28.XLORE2
  - Framework
    - More flexible
    - More verifiable
    - Higher quality
- 29.Cross-lingual Knowledge Linking
  - Heterogeneous network embedding
- 30.Cross-lingual Property Matching
  - Factor graph
  - [1] Zhang, Y., Paradis, T., Hou, L., Li, J.: Cross-lingual infobox alignment in Wikipedia using entity-attribute factor graph. ISWC 2017.
- 31.Fine-Grained Entity Typing
  - Network embedding
- 32.XLORE2 System
  - Statistics
  - XLink
    - Entity linking system
  - http://xlore.org/
- 33.Outline
  - Present Situation and Motivation
  - Key Technologies for Cross-lingual Knowledge Graph Building
  - XLORE: Bi-lingual Knowledge Graph
  - Future Work
- 34.Future Work
  - Multilingual ontology building
    - Multilingual concept and property linking
  - Cross-lingual knowledge embedding and computing using deep learning
  - Event knowledge representation and learning from wiki knowledge bases
  - More Web-based multilingual knowledge applications are expected
- 35.Thank you! Q&A