Mobvoi (出门问问), Li Zhifei: How Natural Language Processing Lands on the Internet
2020-02-27 · 245 views
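- 1.How Natural Language Processing Lands on the Internet (Build Your Own Google Translate?) ‣ Li Zhifei, CEO of Mobvoi (出门问问) ‣ Email: zfli@mobvoi.com ‣ Weibo: @李志飞-出门问问 Saturday, August 17, 13
- 2.Google Translate: the most successful application of NLP on the Internet?
- 3.Google Translate website ‣ Supports 71 languages ‣ Traffic ranks within the global top 20 (comparable to bing.com) ‣ More than 300 million users; on the order of a billion translation requests per day
- 4.Google Translate mobile app ‣ One of Google's most popular official apps ‣ Supports multimodal input: text, speech, handwriting, and images
- 5.Outline • Google Translate • Machine translation for dummies • Fundamental theory and algorithms of machine translation ‣ Machine learning ‣ Data structures, models, algorithms • Industrial machine translation systems in practice
- 6.Training a Translation Model ‣ 垫子 上 的 猫 (dianzi shang de mao) ↔ a cat on the mat ‣ Word alignment?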
- 7.Word Alignment ‣ 垫子 上 的 猫 ↔ a cat on the mat ‣ 我 看见 猫 ↔ I saw a cat ‣ 我 有 猫 和 狗 ↔ I have a cat and a dog
- 8.Word Alignment (data redundancy) ‣ 垫子 上 的 猫 ↔ a cat on the mat ‣ 我 看见 猫 ↔ I saw a cat ‣ 我 有 猫 和 狗 ↔ I have a cat and a dog
- 9.Word Alignment (data redundancy) ‣ 垫子 上 的 猫 ↔ a cat on the mat ‣ 我 看见 猫 ↔ I saw a cat ‣ 我 有 猫 和 狗 ↔ I have a cat and a dog
- 10.Word Alignment (data exclusion) ‣ 垫子 上 的 猫 ↔ a cat on the mat ‣ 我 看见 猫 ↔ I saw a cat ‣ 我 有 猫 和 狗 ↔ I have a cat and a dog
- 11.Word Alignment ‣ 垫子 上 的 猫 ↔ a cat on the mat ‣ 我 看见 猫 ↔ I saw a cat ‣ 我 有 猫 和 狗 ↔ I have a cat and a dog
- 12.Word Alignment ‣ 垫子 上 的 猫 ↔ a cat on the mat ‣ 我 看见 猫 ↔ I saw a cat ‣ 我 有 猫 和 狗 ↔ I have a cat and a dog ‣ word dictionary ‣ context ‣ phrase dictionary
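The slides do not name the alignment algorithm. As one possible illustration of the "redundancy" intuition (猫 and "cat" co-occur in every sentence pair), here is a minimal sketch of IBM Model 1 trained with EM on the three sentence pairs above. IBM Model 1 is a swapped-in standard technique, and the function name `ibm_model1`, the iteration count, and the omission of a NULL word are assumptions made for this example.

```python
from collections import defaultdict

# Toy corpus from the slides: (Chinese tokens, English tokens).
pairs = [
    ("垫子 上 的 猫".split(), "a cat on the mat".split()),
    ("我 看见 猫".split(), "I saw a cat".split()),
    ("我 有 猫 和 狗".split(), "I have a cat and a dog".split()),
]

def ibm_model1(pairs, iterations=10):
    """Estimate t[(e, f)] = p(English word e | Chinese word f) with EM."""
    src_vocab = {f for fs, _ in pairs for f in fs}
    tgt_vocab = {e for _, es in pairs for e in es}
    # Start from a uniform translation table.
    t = {(e, f): 1.0 / len(tgt_vocab) for e in tgt_vocab for f in src_vocab}
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (e, f) links
        total = defaultdict(float)  # expected count of links leaving f
        for fs, es in pairs:
            for e in es:
                z = sum(t[(e, f)] for f in fs)  # normalizer for this English word
                for f in fs:
                    p = t[(e, f)] / z           # posterior that e aligns to f
                    count[(e, f)] += p
                    total[f] += p
        # M-step: renormalize the expected counts per Chinese word.
        t = {(e, f): count[(e, f)] / total[f] for (e, f) in t}
    return t

t = ibm_model1(pairs)
# Show the three most probable English words for 猫.
print(sorted(((p, e) for (e, f), p in t.items() if f == "猫"), reverse=True)[:3])
```

Words that appear with 猫 in only one sentence pair (such as "mat") lose probability mass to words that appear with it everywhere, which is the exclusion effect the slides hint at.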
- 13.Phrase Extraction ‣ 垫子 上 的 猫 (dianzi shang de mao) ↔ a cat on the mat ‣ X → ⟨dianzi shang, the mat⟩ ‣ X → ⟨mao, a cat⟩ ‣ Word-sense ambiguity: the same word may have different senses depending on context, so an ambiguous word needs different translations. For an English-to-Chinese task, X → ⟨bank, he an⟩ (edge of a river) and X → ⟨bank, yin hang⟩ (financial bank) share the English side but differ on the Chinese side.
- 14.Phrase Extraction ‣ 垫子 上 的 猫 (dianzi shang de mao) ↔ a cat on the mat ‣ X → ⟨dianzi shang, the mat⟩ ‣ X → ⟨mao, a cat⟩ ‣ Example Hiero rules for Chinese-to-English translation: X → ⟨mao, a cat⟩ and X → ⟨X0 de X1, X1 of X0⟩. The alignment between nonterminals is implicitly encoded by the subscripts. Hiero has a single nonterminal X, while a syntax-based grammar (Galley et al., 2006) may contain more nonterminals, such as noun phrase and verb phrase (VP). The first rule translates the Chinese word mao as "a cat"; the second says that the two phrases (X0 and X1) around "de" in Chinese are reordered around "of" in English.
- 15.Phrase Extraction ‣ 垫子 上 的 猫 (dianzi shang de mao) ↔ a cat on the mat ‣ X → ⟨dianzi shang, the mat⟩ ‣ X → ⟨mao, a cat⟩ ‣ X → ⟨de mao, a cat on⟩
- 16.Phrase Extraction ‣ 垫子 上 的 猫 (dianzi shang de mao) ↔ a cat on the mat ‣ X → ⟨dianzi shang, the mat⟩ ‣ X → ⟨mao, a cat⟩ ‣ X → ⟨de mao, a cat on⟩ ‣ X → ⟨dianzi shang de, on the mat⟩
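The phrase pairs on the Phrase Extraction slides can be reproduced with the standard "consistent with the word alignment" criterion. The sketch below is an illustration under assumptions: the alignment links are hypothetical (chosen so that dianzi shang ↔ the mat, de ↔ on, mao ↔ a cat), the function name `extract_phrases` is made up, and the sketch keeps only the tightest English span for each Chinese span (no extension over unaligned words).

```python
def extract_phrases(src, tgt, alignment, max_len=5):
    """Return all phrase pairs consistent with a word alignment.

    alignment is a set of (source_index, target_index) links.
    """
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to the source span [i1, i2].
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency: no target word inside [j1, j2] may link outside [i1, i2].
            if any(j1 <= j <= j2 and not (i1 <= i <= i2) for (i, j) in alignment):
                continue
            pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs

src = "dianzi shang de mao".split()
tgt = "a cat on the mat".split()
# Hypothetical alignment links (source index, target index).
alignment = {(0, 3), (0, 4), (1, 3), (1, 4), (2, 2), (3, 0), (3, 1)}

for pair in sorted(extract_phrases(src, tgt, alignment)):
    print(pair)
```

Under these assumed links, "dianzi" alone is rejected because "the" and "mat" are also linked to "shang", which falls outside that span.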
- 17.Decoding a Test Sentence ‣ 垫子 上 的 狗 (dianzi shang de gou) → the dog on the mat ‣ Hiero rules in the grammar: X → ⟨dianzi shang, the mat⟩; X → ⟨gou, the dog⟩; X → ⟨X0 de X1, X1 on X0⟩; S → ⟨X0, X0⟩ ‣ The rule X → ⟨gou, the dog⟩ is assumed to have been extracted from other training examples ‣ Translation is easy?
- 18.Translation Ambiguity ‣ 垫子 上 的 猫 (dianzi shang de mao) → a cat on the mat: X → ⟨X0 de X1, X1 on X0⟩ ‣ zhongguo de shoudu → capital of China: X → ⟨X0 de X1, X1 of X0⟩ ‣ wo de mao → my cat: X → ⟨X0 de X1, X0 X1⟩ ‣ zhifei de mao → zhifei 's cat: X → ⟨X0 de X1, X0 's X1⟩
- 19.Decoder (e.g. Joshua) ‣ Input: dianzi₀ shang₁ de₂ mao₃ ‣ Rules: S → ⟨X0, X0⟩; X → ⟨X0 de X1, X0 's X1⟩; X → ⟨X0 de X1, X0 X1⟩; X → ⟨X0 de X1, X1 of X0⟩; X → ⟨X0 de X1, X1 on X0⟩; X → ⟨dianzi shang, the mat⟩; X → ⟨mao, a cat⟩
- 20.Decoder (e.g. Joshua) ‣ Input: dianzi₀ shang₁ de₂ mao₃ ‣ Every derivation uses S → ⟨X0, X0⟩, X → ⟨dianzi shang, the mat⟩ and X → ⟨mao, a cat⟩, and differs only in the "de" rule: ‣ X → ⟨X0 de X1, X0 X1⟩: the mat a cat ‣ X → ⟨X0 de X1, X1 on X0⟩: a cat on the mat ‣ X → ⟨X0 de X1, X1 of X0⟩: a cat of the mat ‣ X → ⟨X0 de X1, X0 's X1⟩: the mat 's a cat
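To make the ambiguity concrete, the following minimal sketch applies the four "de" rules to the two lexical phrase pairs from the slides and lists the resulting translations. It is a toy substitution routine, not how Joshua is implemented; the names `lexical`, `de_rules`, and `translate_de` are made up for this example.

```python
# Lexical rules X -> <source phrase, target phrase> from the slides.
lexical = {"dianzi shang": "the mat", "mao": "a cat"}

# Target sides of the four competing rules X -> <X0 de X1, ...>.
de_rules = ["X0 X1", "X1 on X0", "X1 of X0", "X0 's X1"]

def translate_de(source):
    """Translate a source of the form '<X0> de <X1>' with every 'de' rule."""
    left, right = source.split(" de ")
    x0, x1 = lexical[left], lexical[right]
    return [tpl.replace("X0", x0).replace("X1", x1) for tpl in de_rules]

for candidate in translate_de("dianzi shang de mao"):
    print(candidate)
```

All four outputs are licensed by the grammar, so the translation model alone cannot pick the right one.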
- 21.Language Model ‣ a cat on the mat ‣ the mat a cat ‣ a cat of the mat ‣ the mat 's a cat ‣ Without seeing the Chinese source, can you tell which English sentence is more plausible?
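A language model answers that question by scoring fluency from monolingual data only. The sketch below is an assumption-laden toy: a bigram model with add-one smoothing, trained on four invented English sentences (the corpus, the `BOS`/`EOS`/`UNK` markers, and the function names are all made up for illustration; production systems use far larger n-gram models).

```python
import math
from collections import Counter

# Invented monolingual training data (illustration only).
corpus = [
    "the dog on the mat",
    "a cat saw a dog",
    "a cat on the sofa",
    "the cat sat on the mat",
]

def tokens(sentence):
    return ["BOS"] + sentence.split() + ["EOS"]

bigrams, history = Counter(), Counter()
for sentence in corpus:
    toks = tokens(sentence)
    for h, w in zip(toks, toks[1:]):
        bigrams[(h, w)] += 1
        history[h] += 1

# Vocabulary size for add-one smoothing, with one slot for unknown words.
vocab = {w for s in corpus for w in s.split()} | {"EOS", "UNK"}
V = len(vocab)

def log_prob(sentence):
    """Add-one-smoothed bigram log-probability of a sentence."""
    toks = tokens(sentence)
    return sum(
        math.log((bigrams[(h, w)] + 1) / (history[h] + V))
        for h, w in zip(toks, toks[1:])
    )

candidates = [
    "a cat on the mat",
    "the mat a cat",
    "a cat of the mat",
    "the mat 's a cat",
]
for c in sorted(candidates, key=log_prob, reverse=True):
    print(f"{log_prob(c):8.3f}  {c}")
```

On this toy corpus the model ranks "a cat on the mat" first, even though that exact sentence never appears in the training data.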
- 22.Statistical Machine Translation Pipeline ‣ Bilingual Data → Align, Extract → Translation Models ‣ Monolingual English → Training → Language Models ‣ Held-out Bilingual Data → Discriminative Training → Optimal Weights ‣ Unseen Sentences → Decoding → Translation Outputs
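- 23.Numbers in Real World • Training sentence pairs ‣ Tens of millions (per language pair) • Phrase Dictionary ‣ Hundreds of millions of entries (per language pair) • Language model ‣ Hundreds of millions of n-grams (per language pair)
- 24.Outline • Google Translate • Machine translation for dummies • Fundamental theory and algorithms of machine translation ‣ Machine learning ‣ Data structures, models, algorithms • Industrial machine translation systems in practice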
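- 25.Machine learning: classifiers ‣ Example input (a Weibo post): "[Machine Learning in Action] Machine learning is an extremely important research direction in artificial intelligence. In today's big-data era, capturing data and extracting valuable information or patterns from it has become...http://t.cn/zHNXceF. For more news on 'machine learning', click → http://t.cn/zjNCS5w" ‣ Which class? Sports news, political news, or military news • Classification ‣ Input: features ‣ Output: class ‣ Naive Bayes, maximum entropy, SVM, neural networks, etc.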
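- 26.Structured Prediction (SP) • Part-of-speech tagging is a typical SP problem ‣ I/noun like/verb machine-learning/noun ‣ Candidate tags: I: noun; like: preposition, verb; machine-learning: noun, verb
- 27.Structured Prediction as Classification • SP can be viewed as a special kind of classification problem ‣ The number of classes grows exponentially with the input length ‣ Class 1: I/noun like/verb machine-learning/noun ‣ Class 2: I/noun like/verb machine-learning/verb ‣ Class 3: I/noun like/preposition machine-learning/noun ‣ Class 4: I/noun like/preposition machine-learning/verb
- 28.Structured Prediction as Classification • SP can be viewed as a special kind of classification problem ‣ The number of classes grows exponentially with the input length ‣ There are dependencies inside a class (I/noun like/verb machine-learning/noun)
- 29.Structured Prediction as Classification • SP can be viewed as a special kind of classification problem ‣ The number of classes grows exponentially with the input length ‣ There are dependencies inside a class ‣ Classes are related to each other (e.g. I/noun like/verb machine-learning/noun versus I/noun like/preposition machine-learning/verb)
- 30.Structured Prediction as Classification • SP can be viewed as a special kind of classification problem ‣ The number of classes grows exponentially with the input length ‣ There are dependencies inside a class ‣ Classes are related to each other • These properties make SP harder, especially algorithmically! Many algorithms that are very simple for classification (such as decoding) become complicated in SP.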
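The exponential growth in the number of classes can be checked directly on the tagging example. This minimal sketch enumerates every tag sequence allowed by the candidate tags on the slide; the variable names are made up for illustration.

```python
from itertools import product

# Candidate tags per word, as listed on the slide.
candidate_tags = {
    "I": ["noun"],
    "like": ["verb", "preposition"],
    "machine-learning": ["noun", "verb"],
}
sentence = ["I", "like", "machine-learning"]

# Each "class" is one complete tag sequence for the sentence.
classes = list(product(*(candidate_tags[w] for w in sentence)))
for i, tags in enumerate(classes, start=1):
    print(i, " ".join(f"{w}/{t}" for w, t in zip(sentence, tags)))

# With k candidate tags per word and n words there are k**n classes.
print(2 ** 20)  # a 20-word sentence with 2 tags per word
```

Three words already give 1 × 2 × 2 = 4 classes, which is why enumerating classes one by one stops being an option for realistic sentences.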
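- 31.Structured Prediction problems (task: input → class) ‣ Chinese word segmentation: sentence → word sequence ‣ Part-of-speech tagging: sentence → tag sequence ‣ Syntactic parsing: sentence → parse tree ‣ Machine translation: English sentence → Chinese sentence ‣ Speech recognition: audio → sentence ‣ Handwriting recognition: strokes → sentence ‣ Optical character recognition: image → sentence
- 32.Outline • Google Translate • Machine translation for dummies • Fundamental theory and algorithms of machine translation ‣ Machine learning ‣ Data structures, models, algorithms • Industrial machine translation systems in practice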
- 33.Decoder (e.g. Joshua) ‣ Input: dianzi₀ shang₁ de₂ mao₃ ‣ Every derivation uses S → ⟨X0, X0⟩, X → ⟨dianzi shang, the mat⟩ and X → ⟨mao, a cat⟩, and differs only in the "de" rule: ‣ X → ⟨X0 de X1, X0 X1⟩: the mat a cat ‣ X → ⟨X0 de X1, X1 on X0⟩: a cat on the mat ‣ X → ⟨X0 de X1, X1 of X0⟩: a cat of the mat ‣ X → ⟨X0 de X1, X0 's X1⟩: the mat 's a cat
- 34.Hypergraph, Decoder (e.g. Joshua) ‣ Input: dianzi₀ shang₁ de₂ mao₃ ‣ Nodes: S[0,4], X[0,4], X[0,2], X[3,4] ‣ Hyperedge into S[0,4]: S → ⟨X0, X0⟩ ‣ Hyperedges into X[0,4]: X → ⟨X0 de X1, X1 on X0⟩; X → ⟨X0 de X1, X0 's X1⟩; X → ⟨X0 de X1, X0 X1⟩; X → ⟨X0 de X1, X1 of X0⟩ ‣ Hyperedge into X[0,2]: X → ⟨dianzi shang, the mat⟩ ‣ Hyperedge into X[3,4]: X → ⟨mao, a cat⟩ ‣ Caption: (a) A hypergraph encodes four different derivation trees. Rectangles represent items (or nodes), where each item is identified by the non-terminal symbol and source span. An item has one or more incoming hyperedges, which represent different ways of deriving the item. A hyperedge consists of a rule, and a pointer to an antecedent item for each non-terminal symbol in the rule.
- 35.A hypergraph is a compact data structure to encode exponentially many trees. ‣ Terminology: an FSA has nodes and edges; a packed forest (hypergraph) has nodes and hyperedges ‣ Nodes: S[0,4], X[0,4], X[0,2], X[3,4] ‣ Hyperedges into X[0,4]: X → ⟨X0 de X1, X1 on X0⟩; X → ⟨X0 de X1, X0 's X1⟩; X → ⟨X0 de X1, X0 X1⟩; X → ⟨X0 de X1, X1 of X0⟩ ‣ Other hyperedges: S → ⟨X0, X0⟩; X → ⟨dianzi shang, the mat⟩; X → ⟨mao, a cat⟩
- 36.A hypergraph is a compact data structure to encode exponentially many trees. ‣ Nodes: S[0,4], X[0,4], X[0,2], X[3,4] ‣ Hyperedges into X[0,4]: X → ⟨X0 de X1, X1 on X0⟩; X → ⟨X0 de X1, X0 's X1⟩; X → ⟨X0 de X1, X0 X1⟩; X → ⟨X0 de X1, X1 of X0⟩ ‣ Other hyperedges: S → ⟨X0, X0⟩; X → ⟨dianzi shang, the mat⟩; X → ⟨mao, a cat⟩
- 37.A hypergraph is a compact data structure to encode exponentially many trees. ‣ Nodes: S[0,4], X[0,4], X[0,2], X[3,4] ‣ Hyperedges into X[0,4]: X → ⟨X0 de X1, X1 on X0⟩; X → ⟨X0 de X1, X0 's X1⟩; X → ⟨X0 de X1, X0 X1⟩; X → ⟨X0 de X1, X1 of X0⟩ ‣ Other hyperedges: S → ⟨X0, X0⟩; X → ⟨dianzi shang, the mat⟩; X → ⟨mao, a cat⟩
- 38.A hypergraph is a compact data structure to encode exponentially many trees. ‣ Nodes: S[0,4], X[0,4], X[0,2], X[3,4] ‣ Hyperedges into X[0,4]: X → ⟨X0 de X1, X1 on X0⟩; X → ⟨X0 de X1, X0 's X1⟩; X → ⟨X0 de X1, X0 X1⟩; X → ⟨X0 de X1, X1 of X0⟩ ‣ Other hyperedges: S → ⟨X0, X0⟩; X → ⟨dianzi shang, the mat⟩; X → ⟨mao, a cat⟩
- 39.A hypergraph is a compact data structure to encode exponentially many trees. ‣ Structure sharing: the nodes X[0,2] and X[3,4] are shared by all four hyperedges into X[0,4] ‣ Nodes: S[0,4], X[0,4], X[0,2], X[3,4] ‣ Hyperedges into X[0,4]: X → ⟨X0 de X1, X1 on X0⟩; X → ⟨X0 de X1, X0 's X1⟩; X → ⟨X0 de X1, X0 X1⟩; X → ⟨X0 de X1, X1 of X0⟩ ‣ Other hyperedges: S → ⟨X0, X0⟩; X → ⟨dianzi shang, the mat⟩; X → ⟨mao, a cat⟩
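A hypergraph of this shape fits in a small dictionary: each node maps to its incoming hyperedges, and each hyperedge holds a rule plus pointers to its antecedent nodes. The sketch below is one possible representation, not Joshua's actual data structure; the node naming scheme and the function `count_derivations` are assumptions made for illustration.

```python
from math import prod

# node -> list of incoming hyperedges, each (rule, antecedent nodes).
hypergraph = {
    "X[0,2]": [("X → ⟨dianzi shang, the mat⟩", [])],
    "X[3,4]": [("X → ⟨mao, a cat⟩", [])],
    "X[0,4]": [
        (rule, ["X[0,2]", "X[3,4]"])
        for rule in (
            "X → ⟨X0 de X1, X1 on X0⟩",
            "X → ⟨X0 de X1, X0 's X1⟩",
            "X → ⟨X0 de X1, X0 X1⟩",
            "X → ⟨X0 de X1, X1 of X0⟩",
        )
    ],
    "S[0,4]": [("S → ⟨X0, X0⟩", ["X[0,4]"])],
}

def count_derivations(node):
    """Number of derivation trees rooted at node (sum over hyperedges of the
    product over antecedents; a hyperedge with no antecedents counts as 1)."""
    return sum(
        prod(count_derivations(tail) for tail in tails)
        for _rule, tails in hypergraph[node]
    )

print(count_derivations("S[0,4]"))
```

The two lexical nodes are stored once but take part in every derivation, which is the structure sharing that keeps the representation compact.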
- 40.Why Hypergraphs? • General compact data structure • Special cases include ‣ finite state machine (e.g., lattice) ‣ and/or graph ‣ packed forest • Can be used for speech, parsing, tree-based MT systems, and many more
- 41.Weighted Hypergraph ‣ Linear model: $p(d \mid x) = \Theta \cdot \Phi(d, x)$, where $\Theta$ are the weights, $\Phi$ the features, $d$ a derivation, and $x$ the foreign input ‣ Nodes: S[0,4], X[0,4], X[0,2], X[3,4] ‣ Hyperedges into X[0,4]: X → ⟨X0 de X1, X1 on X0⟩; X → ⟨X0 de X1, X0 's X1⟩; X → ⟨X0 de X1, X0 X1⟩; X → ⟨X0 de X1, X1 of X0⟩ ‣ Weights of the four derivations: p=3, p=1, p=2, p=2
- 42.Probabilistic Hypergraph ‣ Log-linear model: $p(d \mid x) = \frac{e^{\Theta \cdot \Phi(d, x)}}{Z(x)}$ ‣ $Z = 2 + 1 + 3 + 2 = 8$ ‣ Nodes: S[0,4], X[0,4], X[0,2], X[3,4] ‣ Hyperedges into X[0,4]: X → ⟨X0 de X1, X1 on X0⟩; X → ⟨X0 de X1, X0 's X1⟩; X → ⟨X0 de X1, X0 X1⟩; X → ⟨X0 de X1, X1 of X0⟩ ‣ Probabilities of the four derivations: p=3/8, p=1/8, p=2/8, p=2/8
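The normalization on this slide is a one-line computation. The sketch below treats the four per-derivation weights from the slide (2, 1, 3, 2) as the unnormalized scores and divides by their sum; which weight belongs to which derivation is not recoverable with certainty from the transcript, so the list is deliberately left unlabeled.

```python
from fractions import Fraction

# Unnormalized scores of the four derivations, in the order of Z = 2 + 1 + 3 + 2.
weights = [2, 1, 3, 2]

Z = sum(weights)                           # partition function Z(x)
probs = [Fraction(w, Z) for w in weights]  # p(d | x) = weight / Z(x)

print(Z)
print([str(p) for p in probs])
print(sum(probs))
```

Using `Fraction` keeps the arithmetic exact, so the probabilities sum to exactly 1.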
- 43.Probabilistic Hypergraph ‣ The hypergraph defines a probability distribution over trees! ‣ The distribution is parameterized by Θ ‣ Nodes: S[0,4], X[0,4], X[0,2], X[3,4] ‣ Hyperedges into X[0,4]: X → ⟨X0 de X1, X1 on X0⟩; X → ⟨X0 de X1, X0 's X1⟩; X → ⟨X0 de X1, X0 X1⟩; X → ⟨X0 de X1, X1 of X0⟩ ‣ Probabilities of the four derivations: p=3/8, p=1/8, p=2/8, p=2/8
- 44.Probabilistic Hypergraph: the hypergraph defines a probability distribution over trees! The distribution is parameterized by Θ.
  [Figure: the hypergraph for "dianzi0 shang1 de2 mao3"; panels include "Translation: a cat on the mat" and "(c) Translation: the mat a cat"]
  • Decoding (e.g., MBR): which translation do we present to a user?
  • Training (e.g., MERT): how do we set the parameters Θ?
  • Atomic Inference (e.g., finding one-best, k-best or expectation; inference can be exact or approximate): what atomic operations do we need to perform?
  • Why are the problems difficult?
    - brute force is too slow because there are exponentially many trees, so sophisticated dynamic programs are required
    - sometimes the problem is intractable and requires approximations
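To make the decoding question concrete, here is a minimal sketch of minimum Bayes risk (MBR) decoding over a small candidate list. It is not the system's actual decoder: the bigram-overlap loss is a toy stand-in for a real metric, and the pairing of the probabilities 3/8, 1/8, 2/8, 2/8 with particular translations is an assumption for illustration.

```python
def bigrams(sentence):
    """Set of adjacent token pairs in a whitespace-tokenized sentence."""
    tokens = sentence.split()
    return set(zip(tokens, tokens[1:]))


def loss(hyp, ref):
    """Toy loss: 1 - Jaccard overlap of bigram sets (stand-in for a real metric)."""
    a, b = bigrams(hyp), bigrams(ref)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def mbr_decode(posterior):
    """Return the candidate with the lowest expected loss under `posterior`."""
    def risk(hyp):
        return sum(p * loss(hyp, ref) for ref, p in posterior.items())
    return min(posterior, key=risk)


# ASSUMPTION: hypothetical assignment of the slide's probabilities to outputs.
posterior = {
    "a cat on the mat": 2 / 8,
    "the mat 's a cat": 1 / 8,
    "the mat a cat": 3 / 8,
    "a cat of the mat": 2 / 8,
}
best = mbr_decode(posterior)
print(best)
```

With these toy numbers the MBR choice coincides with the most probable candidate; in general the two can differ, because MBR rewards a candidate for being close to many other probable candidates.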
- 45.Inference, Training and Decoding on Hypergraphs
  • Atomic Inference Algorithms
    ‣ finding one-best derivations
    ‣ finding k-best derivations
    ‣ computing expectations (e.g., of features)
  • Training
    ‣ Perceptron
    ‣ Conditional random field (CRF)
    ‣ Minimum error rate training (MERT)
    ‣ Minimum risk
    ‣ MIRA
  • Decoding
    ‣ Viterbi
    ‣ Maximum a posteriori (MAP)
    ‣ Minimum Bayes risk (MBR)

  Table 2.2: Algorithms for extracting one-best from an FSA or hypergraph

  | Graph | Topological | Best-first, no heuristic | Best-first, with heuristic | Best-first, with hierarchy |
  |---|---|---|---|---|
  | FSA | Viterbi | Dijkstra | A* | HA* |
  | Hypergraph | CYK | Knuth | Klein and Manning | Generalized A* |

  The above problem is the same as the lightest derivation problem defined by Knuth (1977) for a hypergraph. It is also equivalent to the shortest-path problem for an FSA (Dijkstra, 1959). Table 2.2 shows the classical algorithms that solve this problem. In general, the algorithms can be classified by whether the search follows a certain topological order or best-first. The well-known Viterbi (Viterbi, 1967) algorithm for an FSA and Cocke-Younger-Kasami (CYK) algorithm (alternatively called CKY) for a hypergraph search the graph in a topological order. In the best-first search category, we can further classify the algorithms by whether heuristic functions are used for estimating the cost from the current node to the goal node. The algorithms described by Dijkstra (1959) and Knuth (1977) are the classical ones without using a heuristic function, that is, they assume the cost from the …
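The topological-order column of Table 2.2 can be sketched for a hypergraph as follows: visit nodes bottom-up, keep the best-scoring incoming hyperedge at each node, then follow the back-pointers to read off the translation. This is a simplified illustration rather than a full CYK parser; the edge weights, node names and the target-side encoding (integers index into the tail nodes) are assumptions for the example.

```python
# Each hyperedge is (head, target_side, tail_nodes, weight).
# In target_side, a string is an output word and an int i means
# "insert the translation of tail_nodes[i] here".
# ASSUMPTION: the weights are hypothetical illustration values.
EDGES = [
    ("X[0,2]", ["the", "mat"], (), 1.0),
    ("X[3,4]", ["a", "cat"], (), 1.0),
    ("X[0,4]", [1, "on", 0], ("X[0,2]", "X[3,4]"), 2.0),
    ("X[0,4]", [0, "'s", 1], ("X[0,2]", "X[3,4]"), 1.0),
    ("X[0,4]", [0, 1], ("X[0,2]", "X[3,4]"), 3.0),
    ("X[0,4]", [1, "of", 0], ("X[0,2]", "X[3,4]"), 2.0),
    ("S[0,4]", [0], ("X[0,4]",), 1.0),
]
TOPO_ORDER = ["X[0,2]", "X[3,4]", "X[0,4]", "S[0,4]"]


def viterbi(edges, topo_order):
    """Map each node to (best score, best incoming hyperedge)."""
    best = {}
    for node in topo_order:
        for edge in edges:
            head, _target, tails, weight = edge
            if head != node:
                continue
            score = weight
            for tail in tails:
                score *= best[tail][0]
            if node not in best or score > best[node][0]:
                best[node] = (score, edge)
    return best


def read_off(node, best):
    """Follow back-pointers from `node` and return the output words."""
    _score, (_head, target, tails, _weight) = best[node]
    words = []
    for symbol in target:
        if isinstance(symbol, int):
            words.extend(read_off(tails[symbol], best))
        else:
            words.append(symbol)
    return words


best = viterbi(EDGES, TOPO_ORDER)
one_best = " ".join(read_off("S[0,4]", best))
print(one_best, best["S[0,4]"][0])
```

Replacing the max with a sum turns the same traversal into the inside algorithm, which is why one-best search and expectation computation are grouped together as atomic inference operations.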
- 46.More details on the principles and algorithms: CCF tutorial on Internet Big Data and Machine Learning, "Applications of Structured Prediction in Natural Language Processing" http://mobvoi-resource.oss.aliyuncs.com/ccf2013_noannimation_.pdf
- 47.Why are machine translation algorithms so complex?
- 48.Decoder complexity: segmentation ambiguity
  ‣ Derivation 1: S->(S0 S1, S0 S1), S->(S0 S1, S0 S1), S->(机器, machine), S->(翻译, translation), S->(软件, software)
  ‣ Derivation 2 (different bracketing): S->(S0 S1, S0 S1), S->(S0 S1, S0 S1), S->(机器, machine), S->(翻译, translation), S->(软件, software)
  ‣ Derivation 3: S->(S0 翻译 S1, S0 translation S1), S->(机器, machine), S->(软件, software)
- 49.Decoder complexity: translation ambiguity
  ‣ S->(S0 S1, S0 S1), S->(S0 S1, S0 S1), S->(机器, machine), S->(翻译, translation), S->(软件, software) yields "machine translation software"
  ‣ S->(S0 S1, S0 S1), S->(S0 S1, S0 S1), S->(机器, machine), S->(翻译, transfer), S->(软件, software) yields "machine transfer software"
- 50.Decoder complexity: ordering ambiguity
  ‣ S->(S0 翻译 S1, S0 translation S1), S->(机器, machine), S->(软件, software) yields "machine translation software"
  ‣ S->(S0 翻译 S1, S1 translation S0), S->(机器, machine), S->(软件, software) yields "software translation machine"
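- 51.Decoder complexity
  • Given a sentence, the decoding process must consider every kind of ambiguity
    ‣ segmentation ambiguity
    ‣ translation ambiguity
    ‣ ordering ambiguity
  • Every kind of ambiguity leads to a combinatorial explosion
  • Exhaustive enumeration is impossible, so very sophisticated dynamic programming is required
  All of these ambiguities can be packed into a hypergraph!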
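To get a feel for the combinatorial explosion, the sketch below counts derivations under a deliberately simplified model: any binary bracketing of the sentence (segmentation), a fixed number of translation options per word (translation), and two target orders per combination (ordering). The model and its parameters `n_translations` and `n_orders` are assumptions for illustration; a real grammar also has multi-word rules. The count itself is obtained by dynamic programming over spans, without enumerating anything.

```python
from functools import lru_cache
from math import comb


def count_derivations(n_words, n_translations=2, n_orders=2):
    """Count derivations of an n-word sentence under the toy model."""

    @lru_cache(maxsize=None)
    def count(i, j):
        # A single word: choose one of its translation options.
        if j - i == 1:
            return n_translations
        # A longer span: choose a split point k and a target-side order.
        return sum(
            n_orders * count(i, k) * count(k, j) for k in range(i + 1, j)
        )

    return count(0, n_words)


def closed_form(n_words, n_translations=2, n_orders=2):
    """Catalan(n-1) bracketings x translation choices x order choices."""
    m = n_words - 1
    catalan = comb(2 * m, m) // (m + 1)
    return catalan * n_translations ** n_words * n_orders ** m


for n in (3, 5, 10, 20):
    print(n, count_derivations(n))
```

The number of derivations grows exponentially with sentence length, while the span-based recursion needs only a cubic number of steps. That gap is the reason decoders work on the packed hypergraph instead of on individual trees.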
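- 52.Outline • Google Translate • Machine translation for dummies • Fundamental theory and algorithms of machine translation ‣ Machine learning ‣ Data structures, models, algorithms • Industrial machine translation systems in practice
- 53.Algorithms, data, tools • A successful industrial translation system comprises: core algorithms, data, supporting tools
- 54.The importance of tools ‣ Everything should be turned into a tool and automated ‣ Good architecture and tools greatly speed up iteration (the Google Translate system can retrain all languages within a day, and the training results are emailed directly to whoever launched the training)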
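- 55.Why Google? • IBM Research pioneered many of the core NLP algorithms • Microsoft Research has a star-studded NLP research team • Yet Google was the first to turn translation into a large-scale internet product. Why?
- 56.Why Google? • Why was Google the first to turn translation into a large-scale internet product? ‣ Team DNA: scientists + engineers ‣ Google's overall environment: practicality first ‣ Big data: the Chinese-English system uses tens of millions of sentence pairs ‣ Cloud infrastructure: GFS, MapReduce, Bigtable • Many similar stories are unfolding now: speech recognition, image recognition, syntactic parsing, deep learning, knowledge graphs, conversational search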
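- 57.Build your own Google Translate?
- 58.Back-end system: 10 people • Translation model (1) • Data processing (2) • Language model (1) • Decoder (1) • Tools and infrastructure (3) • Discriminative training (1) • Basic NLP modules (1)
- 59.Product (16 people) • Promotion and operations (2) • Product managers (2) • Front-end development (2) • Back-end (10)
- 60.A shortcut for startups? Open-source software • Complete systems: Moses, Joshua, CDec • NLP tools: Stanford NLP, Berkeley Parser • Language models: SRILM • Cloud computing: Hadoop • Machine learning: CRF++, libSVM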
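- 61.Outline • Google Translate • Machine translation for dummies • Fundamental theory and algorithms of machine translation ‣ Machine learning ‣ Data structures, models, algorithms • Industrial machine translation systems in practice. Now replace "machine translation" with "NLP"!!
- 62.• WeChat official account: 出门问问, building the Chinese version of Google Now • Hiring: www.mobvoi.com, building the Google of China
- 63.Thank you! XieXie! 谢谢!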