
How to get phrase tables from word alignments?

The output of my word alignment file looks like this:

I wish to say with regard to the initiative of the Portuguese Presidency that we support the spirit and the political intention behind it . In bezug auf die Initiative der portugiesischen Präsidentschaft möchte ich zum Ausdruck bringen , daß wir den Geist und die politische Absicht , die dahinter stehen , unterstützen .   0-0 5-1 5-2 2-3 8-4 7-5 11-6 12-7 1-8 0-9 9-10 3-11 10-12 13-13 13-14 14-15 16-16 17-17 18-18 16-19 20-20 21-21 19-22 19-23 22-24 22-25 23-26 15-27 24-28
It may not be an ideal initiative in terms of its structure but we accept Mr President-in-Office , that it is rooted in idealism and for that reason we are inclined to support it .    Von der Struktur her ist es vielleicht keine ideale Initiative , aber , Herr amtierender Ratspräsident , wir akzeptieren , daß sie auf Idealismus fußt , und sind deshalb geneigt , sie mitzutragen .   0-0 11-2 8-3 0-4 3-5 1-6 2-7 5-8 6-9 12-11 17-12 15-13 16-14 16-15 17-16 13-17 14-18 17-19 18-20 19-21 21-22 23-23 21-24 26-25 24-26 29-27 27-28 30-29 31-30 33-31 32-32 34-33

How can I produce the phrase tables used by Moses from this output?

This PDF explains consistent phrase extraction (slides 16-21): http://www.inf.ed.ac.uk/teaching/courses/mt/lectures/phrase-model.pdf but what is the algorithm for actually extracting the phrases?

To get a phrase table, first extract the phrase pairs with the following algorithm from Philipp Koehn's Statistical Machine Translation book, p. 133:

[Image not preserved: pseudocode of the phrase extraction algorithm]
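Since the pseudocode itself is only available as an image, here is a minimal Python sketch of consistent phrase extraction as I understand it. It is not a transcription of the book's figure. The names `extract_phrases` and `max_len` are hypothetical, positions are 0-based as in the alignment lines from the question, and each alignment point is assumed to be a (source index, target index) pair.

```python
def extract_phrases(src, tgt, alignment, max_len=7):
    """Collect phrase pairs consistent with a word alignment (sketch).

    src, tgt  -- lists of tokens
    alignment -- iterable of (src_index, tgt_index) pairs, 0-based
    max_len   -- hypothetical cap on phrase length, in tokens, per side
    """
    alignment = list(alignment)
    aligned_tgt = {t for _, t in alignment}
    pairs = set()
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Smallest target span covering every link from the source span.
            linked = [t for s, t in alignment if s_start <= s <= s_end]
            if not linked:
                continue  # a phrase pair needs at least one alignment point
            t_start, t_end = min(linked), max(linked)
            # Consistency: no target word inside the span may be linked
            # to a source word outside the source span.
            if any(t_start <= t <= t_end and not (s_start <= s <= s_end)
                   for s, t in alignment):
                continue
            # Also emit variants that absorb unaligned target words
            # bordering the minimal span on either side.
            ts = t_start
            while True:
                te = t_end
                while True:
                    if te - ts < max_len:
                        pairs.add((" ".join(src[s_start:s_end + 1]),
                                   " ".join(tgt[ts:te + 1])))
                    te += 1
                    if te >= len(tgt) or te in aligned_tgt:
                        break
                ts -= 1
                if ts < 0 or ts in aligned_tgt:
                    break
    return pairs


if __name__ == "__main__":
    links = [tuple(map(int, p.split("-"))) for p in "0-0 1-1".split()]
    for pair in sorted(extract_phrases("the house".split(),
                                       "das Haus".split(), links)):
        print(pair)
```

For the toy pair above this yields three phrase pairs: each word with its translation, plus the whole two-word span. Running it over every sentence pair in the alignment file and keeping duplicates gives the counts needed for the scoring step.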

Then estimate the probabilities of the phrase pairs from their relative frequencies, i.e.:

[Image not preserved: relative-frequency formula for the phrase translation probability]
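The formula is only available as an image, so the sketch below assumes the usual relative-frequency form: the probability of a target phrase given a source phrase is count(source, target) divided by the total count of all pairs that share that source phrase. The function name `score_phrases` is hypothetical, and the input is assumed to be the list of phrase pairs extracted over the whole corpus, with repeats kept.

```python
from collections import Counter


def score_phrases(extracted):
    """Relative-frequency estimate of P(target | source) (sketch).

    extracted -- iterable of (src_phrase, tgt_phrase) pairs, one entry
                 per extraction, so repeated pairs appear multiple times
    """
    pair_counts = Counter(extracted)
    # Total number of extractions for each source phrase.
    src_totals = Counter()
    for (s, _), c in pair_counts.items():
        src_totals[s] += c
    return {(s, t): c / src_totals[s] for (s, t), c in pair_counts.items()}


if __name__ == "__main__":
    sample = [("the", "das"), ("the", "das"), ("the", "der"),
              ("house", "Haus")]
    for pair, prob in sorted(score_phrases(sample).items()):
        print(pair, round(prob, 3))
```

Swapping the two elements of each pair before calling the function gives the estimate in the opposite direction.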

Note that the original printed version of the book contains an error on line 4 of the extract() function; it is corrected in the errata.

Also see "Phrase extraction algorithm for statistical machine translation" for the details.
