
How can I reformat the output of the Malt Parser in NLTK?

So I finally figured out how to use the Malt wrapper provided in NLTK from "How to use malt parser in python nltk" and was able to chunk my sentences successfully, but the output comes out in a format I'm unfamiliar with.

For example, parsing "This is a test sentence" returns:

>>> import nltk
>>> parser = nltk.parse.malt.MaltParser(working_dir="/path/to/dir",mco="engmalt.linear-1.7",additional_java_args=['-Xmx512m'])
>>> txt = "This is a test sentence"
>>> graph = parser.raw_parse(txt)
>>> graph.tree().pprint()
(This (sentence is a test))

Parsing a more complex sentence returns:

>>> import nltk
>>> parser = nltk.parse.malt.MaltParser(working_dir="/path/to/dir",mco="engmalt.linear-1.7",additional_java_args=['-Xmx512m'])
>>> txt = "A ceasefire for east Ukraine has been agreed during talks in Minsk."
>>> graph = parser.raw_parse(txt)
>>> graph.tree().pprint()
(agreed
   (ceasefire A (for (Ukraine east)))
   has
   been
   (during (talks (in Minsk)))
   .)

Could someone please explain what this output format is, or how I can process it so that it looks like the original sentence:

(This (is a test sentence))
A (ceasefire (for (east Ukraine))) has been (agreed (during (talks (in Minsk))).)

If it helps, graph is an nltk DependencyGraph and graph.tree() is an nltk Tree.

Thanks in advance.

MaltParser is a system for data-driven "dependency parsing", which can be used to induce a parsing model from treebank data and to parse new data using an induced model.
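To make the bracketed output easier to read: a dependency parse gives every word a head word, and a tree printed from it shows the head first, followed by its dependents. The sketch below illustrates that in plain Python; it is not NLTK's own code. The `HEADS` table is hand-written for illustration (it is not MaltParser output), and `dependents` and `head_first` are hypothetical helpers.

```python
# Hand-written dependency analysis of "This is a test sentence".
# Each entry maps a 1-based word position to (word, position of its head);
# head 0 marks the root. These heads are an illustrative assumption,
# not the output of any MaltParser model.
HEADS = {
    1: ("This", 2),
    2: ("is", 0),
    3: ("a", 5),
    4: ("test", 5),
    5: ("sentence", 2),
}


def dependents(heads, address):
    """Return the positions whose head is `address`, in sentence order."""
    return sorted(a for a, (_, h) in heads.items() if h == address)


def head_first(heads, address):
    """Render a subtree with the head word first, then its dependents."""
    word = heads[address][0]
    deps = dependents(heads, address)
    if not deps:
        return word
    return "(" + " ".join([word] + [head_first(heads, d) for d in deps]) + ")"


root = dependents(HEADS, 0)[0]
print(head_first(HEADS, root))  # prints: (is This (sentence a test))
```

Read this way, the `(agreed (ceasefire A ...) has been ...)` output in the question says that "agreed" is the head of the sentence and that "ceasefire", "has" and "been" are among its dependents. The brackets group words by head, so they do not follow the original word order.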

The files engmalt.poly-1.7.mco and engmalt.linear-1.7.mco contain single malt configurations for parsing English text with MaltParser.

The two models differ in that engmalt.poly-1.7.mco uses SVMs with a polynomial kernel for classification, while engmalt.linear-1.7.mco uses linear SVMs. While the latter parser is much faster, the former requires less memory, and parsing accuracy is similar for the two models. They also differ in how the parsed output is written.

With engmalt.poly-1.7.mco, the parsed output is represented as a dependency annotation (a dependency graph), whereas engmalt.linear-1.7.mco represents it in a linear way.

Compare the outputs below. Hope this helps.

With mco="engmalt.linear-1.7"

>>> import nltk
>>> parser = nltk.parse.malt.MaltParser(working_dir="/path/to/dir",mco="engmalt.linear-1.7",additional_java_args=['-Xmx512m'])
>>> txt = "This is a test sentence"
>>> graph = parser.raw_parse(txt)
>>> graph.tree().pprint()
(This (sentence is a test))

With mco="engmalt.poly-1.7"

>>> import nltk
>>> parser = nltk.parse.malt.MaltParser(working_dir="/path/to/dir",mco="engmalt.poly-1.7",additional_java_args=['-Xmx512m'])
>>> txt = "This is a test sentence"
>>> graph = parser.raw_parse(txt)
>>> graph.tree().pprint()
(is This (a (sentence test)))

For your more complex sentence, with mco="engmalt.linear-1.7"

>>> import nltk
>>> parser = nltk.parse.malt.MaltParser(working_dir="/path/to/dir",mco="engmalt.linear-1.7",additional_java_args=['-Xmx512m'])
>>> txt = "A ceasefire for east Ukraine has been agreed during talks in Minsk."
>>> graph = parser.raw_parse(txt)
>>> graph.tree().pprint()
(A
  (agreed
    (been ceasefire for east Ukraine has)
    (during (Minsk talks in)))
  .)
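If the goal is a bracketing that keeps the original word order, one option is to walk the dependency structure by word position instead of printing each head first. A minimal sketch follows. It assumes each node is a dict with `word` and `head` keys indexed by word position, a shape chosen here to mirror NLTK's `DependencyGraph.nodes`. The sample `NODES` table is hand-written rather than real parser output, and `in_order` and `plain_sentence` are hypothetical helpers.

```python
# Hand-written nodes for "This is a test sentence": position -> word and head.
# The shape is assumed to mirror NLTK's DependencyGraph.nodes; the heads are
# illustrative, not MaltParser output. Head 0 marks the root.
NODES = {
    1: {"word": "This", "head": 2},
    2: {"word": "is", "head": 0},
    3: {"word": "a", "head": 5},
    4: {"word": "test", "head": 5},
    5: {"word": "sentence", "head": 2},
}


def in_order(nodes, address):
    """Bracket a subtree, keeping the head at its own sentence position."""
    children = [a for a, n in nodes.items() if n["head"] == address]
    if not children:
        return nodes[address]["word"]
    parts = [
        nodes[a]["word"] if a == address else in_order(nodes, a)
        for a in sorted(children + [address])
    ]
    return "(" + " ".join(parts) + ")"


def plain_sentence(nodes):
    """Recover the original word order by sorting on position."""
    # Skip any artificial root node that carries no word.
    return " ".join(
        nodes[a]["word"] for a in sorted(nodes) if nodes[a]["word"] is not None
    )


root = next(a for a, n in NODES.items() if n["head"] == 0)
print(in_order(NODES, root))   # prints: (This is (a test sentence))
print(plain_sentence(NODES))   # prints: This is a test sentence
```

The grouping still follows whatever heads the parser assigned, so the result will not necessarily match the bracketing sketched in the question; only the word order is restored.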
