How do I convert a Tree type to a String type in Python with NLTK?

Question

for subtree3 in tree.subtrees():
  if subtree3.label() == 'CLAUSE':
    print(subtree3)
    print(subtree3.leaves())

Using this code I am able to extract the leaves of the tree. For one example they are [('talking', 'VBG'), ('constantly', 'RB')], which is exactly what I expect. Now I want to convert these Tree elements into a string or a list for further processing. How can I do that?

What I tried

for subtree3 in tree.subtrees():
  if subtree3.label() == 'CLAUSE':
    print(subtree3)
    print(subtree3.leaves())
    fo.write(subtree3.leaves())
fo.close()

But it throws an error:

Traceback (most recent call last):
  File "C:\Python27\Association_verb_adverb.py", line 35, in <module>
    fo.write(subtree3.leaves())
TypeError: expected a character buffer object

I just want to store the leaves in a text file.

Answer 1

It depends on your version of NLTK and Python. I think you're referring to the Tree class in the nltk.tree module. If so, read on.

In your code, it's true that:

subtree3.leaves() returns a list of tuples, and
fo is a Python file object, whose fo.write method accepts only a str as its argument.

So you can simply write the tree leaves with fo.write(str(subtree3.leaves())), thus:

for subtree3 in tree.subtrees():
    if subtree3.label() == 'CLAUSE':
        print(subtree3)
        print(subtree3.leaves())
        fo.write(str(subtree3.leaves()))
fo.flush()
fo.close()

And don't forget to flush() the buffer. (close() also flushes, so the explicit call matters mainly when the file stays open.)
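str() on the list writes its Python repr, brackets and quotes included. If a plainer line per clause is easier to process later, something like the following might work. It is only a sketch for Python 3: leaves_to_line is a hypothetical helper name, the word/tag layout is an assumption, the leaves list stands in for subtree3.leaves(), and io.StringIO stands in for a real file opened with open(path, 'w').

```python
import io

def leaves_to_line(leaves):
    """Format [(word, tag), ...] as 'word/tag word/tag'."""
    return ' '.join('{}/{}'.format(word, tag) for word, tag in leaves)

# Stand-in for subtree3.leaves() from the question.
leaves = [('talking', 'VBG'), ('constantly', 'RB')]

# The with-block closes (and therefore flushes) the file object on exit.
with io.StringIO() as fo:
    fo.write(leaves_to_line(leaves) + '\n')
    written = fo.getvalue()
```

With a real file, replace io.StringIO() with open('outputfile', 'w') and drop the getvalue() line.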

Answer 2

Possibly the question is more about writing a list of tuples to a file than about traversing the NLTK Tree object. See "NLTK: How do I traverse a noun phrase to return list of strings?" and "Unpacking a list / tuple of pairs into two lists / tuples".

To output a list of tuples of 2 strings, I find this idiom useful:

fout = open('outputfile', 'w')

listoftuples = [('talking', 'VBG'), ('constantly', 'RB')]
words, tags = zip(*listoftuples)

fout.write(' '.join(words) + '\t' + ' '.join(tags) + '\n')
fout.close()

But the zip(*list) code might not work if your subtrees have multiple levels.
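One way around that caveat is to flatten the structure into plain (word, tag) pairs before unzipping. The sketch below is an assumption-laden illustration, not NLTK API: flatten_pairs is a hypothetical helper, and nested Python lists stand in for subtrees (nltk.Tree is a list subclass, so the same isinstance check should apply to it). It also guards against an empty list, where unpacking zip(*pairs) into two names would fail.

```python
def flatten_pairs(node):
    """Yield (word, tag) tuples from arbitrarily nested lists."""
    for child in node:
        if isinstance(child, list):
            # A nested level: recurse and pass its pairs upward.
            for pair in flatten_pairs(child):
                yield pair
        else:
            # A leaf: assumed to be a (word, tag) tuple.
            yield child

# Hypothetical nested data standing in for a multi-level subtree.
nested = [('he', 'PRP'), [('kept', 'VBD'), [('talking', 'VBG'), ('constantly', 'RB')]]]

pairs = list(flatten_pairs(nested))
if pairs:
    words, tags = zip(*pairs)
else:
    words, tags = (), ()

line = ' '.join(words) + '\t' + ' '.join(tags) + '\n'
```

When the pairs come from subtree.leaves(), the list is already flat, so the flattening step only matters if you iterate over a subtree's children directly.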
