
How do I convert a Text object from the parsetree output of the Pattern module in Python?

I have a list of words like this:

['Urgente', 'Recibimos', 'Info']

I used the parsetree function (parsetree(x, lemmata=True)) to convert the words, and the output for each word is this:

[[Sentence('urgente/JJ/B-ADJP/O/urgente')],
[Sentence('recibimos/NN/B-NP/O/recibimos')],
[Sentence('info/NN/B-NP/O/info')]]

Each component of the list has the type pattern.text.tree.Text.

I need to obtain only the group of words inside the parentheses, but I don't know how to do this. I need this output:

[urgente/JJ/B-ADJP/O/urgente,
recibimos/NN/B-NP/O/recibimos,
info/NN/B-NP/O/info]

I tried using str to convert each component of the list to a string, but this changes the whole output.

From their documentation, there doesn't seem to be a direct method or property to get what you want.

But I found that repr prints a Sentence object as Sentence('urgente/JJ/B-ADJP/O/urgente'). So I looked at the source code of the __repr__ implementation to see how that string is formed:

def __repr__(self):
    return "Sentence(%s)" % repr(" ".join(["/".join(word.tags) for word in self.words]))

It seems that the string inside the parentheses is built by joining each word's tags with "/" and then joining the words with spaces. You can reuse that code on the pattern.text.tree.Text objects you already have, because "a Text is a list of Sentence objects. Each Sentence is a list of Word objects." (from the Parse trees documentation).
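To see that the joining logic on its own reproduces the text inside the parentheses, here is a minimal sketch that runs without Pattern installed. The Word and Sentence classes below are hypothetical stand-ins, not the real Pattern classes; they mimic only the two attributes used here (words and tags), and tagged_string is a helper name made up for this example.

```python
# Hypothetical stand-ins for Pattern's Word and Sentence classes.
# They expose only the attributes the quoted __repr__ relies on.
class Word:
    def __init__(self, tags):
        self.tags = tags  # e.g. ['urgente', 'JJ', 'B-ADJP', 'O', 'urgente']


class Sentence:
    def __init__(self, words):
        self.words = words

    def __repr__(self):
        # Same expression as the __repr__ quoted above.
        return "Sentence(%s)" % repr(" ".join(["/".join(word.tags) for word in self.words]))


def tagged_string(sentence):
    """Return only the part that __repr__ places inside the parentheses."""
    return " ".join("/".join(word.tags) for word in sentence.words)


sentence = Sentence([Word(["urgente", "JJ", "B-ADJP", "O", "urgente"])])
print(repr(sentence))           # the full repr, including "Sentence(...)"
print(tagged_string(sentence))  # only the word/tag string
```

With real parsetree output, the same helper should work as long as each sentence exposes words and each word exposes tags, as the quoted source suggests.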

So here's my hacky solution:

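# Assumption: parsetree is imported from the Pattern language module you use,
# e.g. pattern.en (or pattern.es for Spanish text).
from pattern.en import parsetree

parsed = list()
for data in ['Urgente', 'Recibimos', 'Info']:
    parsed.append(parsetree(data, lemmata=True))

output = list()
for text in parsed:  # each Text is a list of Sentence objects
    for sentence in text:
        # Join each word's tags with "/", then join the words with spaces.
        formatted = " ".join(["/".join(word.tags) for word in sentence.words])
        output.append(str(formatted))

print(output)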

Printing output gives:

['Urgente/NNP/B-NP/O/urgente', 'Recibimos/NNP/B-NP/O/recibimos', 'Info/NNP/B-NP/O/info']

Note that this solution results in a list of plain str objects, so all the properties and methods of the original parsetree output are lost.
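If you still need the parsed objects later, one option is to keep each string next to the object it came from instead of replacing it. This is only a sketch: the SimpleNamespace objects are hypothetical stand-ins for a parsed sentence and its words (so the snippet runs without Pattern), and tagged_string is a helper name made up for this example.

```python
from types import SimpleNamespace


def tagged_string(sentence):
    """Join each word's tags with "/" and the words with spaces."""
    return " ".join("/".join(word.tags) for word in sentence.words)


# Hypothetical stand-in for one parsed sentence; real code would take the
# Sentence objects from the parsetree output instead.
sentence = SimpleNamespace(
    words=[SimpleNamespace(tags=["info", "NN", "B-NP", "O", "info"])]
)

# Pair every string with its source object so nothing is lost.
pairs = [(tagged_string(s), s) for s in [sentence]]

text, original = pairs[0]
print(text)                        # the plain string
print(original.words[0].tags[1])   # the object is still available
```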


 