python NLTK parse subtree

Two questions regarding trees in NLTK:

  1. Within one tree (sentence), can I tell the first, second, ... subtree apart?
  2. How can I work with the tags in the leaves of a subtree?

The following code works well,

    # NLTK 3 uses t.label(); older NLTK versions exposed the label as t.node
    for subtree in tree.subtrees(filter=lambda t: t.label() == 'NP'):
        for attributes in subtree.leaves():
            print(attributes)

but each leaf comes back as a (word, tag) tuple:

    ('noun', 'NN')
    ('verb', 'VBZ')

and so on. I need to distinguish between the different types of words within a subtree, but subtree.labels() doesn't exist.

Something like:

    for subtree in tree.subtrees(filter=lambda t: t.label() == 'NP'):
        for attributes in subtree.leaves():
            if subtree.labels() == 'NN':  # wished-for API: labels() does not exist on Tree
                pass  # do something with the nouns...
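For reference, a minimal sketch of where those tuples come from, assuming NLTK 3 and a chunk tree built with nltk.RegexpParser (the grammar and the tagged sentence are made up for illustration). The node label of a subtree is read with label(), while every leaf already carries its POS tag as the second element of a (word, tag) tuple:

```python
import nltk

# Assumed chunk grammar for this example: an NP is an optional determiner,
# any number of adjectives, then a noun tag (NN, NNS, NNP, ...).
parser = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>}")
tagged = [('the', 'DT'), ('dog', 'NN'), ('barks', 'VBZ')]
tree = parser.parse(tagged)  # root label is 'S'; 'the dog' becomes an NP chunk

for subtree in tree.subtrees(filter=lambda t: t.label() == 'NP'):
    print(subtree.label())  # prints NP: the label of the subtree itself
    for word, tag in subtree.leaves():
        print(word, tag)  # each leaf is a (word, tag) tuple, e.g. dog NN
```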

Thanks for the hint.

So I did it in plain Python. If anyone has a better idea, I'd like to hear it:

    for subtree in tree.subtrees(filter=lambda t: t.label() in ('NP', 'NNS')):
        for word, tag in subtree.leaves():  # each leaf is a (word, tag) tuple
            if tag == 'NN':
                print(word)  # do something with the nouns
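A self-contained version of the same idea that also covers question 1 by numbering the NP subtrees with enumerate. This is a sketch assuming NLTK 3; the helper nouns_per_np, the example tree, and the choice of ('NN', 'NNS') as noun tags are hypothetical:

```python
from nltk import Tree


def nouns_per_np(tree, noun_tags=('NN', 'NNS')):
    """Hypothetical helper: return (index, nouns) for every NP subtree,
    numbered in the order tree.subtrees() visits them."""
    result = []
    np_subtrees = tree.subtrees(filter=lambda t: t.label() == 'NP')
    for index, subtree in enumerate(np_subtrees):
        # Leaves of a chunk tree are (word, tag) tuples, so unpack and filter by tag.
        nouns = [word for word, tag in subtree.leaves() if tag in noun_tags]
        result.append((index, nouns))
    return result


# Hand-built chunk tree for "the dogs chase a red ball" (assumed example).
tree = Tree('S', [
    Tree('NP', [('the', 'DT'), ('dogs', 'NNS')]),
    ('chase', 'VBP'),
    Tree('NP', [('a', 'DT'), ('red', 'JJ'), ('ball', 'NN')]),
])

for index, nouns in nouns_per_np(tree):
    print(index, nouns)
# 0 ['dogs']
# 1 ['ball']
```

subtrees() walks the tree in pre-order, left to right, so in a flat chunk tree like this one index 0 is the leftmost NP in the sentence.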

I did something like the following to extract the noun phrases from the tree.

    from itertools import groupby
    # tree.pos() yields ((word, tag), parent_label) pairs; keep the runs whose parent is 'NP'
    [' '.join([t[0] for t, m in group])
     for key, group in groupby(tree.pos(), lambda s: s[-1] == 'NP') if key]
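To see what this one-liner operates on, here is a runnable sketch with a small hand-built chunk tree (assumed for illustration; any tree whose leaves are (word, tag) tuples behaves the same way):

```python
from itertools import groupby

from nltk import Tree

tree = Tree('S', [
    Tree('NP', [('the', 'DT'), ('dog', 'NN')]),
    ('saw', 'VBD'),
    Tree('NP', [('a', 'DT'), ('cat', 'NN')]),
])

# tree.pos() pairs every leaf with the label of its parent node, e.g.
# (('the', 'DT'), 'NP') inside a chunk and (('saw', 'VBD'), 'S') outside.
pairs = tree.pos()

# Group consecutive pairs by whether the parent label is 'NP'; keep only NP runs.
noun_phrases = [
    ' '.join(leaf[0] for leaf, label in group)
    for is_np, group in groupby(pairs, key=lambda pair: pair[-1] == 'NP')
    if is_np
]
print(noun_phrases)  # ['the dog', 'a cat']
```

Because groupby only looks at whether consecutive leaves sit under an NP, two NP chunks with nothing between them would come out as a single string.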

More generally, we can inspect what each group contains and do whatever we want with its elements. For example,

    [list(group) for key, group in groupby(tree.pos(), lambda s: s[-1] == 'NP') if key]

Once we know what the elements of list(group) contain, we can do whatever we want with them.
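A sketch of that inspection, again on a small hand-built tree (an assumption for illustration). Each element of a group is a ((word, tag), parent_label) pair, so once that shape is known the tags are easy to reach; here only tags starting with NN are kept:

```python
from itertools import groupby

from nltk import Tree

tree = Tree('S', [
    Tree('NP', [('the', 'DT'), ('dog', 'NN')]),
    ('saw', 'VBD'),
    Tree('NP', [('a', 'DT'), ('cat', 'NN')]),
])

np_groups = [list(group)
             for is_np, group in groupby(tree.pos(), lambda s: s[-1] == 'NP')
             if is_np]

# Inspect one group: [(('the', 'DT'), 'NP'), (('dog', 'NN'), 'NP')]
print(np_groups[0])

# With the structure known, unpack it, e.g. keep only the nouns of each NP.
nouns = [[word for (word, tag), label in group if tag.startswith('NN')]
         for group in np_groups]
print(nouns)  # [['dog'], ['cat']]
```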

Another way is to use tree2conlltags from nltk.chunk. For example,

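    from nltk.chunk import tree2conlltags
    from itertools import groupby

    # one (word, pos, iob) triple per token, e.g. ('dog', 'NN', 'I-NP')
    chunks = tree2conlltags(tree)
    print(chunks)

    # join each run of consecutive tokens that are inside some chunk (IOB tag != 'O')
    results = [' '.join(word for word, pos, chunk in group).lower()
               for key, group in groupby(chunks, lambda s: s[-1] != 'O') if key]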
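The groupby step above merges neighbouring chunks when no 'O' token separates them. Since tree2conlltags marks the first token of every chunk with a B- tag, one alternative (a swapped-in variant, not part of the original answer) is to start a new chunk at each B- tag. A sketch assuming NLTK 3; chunk_strings and the example tree are hypothetical:

```python
from itertools import groupby

from nltk import Tree
from nltk.chunk import tree2conlltags


def chunk_strings(conll_tags):
    """Hypothetical helper: turn (word, pos, iob) triples into one lowercase
    string per chunk, starting a new chunk at every B- tag."""
    chunks = []
    inside = False
    for word, pos, iob in conll_tags:
        if iob.startswith('B-') or (iob.startswith('I-') and not inside):
            chunks.append([word])  # first token of a new chunk
            inside = True
        elif iob.startswith('I-'):
            chunks[-1].append(word)  # continuation of the current chunk
        else:
            inside = False  # 'O': token outside any chunk
    return [' '.join(words).lower() for words in chunks]


# "He gave the dog a bone", with two adjacent NP chunks at the end.
tree = Tree('S', [
    Tree('NP', [('He', 'PRP')]),
    ('gave', 'VBD'),
    Tree('NP', [('the', 'DT'), ('dog', 'NN')]),
    Tree('NP', [('a', 'DT'), ('bone', 'NN')]),
])
conll = tree2conlltags(tree)

# The groupby version from the answer joins the two adjacent NPs.
merged = [' '.join(word for word, pos, chunk in group).lower()
          for key, group in groupby(conll, lambda s: s[-1] != 'O') if key]

print(merged)                # ['he', 'the dog a bone']
print(chunk_strings(conll))  # ['he', 'the dog', 'a bone']
```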
