简体   繁体   中英

Python: Chunking others than noun phrases (e.g. prepositional) using Spacy, etc

Since I was told Spacy was such a powerful Python module for natural speech processing, I am now desperately looking for a way to group words together to more than noun phrases, most importantly, prepositional phrases. I doubt there is a Spacy function for this but that would be the easiest way I guess (SpacySpaCy import is already implemented in my project). Nevertheless, I'm open for any possibility of phrase recognition/ chunking.

Here's a solution to get PPs. In general you can get phrases using subtree .

def get_pps(doc):
    "Function to get PPs from a parsed document."
    pps = []
    for token in doc:
        # Try this with other parts of speech for different subtrees.
        if token.pos_ == 'ADP':
            pp = ' '.join([tok.orth_ for tok in token.subtree])
            pps.append(pp)
    return pps

Usage:

import spacy

nlp = spacy.load('en_core_web_sm')
ex = 'A short man in blue jeans is working in the kitchen.'
doc = nlp(ex)

print(get_pps(doc))

This prints:

['in blue jeans', 'in the kitchen']

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM