Python: Chunking phrases other than noun phrases (e.g. prepositional) using spaCy, etc.

Question

Since I was told spaCy is such a powerful Python module for natural language processing, I am now desperately looking for a way to group words into more than just noun phrases, most importantly prepositional phrases. I doubt there is a spaCy function for this, but that would be the easiest way, I guess (the spaCy import is already implemented in my project). Nevertheless, I'm open to any approach to phrase recognition/chunking.

Answer 1

Here's a solution to get PPs. In general, you can get phrases using subtree.

def get_pps(doc):
    "Function to get PPs from a parsed document."
    pps = []
    for token in doc:
        # Try this with other parts of speech for different subtrees.
        if token.pos_ == 'ADP':
            pp = ' '.join([tok.orth_ for tok in token.subtree])
            pps.append(pp)
    return pps

Usage:

import spacy

nlp = spacy.load('en_core_web_sm')
ex = 'A short man in blue jeans is working in the kitchen.'
doc = nlp(ex)

print(get_pps(doc))

This prints:

['in blue jeans', 'in the kitchen']
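The comment inside get_pps suggests trying other parts of speech. A minimal sketch of that generalization follows. The names get_phrases and pos_tags are made up here for illustration and are not part of spaCy. The function only assumes each token exposes pos_, text and subtree, as spaCy tokens do.

```python
def get_phrases(doc, pos_tags=("ADP",)):
    """Collect the subtree text of every token whose coarse POS tag is in pos_tags.

    Hypothetical helper: `doc` is any iterable of tokens exposing
    `pos_`, `text` and `subtree` (a parsed spaCy Doc qualifies).
    """
    phrases = []
    for token in doc:
        if token.pos_ in pos_tags:
            # subtree yields the token plus all of its syntactic
            # descendants, in document order.
            phrases.append(" ".join(tok.text for tok in token.subtree))
    return phrases
```

With the doc from the usage example, get_phrases(doc) should return the same list as get_pps(doc). Passing something like pos_tags=("VERB",) would collect verb subtrees instead, and what those contain depends on the parse the loaded model produces.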

Answer 1
8 2017-10-29 11:25:34
