
Spacy to extract specific noun phrase

Can I use spaCy in Python to find NPs with specific neighbors? I want noun phrases from my text that have a verb before and after them.

  1. You can merge the noun phrases (so that they do not get tokenized separately).
  2. Analyse the dependency parse tree, and see the POS of neighbouring tokens.

     >>> import spacy
     >>> nlp = spacy.load('en')
     >>> sent = u'run python program run, to make this work'
     >>> parsed = nlp(sent)
     >>> list(parsed.noun_chunks)
     [python program]
     >>> for noun_phrase in list(parsed.noun_chunks):
     ...     noun_phrase.merge(noun_phrase.root.tag_, noun_phrase.root.lemma_, noun_phrase.root.ent_type_)
     ...
     python program
     >>> [(token.text, token.pos_) for token in parsed]
     [(u'run', u'VERB'), (u'python program', u'NOUN'), (u'run', u'VERB'), (u',', u'PUNCT'), (u'to', u'PART'), (u'make', u'VERB'), (u'this', u'DET'), (u'work', u'NOUN')]
  3. By analysing the POS of adjacent tokens, you can get your desired noun phrases.

  4. A better approach would be to analyse the dependency parse tree and look at the lefts and rights of the noun phrase; that way, even if there is a punctuation mark or some other POS tag between the noun phrase and the verb, you can widen your search coverage (see the sketch below).

From https://spacy.io/usage/linguistic-features#dependency-parse
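A minimal sketch of steps 3 and 4, assuming spaCy 3.x (the transcript above uses the older 1.x API; in 3.x the built-in merge_noun_chunks pipe replaces Span.merge) and that the en_core_web_sm model is installed. nearest_non_punct is just an illustrative helper introduced here to skip punctuation on either side, as step 4 suggests:

        import spacy

        nlp = spacy.load('en_core_web_sm')
        # spaCy 3.x: the built-in merge_noun_chunks component is added by name
        nlp.add_pipe('merge_noun_chunks')

        doc = nlp('run python program run, to make this work')

        def nearest_non_punct(doc, i, step):
            # Walk left (step=-1) or right (step=+1) past punctuation tokens.
            i += step
            while 0 <= i < len(doc) and doc[i].is_punct:
                i += step
            return doc[i] if 0 <= i < len(doc) else None

        for token in doc:
            if token.pos_ != 'NOUN':
                continue
            left = nearest_non_punct(doc, token.i, -1)
            right = nearest_non_punct(doc, token.i, +1)
            if left is not None and right is not None \
                    and left.pos_ == 'VERB' and right.pos_ == 'VERB':
                print(token.text)  # prints: python program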

You can use noun chunks. Noun chunks are "base noun phrases" – flat phrases that have a noun as their head. You can think of noun chunks as a noun plus the words describing the noun – for example, "the lavish green grass" or "the world's largest tech fund". To get the noun chunks in a document, simply iterate over Doc.noun_chunks.

In:
        import spacy
        nlp = spacy.load('en_core_web_sm')
        doc = nlp(u"Autonomous cars shift insurance liability toward manufacturers")
        for chunk in doc.noun_chunks:
            print(chunk.text)

Out:

        Autonomous cars
        insurance liability
        manufacturers

If you want to re-tokenize using merged phrases, I prefer this (rather than noun chunks):

import spacy
nlp = spacy.load('en_core_web_sm')
nlp.add_pipe(nlp.create_pipe('merge_noun_chunks'))  # spaCy 2.x API; in spaCy 3.x: nlp.add_pipe('merge_noun_chunks')
doc = nlp(u"Autonomous cars shift insurance liability toward manufacturers")
for token in doc:
    print(token.text)

and the output will be:

Autonomous cars
shift
insurance liability
toward
manufacturers

I chose this way because each token keeps its properties for further processing :)
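For example, a minimal sketch (again assuming spaCy 3.x, where the pipe is added as nlp.add_pipe('merge_noun_chunks')) showing that each merged noun chunk is still an ordinary Token, with its POS tag, dependency label and head available for further processing:

        import spacy

        nlp = spacy.load('en_core_web_sm')
        nlp.add_pipe('merge_noun_chunks')  # spaCy 3.x spelling of the same pipe
        doc = nlp('Autonomous cars shift insurance liability toward manufacturers')

        # Each merged noun chunk behaves like any other token, so the usual
        # attributes are available:
        for token in doc:
            print(token.text, token.pos_, token.dep_, token.head.text)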
