How to get all noun phrases in spaCy (Python)

I would like to extract "all" the noun phrases from a sentence. I'm wondering how I can do it. I have the following code:

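import spacy

nlp = spacy.load("en_core_web_sm")  # assumed model; any English pipeline with a parser works
doc2 = nlp("what is the capital of Bangladesh?")
for chunk in doc2.noun_chunks:
    print(chunk)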

Output:

1. what

2. the capital

3. Bangladesh

Expected:

the capital of Bangladesh

I have tried answers from the spaCy docs and Stack Overflow, but nothing worked. It seems that only cTAKES and Stanford CoreNLP can produce complex NPs like this.

Any help is appreciated.

spaCy explicitly defines a noun chunk as:

"A base noun phrase, or "NP chunk", is a noun phrase that does not permit other NPs to be nested within it – so no NP-level coordination, no prepositional phrases, and no relative clauses." ( https://spacy.io/api/doc#noun_chunks )

If you process the dependency parse differently, allowing prepositional modifiers and nested phrases/chunks, then you can end up with what you're looking for.

I bet you could modify the existing spaCy noun chunk iterator fairly easily to do what you want:

https://github.com/explosion/spaCy/blob/06c6dc6fbcb8fbb78a61a2e42c1b782974bd43bd/spacy/lang/en/syntax_iterators.py
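As a rough sketch of what "processing the dependency parse differently" could look like: instead of patching the iterator, take every nominal token and return the span covering its whole subtree, which pulls in prepositional modifiers. The helper name expanded_noun_phrases is hypothetical, and the parse below is annotated by hand (my assumption of what a parser would produce for this sentence) so the example does not depend on a downloaded model. With a real pipeline you would pass nlp("...") instead.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

NOMINAL_POS = {"NOUN", "PROPN", "PRON"}


def expanded_noun_phrases(doc):
    """Return the text of the full subtree span of every nominal token."""
    phrases = []
    for token in doc:
        if token.pos_ in NOMINAL_POS:
            # left_edge / right_edge are the outermost tokens of the subtree
            span = doc[token.left_edge.i : token.right_edge.i + 1]
            phrases.append(span.text)
    return phrases


# Hand-annotated parse (an assumption, not real parser output).
words = ["what", "is", "the", "capital", "of", "Bangladesh", "?"]
pos = ["PRON", "AUX", "DET", "NOUN", "ADP", "PROPN", "PUNCT"]
heads = [1, 1, 3, 1, 3, 4, 1]  # absolute index of each token's head
deps = ["attr", "ROOT", "det", "nsubj", "prep", "pobj", "punct"]
doc = Doc(Vocab(), words=words, pos=pos, heads=heads, deps=deps)

print(expanded_noun_phrases(doc))
# ['what', 'the capital of Bangladesh', 'Bangladesh']
```

Note that this returns nested phrases too ("Bangladesh" as well as "the capital of Bangladesh"), and the quality depends entirely on the parse, so on real text you may want to filter or de-duplicate the result.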

For those who are still looking for an answer:

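# doc is a parsed spaCy Doc, e.g. doc2 from the question
noun_phrases = set()
for nc in doc.noun_chunks:
    # keep the base chunk and the full subtree span of its root token
    for span in (nc, doc[nc.root.left_edge.i : nc.root.right_edge.i + 1]):
        noun_phrases.add(span.text)  # store text so identical phrases de-duplicate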

This is how I get all the complex noun phrases.
