NLTK Entity Extraction Difference from NLTK 2.0.4 to NLTK 3.0

I'm running into an issue with an entity extraction function, and I believe it is caused by a version difference. The following example works in NLTK 2.0.4 but not in 3.0. I changed one function call, from batch_ne_chunk to nltk.ne_chunk_sents, to prevent an error being thrown in 3.0.

def package_get_entities(self,text):
    #text = text[0:300]
    entity_names = []
    chunked = self.get_chunked_sentences(text)
    for tree in chunked:
        entity_names.extend(self.extract_entity_names(tree))
    entity_names = list(set(entity_names))
    return entity_names

def get_chunked_sentences(self,text):
    sentences = nltk.sent_tokenize(text)
    tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
    tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
    chunked_sentences = nltk.ne_chunk_sents(tagged_sentences, binary=True)
    return chunked_sentences

def extract_entity_names(self,t):
    entity_names = []
    if hasattr(t, 'node') and t.node:
        if t.node == 'NE':
            entity_names.append(' '.join([child[0] for child in t]))
        else:
            for child in t:
                entity_names.extend(self.extract_entity_names(child))
    return entity_names

Running the function:

str = 'this is some text about a man named Abraham Lincoln'
entArray = package_get_entities(str)

In 2.0.4 this outputs [Abraham Lincoln]. In 3.0 it outputs [].

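I had to rewrite this check in extract_entity_names:

if hasattr(t, 'node') and t.node:

to:

if hasattr(t, 'label'):

NLTK 3.0 replaced the Tree.node attribute with the Tree.label() method, so the comparison on the following line has to change as well, from t.node == 'NE' to t.label() == 'NE'.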
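For reference, here is a minimal sketch of how extract_entity_names could be written so that it tolerates both tree APIs. It assumes that NLTK 3.x trees expose a label() method and NLTK 2.x trees expose a node attribute. The node_label helper and the FakeTree class are hypothetical names introduced only for illustration: FakeTree stands in for nltk.Tree so the snippet runs without NLTK installed, and it is not part of the library.

```python
def node_label(t):
    """Return the label of a tree node, or None if t is a leaf."""
    label = getattr(t, 'label', None)
    if callable(label):
        # NLTK 3.x style: the label is read through a method.
        return label()
    # NLTK 2.x style: the label is stored in the .node attribute.
    return getattr(t, 'node', None)


def extract_entity_names(t):
    """Collect the text of every 'NE' subtree below t."""
    entity_names = []
    label = node_label(t)
    if label is None:
        # Leaves such as (word, tag) tuples carry no label.
        return entity_names
    if label == 'NE':
        entity_names.append(' '.join(child[0] for child in t))
    else:
        for child in t:
            entity_names.extend(extract_entity_names(child))
    return entity_names


class FakeTree(list):
    """Hypothetical stand-in for nltk.Tree (assumption, not the NLTK class)."""

    def __init__(self, label, children):
        super().__init__(children)
        self._label = label

    def label(self):
        return self._label


# A hand-built chunk tree shaped like the output of ne_chunk_sents(binary=True).
sentence = FakeTree('S', [
    ('a', 'DT'), ('man', 'NN'), ('named', 'VBN'),
    FakeTree('NE', [('Abraham', 'NNP'), ('Lincoln', 'NNP')]),
])
print(extract_entity_names(sentence))  # ['Abraham Lincoln']
```

With real NLTK trees, the same function should be usable as a drop-in for the method in the question (minus self), since it only relies on iterating the tree and reading its label.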
