How to identify an NLTK Tree object and then parse it?

Question

I am trying to get GPE locations from a message after tokenizing it.

from nltk import ne_chunk 

print(ne_chunk(pos_words[0]))

Output:

(S
  Weather/NNP
  update/VB
  a/DT
  cold/JJ
  front/NN
  from/IN
  (GPE Cuba/NNP)
  that/WDT
  could/MD
  pass/VB
  over/RP
  (PERSON Haiti/NNP))

I want to get Cuba from this output as a string. How can I access it?

Edit: I am trying to generalize the extraction to the dataframe by building a list of locations. This is the function I made. However, it splits multi-word locations like New York into [New, York].

    locations = []
    for i in range(len(pos_words)): 
        chunks = ne_chunk(pos_words[i]) 
        for c in chunks: 
            if isinstance(c, Tree) and c.label() == 'GPE': 
                # The object is <class 'nltk.tree.Tree'> and label is Geopolitical Entity
                locations.extend([w for w,_ in c.leaves()])
    
    return locations

Answer 1

import nltk
from nltk import Tree

text = 'Weather update a cold front from Cuba that could pass over Haiti'
# Tokenize and tag
pos_words = nltk.pos_tag(nltk.word_tokenize(text))
# Named entity chunker
chunks = nltk.ne_chunk(pos_words)
for c in chunks:
    if isinstance(c, Tree) and c.label() == 'GPE':
        # The object is <class 'nltk.tree.Tree'> and label is Geopolitical Entity
        print(' '.join([w for w, _ in c.leaves()]))
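For the list version in the question's edit, the split most likely comes from `locations.extend(...)`, which adds every word of an entity as its own list item. Joining the leaves first and appending the joined string keeps `New York` together. Below is a minimal sketch of that idea; the function name `gpe_names` and the hand-built `sample` tree are assumptions for illustration (the tree stands in for the result of `nltk.ne_chunk`, so the example runs without downloading the chunker models).

```python
from nltk import Tree


def gpe_names(chunked):
    """Return each GPE subtree of a chunked sentence as one space-joined string."""
    names = []
    for node in chunked:
        # Plain tokens are (word, tag) tuples; named entities are Tree subtrees.
        if isinstance(node, Tree) and node.label() == 'GPE':
            # append (not extend) so a multi-word entity stays a single item
            names.append(' '.join(word for word, _ in node.leaves()))
    return names


# Hypothetical stand-in for the output of nltk.ne_chunk(pos_words)
sample = Tree('S', [
    ('Flights', 'NNS'), ('from', 'IN'),
    Tree('GPE', [('New', 'NNP'), ('York', 'NNP')]),
    ('to', 'TO'),
    Tree('GPE', [('Cuba', 'NNP')]),
])

locations = gpe_names(sample)
print(locations)  # ['New York', 'Cuba']
```

To cover a whole dataframe column, the same function could be called once per chunked row and the results collected with `locations.extend(gpe_names(chunks))`, since each item is already a complete name at that point.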

Answer 1: accepted 2020-08-31 05:18:50