
How do I identify that an object is an NLTK Tree and then parse it?

I am trying to extract GPE (geopolitical entity) locations from a message after tokenizing it.

from nltk import ne_chunk 

print(ne_chunk(pos_words[0])) 

Output:

(S
  Weather/NNP
  update/VB
  a/DT
  cold/JJ
  front/NN
  from/IN
  (GPE Cuba/NNP)
  that/WDT
  could/MD
  pass/VB
  over/RP
  (PERSON Haiti/NNP))

I want to get Cuba out as a string. How can I access it?

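Edit: I tried to generalize the extraction to a dataframe by building a list of locations. This is the function I wrote. However, it splits multi-word locations such as New York into [New, York].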

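    from nltk import Tree, ne_chunk

    def get_locations(pos_words):  # function name assumed; the post omitted the def line
        locations = []
        for tagged in pos_words:
            chunks = ne_chunk(tagged)
            for c in chunks:
                if isinstance(c, Tree) and c.label() == 'GPE':
                    # The object is <class 'nltk.tree.Tree'> and label is Geopolitical Entity
                    # extend() adds every word separately, so "New York" becomes ['New', 'York']
                    locations.extend([w for w, _ in c.leaves()])
        return locations
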
import nltk
from nltk import Tree

text = 'Weather update a cold front from Cuba that could pass over Haiti'
# Tokenize and tag
pos_words = nltk.pos_tag(nltk.word_tokenize(text))
# Named entity chunker
chunks = nltk.ne_chunk(pos_words)
for c in chunks:
    if isinstance(c, Tree) and c.label() == 'GPE':
        # The object is <class 'nltk.tree.Tree'> and label is Geopolitical Entity
        print(' '.join([w for w, _ in c.leaves()]))
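
For the edit in the question, the same check can be wrapped in a small helper that joins the leaves of each GPE subtree before storing them, so a multi-word location stays a single string. This is only a sketch: the helper names `gpe_names` and `locations_from_tagged` are made up for illustration, and it assumes `pos_words` is a list of POS-tagged sentences, as in the question.

```python
from nltk import Tree, ne_chunk


def gpe_names(chunk_tree):
    """Return one string per GPE subtree found directly under the root."""
    names = []
    for child in chunk_tree:
        # Plain tokens are (word, tag) tuples; named entities are Tree objects.
        if isinstance(child, Tree) and child.label() == 'GPE':
            # Join the words first, then append once, so "New York" is kept whole.
            names.append(' '.join(word for word, _tag in child.leaves()))
    return names


def locations_from_tagged(tagged_sentences):
    """Hypothetical wrapper: chunk every tagged sentence and collect GPE names."""
    locations = []
    for tagged in tagged_sentences:
        locations.extend(gpe_names(ne_chunk(tagged)))
    return locations
```

The key change from the function in the question is `append(' '.join(...))` instead of `extend([...])`. Calling `locations_from_tagged(pos_words)` needs the NLTK named-entity chunker data installed; `gpe_names` itself works on any chunk tree, including one built by hand such as `Tree('S', [Tree('GPE', [('New', 'NNP'), ('York', 'NNP')])])`, for which it returns `['New York']`.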
