How to extract GPE (location) using NLTK ne_chunk?

Question

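I am trying to write code that checks the weather conditions for a particular area, using the OpenWeatherMap API together with NLTK for named entity recognition. However, I cannot find a way to pass the entity tagged GPE (which gives the location, Chicago in this example) into my API request. Please help me with the syntax. My code is given below.

Thanks for your help.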

import nltk
import requests
from nltk import word_tokenize
from nltk.corpus import stopwords

sentence = "What is the weather in Chicago today? "
tokens = word_tokenize(sentence)

stop_words = set(stopwords.words('english'))

# Drop stop words before tagging
clean_tokens = [w for w in tokens if w not in stop_words]

tagged = nltk.pos_tag(clean_tokens)

# Prints the chunk tree; named entities such as GPE appear as labelled subtrees
print(nltk.ne_chunk(tagged))

Answer 1

GPE is the label of a Tree object produced by the pre-trained ne_chunk model.

>>> from nltk import word_tokenize, pos_tag, ne_chunk
>>> sent = "What is the weather in Chicago today?"
>>> ne_chunk(pos_tag(word_tokenize(sent)))
Tree('S', [('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('weather', 'NN'), ('in', 'IN'), Tree('GPE', [('Chicago', 'NNP')]), ('today', 'NN'), ('?', '.')])

To traverse the tree, see "How to traverse an NLTK Tree object?"
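As a minimal sketch of such a traversal, the tree printed above can be rebuilt by hand (so no model download is needed) and walked with `Tree.subtrees`. The helper name `labels_to_text` is hypothetical and not part of NLTK.

```python
from nltk.tree import Tree

# Hand-built copy of the tree printed above, so this runs without the NE model.
tree = Tree('S', [('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('weather', 'NN'),
                  ('in', 'IN'), Tree('GPE', [('Chicago', 'NNP')]),
                  ('today', 'NN'), ('?', '.')])

def labels_to_text(tree, label):
    """Collect the text of every subtree carrying the given label (hypothetical helper)."""
    return [" ".join(word for word, tag in st.leaves())
            for st in tree.subtrees(filter=lambda t: t.label() == label)]

gpes = labels_to_text(tree, 'GPE')
print(gpes)  # ['Chicago']
```

With a real sentence, the hand-built tree would be replaced by the result of `ne_chunk(pos_tag(word_tokenize(sent)))`.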

Perhaps you are looking for something like a slight modification of "NLTK Named Entity recognition to a Python list":

from nltk import word_tokenize, pos_tag, ne_chunk
from nltk import Tree

def get_continuous_chunks(text, label):
    """Return the unique entities in text whose chunk label equals label."""
    chunked = ne_chunk(pos_tag(word_tokenize(text)))
    continuous_chunk = []

    for subtree in chunked:
        # Named entities are Tree nodes; ordinary tokens are (word, tag) tuples.
        if isinstance(subtree, Tree) and subtree.label() == label:
            named_entity = " ".join(token for token, pos in subtree.leaves())
            # Keep each entity once, in order of first appearance.
            if named_entity not in continuous_chunk:
                continuous_chunk.append(named_entity)

    return continuous_chunk

[out]:

>>> sent = "What is the weather in New York today?"
>>> get_continuous_chunks(sent, 'GPE')
['New York']

>>> sent = "What is the weather in New York and Chicago today?"
>>> get_continuous_chunks(sent, 'GPE')
['New York', 'Chicago']

>>> sent = "What is the weather in New York"
>>> get_continuous_chunks(sent, 'GPE')
['New York']

>>> sent = "What is the weather in New York and Chicago"
>>> get_continuous_chunks(sent, 'GPE')
['New York', 'Chicago']
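To connect this back to the question, the returned list can be turned into one request per place. The sketch below only builds the URLs. It assumes OpenWeatherMap's current-weather endpoint with its `q` and `appid` query parameters; `build_weather_urls` and the `YOUR_API_KEY` placeholder are hypothetical names used for illustration.

```python
from urllib.parse import urlencode

# Assumed endpoint: OpenWeatherMap's current-weather API.
BASE_URL = "https://api.openweathermap.org/data/2.5/weather"

def build_weather_urls(places, api_key):
    """Return one request URL per place name (hypothetical helper)."""
    return [f"{BASE_URL}?{urlencode({'q': place, 'appid': api_key})}"
            for place in places]

# In practice, places would come from get_continuous_chunks(sent, 'GPE').
urls = build_weather_urls(['New York', 'Chicago'], "YOUR_API_KEY")
for url in urls:
    print(url)
```

Each URL can then be fetched with `requests.get(url).json()` once a real API key is supplied. `urlencode` takes care of escaping the space in multi-word names such as New York.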

Answer 2

Here is the solution I would suggest for your case:

Step 1. Tokenize the words, POS-tag them, and run named entity recognition. The code looks like this:

    import nltk
    from nltk import word_tokenize

    Xstring = "What is the weather in New York and Chicago today?"

    tokenized_doc = word_tokenize(Xstring)
    tagged_sentences = nltk.pos_tag(tokenized_doc)
    NE = nltk.ne_chunk(tagged_sentences)
    NE.draw()  # optional: opens a window showing the tree

Step 2. After named entity recognition, extract all named entities (from the tree built above)

    named_entities = []
    for tagged_tree in NE:
        print(tagged_tree)
        # Only Tree nodes (the named entities) have a label() method
        if hasattr(tagged_tree, 'label'):
            entity_name = ' '.join(c[0] for c in tagged_tree.leaves())  # join the entity's words
            entity_type = tagged_tree.label()  # get NE category
            named_entities.append((entity_name, entity_type))

    print(named_entities)  # all entities will be printed; check at your end once

Step 3. Now extract only the entities tagged GPE

    for tag in named_entities:
        #print(tag[1])
        if tag[1] == 'GPE':  # specify whichever tag is required
            print(tag)

Here is my output:

  ('New York', 'GPE')
  ('Chicago', 'GPE')
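Steps 2 and 3 can also be folded into a single pass that groups entity names by label. The following is a sketch under stated assumptions: the tree is built by hand as a stand-in for the `ne_chunk` output (so it runs without the model), and `group_entities` is a hypothetical helper name.

```python
from collections import defaultdict
from nltk.tree import Tree

# Hand-built stand-in for the ne_chunk result of the example sentence.
NE = Tree('S', [('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('weather', 'NN'),
                ('in', 'IN'), Tree('GPE', [('New', 'NNP'), ('York', 'NNP')]),
                ('and', 'CC'), Tree('GPE', [('Chicago', 'NNP')]),
                ('today', 'NN'), ('?', '.')])

def group_entities(chunked):
    """Map each entity label to the entity names found under it (hypothetical helper)."""
    grouped = defaultdict(list)
    for node in chunked:
        # Plain tokens are (word, tag) tuples; only entities are Tree nodes.
        if isinstance(node, Tree):
            grouped[node.label()].append(' '.join(word for word, tag in node.leaves()))
    return dict(grouped)

entities = group_entities(NE)
print(entities.get('GPE', []))  # ['New York', 'Chicago']
```

Grouping by label keeps the other entity types available, so the same pass can serve a later lookup of, say, PERSON or ORGANIZATION without re-walking the tree.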

Answer 1
4, accepted, 2018-02-08 01:46:58

Answer 2
2, 2018-12-27 11:47:23
