nltk NER word extraction

Question

I checked earlier related topics, but they did not solve my problem. I wrote code to extract the named entities (NEs) from a text:

import nltk

text = "Stallone jason's film Rocky was inducted into the National Film Registry as well as having its film props placed in the Smithsonian Museum."

tokenized = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokenized)
namedEnt = nltk.ne_chunk(tagged, binary = True)
print namedEnt
namedEnt = nltk.ne_chunk(tagged, binary = False)

This gives the following result:

(S
  (NE Stallone/NNP)
  jason/NN
  's/POS
  film/NN
  (NE Rocky/NNP)
  was/VBD
  inducted/VBN
  into/IN
  the/DT
  (NE National/NNP Film/NNP Registry/NNP)
  as/IN
  well/RB
  as/IN
  having/VBG
  its/PRP$
  film/NN
  props/NNS
  placed/VBN
  in/IN
  the/DT
  (NE Smithsonian/NNP Museum/NNP)
  ./.)

But I expected only the NEs as the result, like this:

Stallone
Rocky
National Film Registry
Smithsonian Museum

How can I achieve this?

UPDATE

result = ' '.join([y[0] for y in x.leaves()]) for x in namedEnt.subtrees() if x.node == "NE"
print result

This gives a syntax error. What is the correct way to write it?
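The UPDATE line is rejected by the parser because the `for ... if ...` clauses of a comprehension are only legal inside brackets (a list comprehension) or parentheses (a generator expression). A minimal corrected sketch follows. So that it runs without the tagger and chunker models, it uses a hand-built copy of the `binary=True` tree printed above (an assumption, not a live `ne_chunk` call), and it calls `x.label()` instead of `x.node`, because NLTK 3 removed `node`, as the traceback under UPDATE2 shows.

```python
from nltk import Tree

# Hand-built copy of the binary=True tree printed above (stand-in for namedEnt):
# NE chunks are nested Trees, every other token is a (word, tag) tuple.
named_ent = Tree("S", [
    Tree("NE", [("Stallone", "NNP")]),
    ("jason", "NN"), ("'s", "POS"), ("film", "NN"),
    Tree("NE", [("Rocky", "NNP")]),
    ("was", "VBD"), ("inducted", "VBN"), ("into", "IN"), ("the", "DT"),
    Tree("NE", [("National", "NNP"), ("Film", "NNP"), ("Registry", "NNP")]),
    ("as", "IN"), ("well", "RB"), ("as", "IN"), ("having", "VBG"),
    ("its", "PRP$"), ("film", "NN"), ("props", "NNS"), ("placed", "VBN"),
    ("in", "IN"), ("the", "DT"),
    Tree("NE", [("Smithsonian", "NNP"), ("Museum", "NNP")]),
    (".", "."),
])

# Wrapping the comprehension in [...] makes it valid syntax; the words of each
# NE chunk's (word, tag) leaves are joined back into one string.
result = [" ".join(word for word, tag in x.leaves())
          for x in named_ent.subtrees() if x.label() == "NE"]
print(result)  # ['Stallone', 'Rocky', 'National Film Registry', 'Smithsonian Museum']
```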

UPDATE2

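text = "Stallone jason's film Rocky was inducted into the National Film Registry as well as having its film props placed in the Smithsonian Museum."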

tokenized = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokenized)
namedEnt = nltk.ne_chunk(tagged, binary = True)
print namedEnt
np = [' '.join([y[0] for y in x.leaves()]) for x in namedEnt.subtrees() if x.node == "NE"]
print np

Error:

 np = [' '.join([y[0] for y in x.leaves()]) for x in namedEnt.subtrees() if x.node == "NE"]
  File "/usr/local/lib/python2.7/dist-packages/nltk/tree.py", line 198, in _get_node
    raise NotImplementedError("Use label() to access a node label.")
NotImplementedError: Use label() to access a node label.

So I tried:

np = [' '.join([y[0] for y in x.leaves()]) for x in namedEnt.subtrees() if x.label() == "NE"]

This gives an empty result.
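The post does not say why the `label()` version came back empty, so the cause cannot be settled here. A reasonable first check is to list the labels the tree actually contains before filtering on `"NE"`: if `"NE"` is not among them (a tree chunked with `binary=False`, for example, uses type labels such as `PERSON` or `ORGANIZATION`), the comparison can never match. The sketch below counts labels with `collections.Counter`; the small tree and the helper name `label_counts` are hypothetical stand-ins, so substitute the real `namedEnt`.

```python
from collections import Counter

from nltk import Tree

# Hypothetical, abridged stand-in for namedEnt; replace with the real tree.
named_ent = Tree("S", [
    Tree("NE", [("Stallone", "NNP")]),
    ("jason", "NN"), ("'s", "POS"), ("film", "NN"),
    Tree("NE", [("Rocky", "NNP")]),
    ("was", "VBD"), ("inducted", "VBN"),
])


def label_counts(tree):
    """Count how often each node label occurs, including the root 'S'."""
    return Counter(sub.label() for sub in tree.subtrees())


counts = label_counts(named_ent)
print(counts)
# For this stand-in, counts["S"] == 1 and counts["NE"] == 2. If "NE" were
# missing, a filter on x.label() == "NE" could only return [].
```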

Answer 1

The returned namedEnt is actually a Tree object, which is a subclass of list. You can do the following to parse it:

[' '.join([y[0] for y in x.leaves()]) for x in namedEnt.subtrees() if x.node == "NE"]

Output:

['Stallone', 'Rocky', 'National Film Registry', 'Smithsonian Museum']
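The snippet above targets the NLTK 2 API; on NLTK 3, `x.node` raises the `NotImplementedError` shown in the question, and `x.label()` is the replacement. Below is a sketch of the same idea for NLTK 3. To avoid downloading models, it rebuilds the tree by parsing the printed output with `Tree.fromstring`, using the `read_leaf` hook to turn each `word/tag` leaf back into a tuple (an assumption about how the leaves were printed). It then shows two extractions: one passes a `filter` to `subtrees()`, the other simply iterates the top level, which works because `Tree` is a `list` subclass. The helper names are illustrative.

```python
from nltk import Tree

# Printed binary=True output from the question, pasted back in as a string.
PRINTED = """(S
  (NE Stallone/NNP) jason/NN 's/POS film/NN (NE Rocky/NNP) was/VBD
  inducted/VBN into/IN the/DT (NE National/NNP Film/NNP Registry/NNP)
  as/IN well/RB as/IN having/VBG its/PRP$ film/NN props/NNS placed/VBN
  in/IN the/DT (NE Smithsonian/NNP Museum/NNP) ./.)"""

# read_leaf splits each "word/tag" leaf on its last slash into a (word, tag) tuple.
named_ent = Tree.fromstring(PRINTED, read_leaf=lambda leaf: tuple(leaf.rsplit("/", 1)))


def entities_via_subtrees(tree, label="NE"):
    """Join the words of every subtree whose label matches (NLTK 3: label())."""
    return [" ".join(word for word, tag in sub.leaves())
            for sub in tree.subtrees(filter=lambda t: t.label() == label)]


def entities_via_top_level(tree):
    """Tree subclasses list, so chunks can be picked out of its direct children."""
    return [" ".join(word for word, tag in child.leaves())
            for child in tree if isinstance(child, Tree)]


print(entities_via_subtrees(named_ent))
print(entities_via_top_level(named_ent))
```

The `subtrees()` version also finds chunks nested below the top level; the direct-children version relies on the chunk tree being flat, which holds for the output shown in the question.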

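Setting the binary flag to True only indicates whether or not a subtree is an NE, which is what we need above. When it is set to False, it gives more information, such as whether the NE is an organization, a person, etc. For some reason, the results with the flag on and off do not seem to agree with each other.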
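With `binary=False`, each chunk is labeled with an entity type instead of the generic `NE`, so a filter on `"NE"` no longer matches; grouping the chunks by label is one way to use the extra information. The sketch below assumes a hand-built, abridged tree whose `PERSON` and `ORGANIZATION` labels are illustrative only (the source does not show what `binary=False` actually returned for this sentence), and `entities_by_label` is a hypothetical helper.

```python
from collections import defaultdict

from nltk import Tree

# Hypothetical, abridged binary=False output: the labels are illustrative and
# are NOT the chunker's real result for this sentence.
tree = Tree("S", [
    Tree("PERSON", [("Stallone", "NNP")]),
    ("jason", "NN"), ("'s", "POS"), ("film", "NN"),
    Tree("PERSON", [("Rocky", "NNP")]),
    ("was", "VBD"), ("inducted", "VBN"), ("into", "IN"), ("the", "DT"),
    Tree("ORGANIZATION", [("National", "NNP"), ("Film", "NNP"), ("Registry", "NNP")]),
    ("in", "IN"), ("the", "DT"),
    Tree("ORGANIZATION", [("Smithsonian", "NNP"), ("Museum", "NNP")]),
    (".", "."),
])


def entities_by_label(tree):
    """Map each chunk label (PERSON, ORGANIZATION, ...) to its entity strings."""
    groups = defaultdict(list)
    for sub in tree.subtrees():
        if sub is tree:
            continue  # skip the root "S" node itself
        groups[sub.label()].append(" ".join(word for word, tag in sub.leaves()))
    return dict(groups)


print(entities_by_label(tree))
```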


Answer 1: score 3, accepted 2014-11-11 11:34:41
