nltk NER word extraction

I have checked previous related threads, but they did not solve my issue. I have written code to get NER from text.

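import nltk

text = "Stallone jason's film Rocky was inducted into the National Film Registry as well as having its film props placed in the Smithsonian Museum."

# Tokenize, POS-tag, then chunk named entities.
tokenized = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokenized)
# binary=True marks every entity with the single label NE.
namedEnt = nltk.ne_chunk(tagged, binary = True)
print namedEnt
# binary=False labels each entity with a type instead.
namedEnt = nltk.ne_chunk(tagged, binary = False)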

which gives this sort of result

(S
  (NE Stallone/NNP)
  jason/NN
  's/POS
  film/NN
  (NE Rocky/NNP)
  was/VBD
  inducted/VBN
  into/IN
  the/DT
  (NE National/NNP Film/NNP Registry/NNP)
  as/IN
  well/RB
  as/IN
  having/VBG
  its/PRP$
  film/NN
  props/NNS
  placed/VBN
  in/IN
  the/DT
  (NE Smithsonian/NNP Museum/NNP)
  ./.)

while I expect only the NEs as a result, like

Stallone
Rocky
National Film Registry
Smithsonian Museum

How can I achieve this?

UPDATE

result = ' '.join([y[0] for y in x.leaves()]) for x in namedEnt.subtrees() if x.node == "NE"
print result

gives a syntax error. What is the correct way to write this?

UPDATE2

text = "Stallone jason's film Rocky was inducted into the National Film Registry as well as having its film props placed in the Smithsonian Museum."

tokenized = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokenized)
namedEnt = nltk.ne_chunk(tagged, binary = True)
print namedEnt
np = [' '.join([y[0] for y in x.leaves()]) for x in namedEnt.subtrees() if x.node == "NE"]
print np

error:

 np = [' '.join([y[0] for y in x.leaves()]) for x in namedEnt.subtrees() if x.node == "NE"]
  File "/usr/local/lib/python2.7/dist-packages/nltk/tree.py", line 198, in _get_node
    raise NotImplementedError("Use label() to access a node label.")
NotImplementedError: Use label() to access a node label.

so I tried

np = [' '.join([y[0] for y in x.leaves()]) for x in namedEnt.subtrees() if x.label() == "NE"]

which gives an empty result

The namedEnt returned is actually a Tree object, which is a subclass of list. You can do the following to parse it:

[' '.join([y[0] for y in x.leaves()]) for x in namedEnt.subtrees() if x.label() == "NE"]

Output:

['Stallone', 'Rocky', 'National Film Registry', 'Smithsonian Museum']

Setting the binary flag to True only indicates whether a subtree is an NE or not, which is what we need above. When set to False, it gives more information, such as whether the NE is an Organization, Person, etc. For some reason, the results with the flag on and off don't seem to agree with one another.
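To illustrate the difference between the two modes, here is a minimal sketch that collects every chunk below the root together with its label, so the same helper could serve both binary = True (label NE) and binary = False (typed labels). The helper name named_entities and the class MiniTree are hypothetical: MiniTree is only a small stand-in that mimics the label(), subtrees() and leaves() methods of the NLTK Tree, so the example runs without downloading any NLTK models. The hand-built trees and their PERSON / ORGANIZATION labels are illustrative, not actual chunker output. The traceback in the question suggests that newer NLTK versions expose the node label through label(), which is what the sketch relies on.

```python
class MiniTree(list):
    """Hypothetical stand-in mimicking the nltk Tree methods used below."""

    def __init__(self, label, children):
        list.__init__(self, children)
        self._label = label

    def label(self):
        return self._label

    def subtrees(self):
        # Yield this tree first, then every nested subtree, depth-first.
        yield self
        for child in self:
            if isinstance(child, MiniTree):
                for sub in child.subtrees():
                    yield sub

    def leaves(self):
        # Collect the (word, tag) leaves in left-to-right order.
        out = []
        for child in self:
            if isinstance(child, MiniTree):
                out.extend(child.leaves())
            else:
                out.append(child)
        return out


def named_entities(tree, root_label="S"):
    """Return a (label, text) pair for every chunk below the root."""
    return [
        (sub.label(), " ".join(word for word, _tag in sub.leaves()))
        for sub in tree.subtrees()
        if sub.label() != root_label
    ]


# Hand-built tree shaped like the binary = True output in the question.
binary_tree = MiniTree("S", [
    MiniTree("NE", [("Stallone", "NNP")]),
    ("jason", "NN"), ("'s", "POS"), ("film", "NN"),
    MiniTree("NE", [("Rocky", "NNP")]),
    ("was", "VBD"),
    MiniTree("NE", [("National", "NNP"), ("Film", "NNP"), ("Registry", "NNP")]),
])

# Illustrative typed labels, as binary = False might assign them (assumption).
typed_tree = MiniTree("S", [
    MiniTree("PERSON", [("Stallone", "NNP")]),
    ("visited", "VBD"), ("the", "DT"),
    MiniTree("ORGANIZATION", [("Smithsonian", "NNP"), ("Museum", "NNP")]),
])

print([text for _label, text in named_entities(binary_tree)])
print(named_entities(typed_tree))
```

With a real NLTK installation, the result of nltk.ne_chunk(tagged, binary = True) or nltk.ne_chunk(tagged, binary = False) could presumably be passed to named_entities in place of the stand-in trees, since it only depends on those three methods.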
