nltk NER word extraction
I have checked previous related threads, but they did not solve my issue. I have written code to extract named entities (NER) from text.
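import nltk

text = "Stallone jason's film Rocky was inducted into the National Film Registry as well as having its film props placed in the Smithsonian Museum."
tokenized = nltk.word_tokenize(text)  # split the sentence into word tokens
tagged = nltk.pos_tag(tokenized)  # attach a part-of-speech tag to each token
namedEnt = nltk.ne_chunk(tagged, binary = True)  # binary=True labels every entity as NE
print namedEnt
namedEnt = nltk.ne_chunk(tagged, binary = False)  # binary=False labels entities by type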
which gives this sort of result:
(S
(NE Stallone/NNP)
jason/NN
's/POS
film/NN
(NE Rocky/NNP)
was/VBD
inducted/VBN
into/IN
the/DT
(NE National/NNP Film/NNP Registry/NNP)
as/IN
well/RB
as/IN
having/VBG
its/PRP$
film/NN
props/NNS
placed/VBN
in/IN
the/DT
(NE Smithsonian/NNP Museum/NNP)
./.)
whereas I expect only the named entities as a result, like:
Stallone
Rocky
National Film Registry
Smithsonian Museum
How can I achieve this?
UPDATE
result = ' '.join([y[0] for y in x.leaves()]) for x in namedEnt.subtrees() if x.node == "NE"
print result
gives a syntax error. What is the correct way to write this?
UPDATE2
text = "Stallone jason's film Rocky was inducted into the National Film Registry as well as having its film props placed in the Smithsonian Museum."
tokenized = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokenized)
namedEnt = nltk.ne_chunk(tagged, binary = True)
print namedEnt
np = [' '.join([y[0] for y in x.leaves()]) for x in namedEnt.subtrees() if x.node == "NE"]
print np
error:
np = [' '.join([y[0] for y in x.leaves()]) for x in namedEnt.subtrees() if x.node == "NE"]
File "/usr/local/lib/python2.7/dist-packages/nltk/tree.py", line 198, in _get_node
raise NotImplementedError("Use label() to access a node label.")
NotImplementedError: Use label() to access a node label.
so I tried with
np = [' '.join([y[0] for y in x.leaves()]) for x in namedEnt.subtrees() if x.label() == "NE"]
which gives an empty result.
The namedEnt returned is actually a Tree object, which is a subclass of list. You can do the following to parse it:
[' '.join([y[0] for y in x.leaves()]) for x in namedEnt.subtrees() if x.node == "NE"]
Output:
['Stallone', 'Rocky', 'National Film Registry', 'Smithsonian Museum']
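On NLTK 3 and later, the node attribute has been replaced by the label() method, as the traceback in the question indicates. Below is a minimal sketch of the same comprehension under that assumption. The tree is built by hand as a shortened, hypothetical stand-in for the ne_chunk(tagged, binary=True) output, so the snippet runs without downloading any models; the helper name extract_entities is illustrative and not part of NLTK.

```python
from nltk.tree import Tree

# Hand-built stand-in for nltk.ne_chunk(tagged, binary=True).
# Leaves are (word, pos_tag) pairs, as ne_chunk produces.
named_ent = Tree("S", [
    Tree("NE", [("Stallone", "NNP")]),
    ("jason", "NN"),
    ("'s", "POS"),
    ("film", "NN"),
    Tree("NE", [("Rocky", "NNP")]),
    ("was", "VBD"),
    ("inducted", "VBN"),
    ("into", "IN"),
    ("the", "DT"),
    Tree("NE", [("National", "NNP"), ("Film", "NNP"), ("Registry", "NNP")]),
    (".", "."),
])


def extract_entities(tree, label="NE"):
    """Return the space-joined words of every subtree carrying `label`."""
    return [
        " ".join(word for word, tag in subtree.leaves())
        for subtree in tree.subtrees()
        if subtree.label() == label
    ]


entities = extract_entities(named_ent)
print(entities)
```

If this returns an empty list on real ne_chunk output, it is worth checking which value of binary produced the tree, since the NE label only appears with binary=True.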
When the binary flag is set to True, the tree indicates only whether a subtree is an NE or not, which is what we need above. When it is set to False, the tree gives more information, such as whether the NE is an Organization, a Person, etc. For some reason, the results with the flag on and off don't seem to agree with one another.
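With binary=False the chunks carry type labels such as PERSON or ORGANIZATION instead of NE, so a filter on the NE label matches nothing. One possible way to collect entities in that mode is to keep every subtree except the root, as sketched below. The tree here is hand-built and its type labels are illustrative, not actual ne_chunk output; the helper name entities_with_types is hypothetical.

```python
from nltk.tree import Tree

# Hypothetical stand-in for nltk.ne_chunk(tagged, binary=False).
# The entity-type labels below are illustrative, not real tagger output.
typed_ent = Tree("S", [
    Tree("PERSON", [("Stallone", "NNP")]),
    ("film", "NN"),
    Tree("ORGANIZATION", [("National", "NNP"), ("Film", "NNP"), ("Registry", "NNP")]),
    ("in", "IN"),
    ("the", "DT"),
    Tree("ORGANIZATION", [("Smithsonian", "NNP"), ("Museum", "NNP")]),
])


def entities_with_types(tree):
    """Return (entity text, label) pairs for every chunk below the root."""
    return [
        (" ".join(word for word, tag in subtree.leaves()), subtree.label())
        for subtree in tree.subtrees()
        if subtree is not tree  # skip the root "S" node itself
    ]


typed_entities = entities_with_types(typed_ent)
print(typed_entities)
```

Skipping the root by identity rather than by label means the same helper also works on a binary=True tree, where every pair simply has the label NE.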