nltk NER word extraction
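I checked previous related threads, but none of them solved my problem. I wrote the following code to extract named entities (NEs) from a text: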
import nltk

text = "Stallone jason's film Rocky was inducted into the National Film Registry as well as having its film props placed in the Smithsonian Museum."
tokenized = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokenized)
namedEnt = nltk.ne_chunk(tagged, binary=True)
print(namedEnt)
# binary=False labels each entity chunk with its type instead of plain NE
namedEnt = nltk.ne_chunk(tagged, binary=False)
This gives the following result:
(S
  (NE Stallone/NNP)
  jason/NN
  's/POS
  film/NN
  (NE Rocky/NNP)
  was/VBD
  inducted/VBN
  into/IN
  the/DT
  (NE National/NNP Film/NNP Registry/NNP)
  as/IN
  well/RB
  as/IN
  having/VBG
  its/PRP$
  film/NN
  props/NNS
  placed/VBN
  in/IN
  the/DT
  (NE Smithsonian/NNP Museum/NNP)
  ./.)
whereas I expect only the NEs as the result, like:
Stallone
Rocky
National Film Registry
Smithsonian Museum
How can I achieve this?
UPDATE
result = ' '.join([y[0] for y in x.leaves()]) for x in namedEnt.subtrees() if x.node == "NE"
print(result)
This gives a syntax error. What is the correct way to write this?
UPDATE2
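text = "Stallone jason's film Rocky was inducted into the National Film Registry as well as having its film props placed in the Smithsonian Museum."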
tokenized = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokenized)
namedEnt = nltk.ne_chunk(tagged, binary = True)
print(namedEnt)
np = [' '.join([y[0] for y in x.leaves()]) for x in namedEnt.subtrees() if x.node == "NE"]
print(np)
Error:
np = [' '.join([y[0] for y in x.leaves()]) for x in namedEnt.subtrees() if x.node == "NE"]
File "/usr/local/lib/python2.7/dist-packages/nltk/tree.py", line 198, in _get_node
raise NotImplementedError("Use label() to access a node label.")
NotImplementedError: Use label() to access a node label.
So I tried:
np = [' '.join([y[0] for y in x.leaves()]) for x in namedEnt.subtrees() if x.label() == "NE"]
This gives an empty result.
The returned namedEnt is actually a Tree object, which is a subclass of list. You can do the following to parse it:
[' '.join([y[0] for y in x.leaves()]) for x in namedEnt.subtrees() if x.node == "NE"]
Output:
['Stallone', 'Rocky', 'National Film Registry', 'Smithsonian Museum']
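In NLTK 3 the Tree.node attribute was replaced by the Tree.label() method, which is why the code in UPDATE2 raises "NotImplementedError: Use label() to access a node label." Below is a minimal sketch of the same extraction written against label(). So that it runs without downloading the tokenizer, tagger and chunker models, it uses a hand-built, abridged tree shaped like the binary=True output shown in the question; extract_entities is a hypothetical helper name, not an NLTK function.

```python
from nltk import Tree


def extract_entities(tree, label="NE"):
    """Join the words of every subtree whose label equals `label`."""
    # subtrees() walks the tree in pre-order (root "S" first), so entities
    # come back in sentence order; the leaves are (word, POS-tag) tuples.
    return [
        " ".join(word for word, tag in subtree.leaves())
        for subtree in tree.subtrees()
        if subtree.label() == label
    ]


# Hand-built stand-in for nltk.ne_chunk(tagged, binary=True) output (abridged).
named_ent = Tree("S", [
    Tree("NE", [("Stallone", "NNP")]),
    ("jason", "NN"),
    ("'s", "POS"),
    ("film", "NN"),
    Tree("NE", [("Rocky", "NNP")]),
    ("was", "VBD"),
    ("inducted", "VBN"),
    ("into", "IN"),
    ("the", "DT"),
    Tree("NE", [("National", "NNP"), ("Film", "NNP"), ("Registry", "NNP")]),
    ("in", "IN"),
    ("the", "DT"),
    Tree("NE", [("Smithsonian", "NNP"), ("Museum", "NNP")]),
    (".", "."),
])

print(extract_entities(named_ent))
# ['Stallone', 'Rocky', 'National Film Registry', 'Smithsonian Museum']
```

With a real pipeline, pass the result of nltk.ne_chunk(tagged, binary=True) instead of the hand-built tree.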
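Setting the binary flag to True only indicates whether a subtree is an NE or not, which is what we need above. When it is set to False, it gives more information, such as whether the NE is an organization, a person, etc. For some reason, the results with the flag on and off do not seem to be consistent with each other.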
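With binary=False each entity chunk carries a type label (NLTK's chunker uses labels such as PERSON, ORGANIZATION and GPE) instead of NE, so a filter on "NE" matches nothing in that mode. A minimal sketch that works with either setting, assuming the tree is flat the way ne_chunk returns it (entity chunks are direct children of the root S), groups each chunk's words by its label. The labels in the hand-built tree are chosen only for illustration and are not actual chunker output; entities_by_type is a hypothetical helper.

```python
from collections import defaultdict

from nltk import Tree


def entities_by_type(tree):
    """Map each chunk label to the list of entity strings found under it."""
    groups = defaultdict(list)
    for child in tree:
        # Plain (word, tag) tuples are tokens outside any entity chunk.
        if isinstance(child, Tree):
            words = " ".join(word for word, tag in child.leaves())
            groups[child.label()].append(words)
    return dict(groups)


# Hand-built tree in the shape of ne_chunk(tagged, binary=False) output;
# the labels here are illustrative only.
typed = Tree("S", [
    Tree("PERSON", [("Stallone", "NNP")]),
    ("jason", "NN"),
    ("'s", "POS"),
    ("film", "NN"),
    Tree("PERSON", [("Rocky", "NNP")]),
    ("was", "VBD"),
    ("inducted", "VBN"),
    ("into", "IN"),
    ("the", "DT"),
    Tree("ORGANIZATION", [("National", "NNP"), ("Film", "NNP"), ("Registry", "NNP")]),
    ("in", "IN"),
    ("the", "DT"),
    Tree("ORGANIZATION", [("Smithsonian", "NNP"), ("Museum", "NNP")]),
    (".", "."),
])

print(entities_by_type(typed))
# {'PERSON': ['Stallone', 'Rocky'], 'ORGANIZATION': ['National Film Registry', 'Smithsonian Museum']}
```

Iterating over the root's direct children relies on ne_chunk producing a two-level tree; for deeper trees, subtrees() with a label filter would be the safer choice.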