
How to get the tokens from an NLTK Tree?

I have this tree returned to me:

Tree('S', [('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('test', 'NN'), (',', ','), Tree('PERSON', [('Stackoverflow', 'NNP'), ('Users', 'NNP')]), ('.', '.')])

I can turn it into a nice Python list like this:

import ast   # standard library
import nltk  # third-party; the tokenizer, tagger and NE chunker models must be downloaded

sentence = "This is a test, Stackoverflow Users."
tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)
entities = nltk.chunk.ne_chunk(tagged)
tree = repr(entities)  # this variable is the tree that is returned to me
# below this point it's about turning the tree into a Python list
tree = ("[" + tree[5:-1] + "]").replace("Tree", "").replace(")", "]").replace("(", "[")
tree = ast.literal_eval(tree)

Now the tree variable looks like this:

['S', [['This', 'DT'], ['is', 'VBZ'], ['a', 'DT'], ['test', 'NN'], [',', ','], ['ORGANIZATION', [['Stackoverflow', 'NNP']]], ['users', 'NNS'], ['.', '.']]]

When I try to iterate over it and build the sentence as a string, I get

"This is a test, ORGANIZATION."

instead of the expected

"This is a test, Stackoverflow users."

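I can't simply use the sentence variable; I need to be able to get the sentence from the list of lists. Any code snippets or suggestions would be greatly appreciated.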
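If the list-of-lists form has to be kept, one possible approach is to walk it recursively and descend into any entry whose second element is itself a list (a subtree such as the ORGANIZATION chunk) instead of treating its label as a word. The sketch below is only an illustration: `iter_leaves` is a hypothetical helper name, not part of NLTK, and it assumes every node has exactly the `[label, [children]]` or `[word, tag]` shape shown above.

```python
def iter_leaves(node):
    """Yield (word, tag) pairs from a nested list shaped like ['LABEL', [children]]."""
    label, children = node
    for child in children:
        if isinstance(child[1], list):
            # A subtree looks like [label, [children...]]: recurse into it.
            yield from iter_leaves(child)
        else:
            # A leaf looks like [word, tag]: both elements are strings.
            yield tuple(child)


tree = ['S', [['This', 'DT'], ['is', 'VBZ'], ['a', 'DT'], ['test', 'NN'], [',', ','],
              ['ORGANIZATION', [['Stackoverflow', 'NNP']]], ['users', 'NNS'], ['.', '.']]]

words = [word for word, tag in iter_leaves(tree)]
print(words)
# ['This', 'is', 'a', 'test', ',', 'Stackoverflow', 'users', '.']
```

The chunk label is skipped because only leaves are yielded, so "ORGANIZATION" never ends up in the word list.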

>>> from nltk import Tree
>>> yourtree = Tree('S', [('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('test', 'NN'), (',', ','), Tree('PERSON', [('Stackoverflow', 'NNP'), ('Users', 'NNP')]), ('.', '.')])
>>> yourtree.leaves()
[('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('test', 'NN'), (',', ','), ('Stackoverflow', 'NNP'), ('Users', 'NNP'), ('.', '.')]
>>> tokens, pos = zip(*yourtree.leaves())
>>> tokens
('This', 'is', 'a', 'test', ',', 'Stackoverflow', 'Users', '.')
>>> pos
('DT', 'VBZ', 'DT', 'NN', ',', 'NNP', 'NNP', '.')
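To get from the token tuple back to a sentence string, the tokens still have to be joined without putting a space before punctuation. The following is a minimal sketch of one way to do that in plain Python; `join_tokens` is a hypothetical helper, and it assumes that any token made up only of punctuation characters attaches to the preceding word, which is wrong for opening brackets or opening quotes.

```python
import string


def join_tokens(tokens):
    """Join tokens with spaces, attaching punctuation-only tokens to the previous word."""
    sentence = ""
    for token in tokens:
        # Assumption: a token consisting only of punctuation gets no leading space.
        is_punct = all(ch in string.punctuation for ch in token)
        if sentence and not is_punct:
            sentence += " "
        sentence += token
    return sentence


tokens = ('This', 'is', 'a', 'test', ',', 'Stackoverflow', 'Users', '.')
print(join_tokens(tokens))
# This is a test, Stackoverflow Users.
```

For text with quotes, brackets or contractions, NLTK's `TreebankWordDetokenizer` (in `nltk.tokenize.treebank`) is likely a better fit than this hand-rolled rule.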

See also: How to traverse an NLTK Tree object?

