Select only 'NN' and 'VB' words from NLTK pos_tag
I need to print only the 'NN' and 'VB' words from an entered sentence.
import nltk
import re
import time

var = raw_input("Please enter something: ")
exampleArray = [var]

def processLanguage():
    try:
        for item in exampleArray:
            tokenized = nltk.word_tokenize(item)
            tagged = nltk.pos_tag(tokenized)
            print tagged
            time.sleep(555)
    except Exception, e:
        print str(e)

processLanguage()
How about changing
print tagged
to
print [(word, tag) for word, tag in tagged if tag in ('NN', 'VB')]
An exact match on 'NN' or 'VB' misses related tags such as 'NNS', 'NNP', 'VBD', or 'VBZ', so you might need to compare only the first 2 characters of the POS tag; see "NLTK - Get and Simplify List of Tags".
nn_vb_tagged = [(word, tag) for word, tag in tagged
                if tag.startswith('NN') or tag.startswith('VB')]
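The prefix check above can be wrapped in a small reusable helper. This is only a sketch: the function name filter_by_tag_prefix and the sample list are hypothetical, and the sample is hand-written in the (word, tag) shape that nltk.pos_tag returns rather than being real tagger output.

```python
# Hypothetical helper; the name and default prefixes are assumptions,
# not part of NLTK or of the answer above.
def filter_by_tag_prefix(tagged, prefixes=("NN", "VB")):
    """Keep (word, tag) pairs whose tag starts with any given prefix."""
    # str.startswith accepts a tuple, so one call covers every prefix.
    return [(word, tag) for word, tag in tagged
            if tag.startswith(tuple(prefixes))]


# Hand-written sample in the shape nltk.pos_tag returns (Penn Treebank tags);
# this is illustrative data, not actual tagger output.
sample_tagged = [
    ("The", "DT"),
    ("dogs", "NNS"),
    ("chased", "VBD"),
    ("a", "DT"),
    ("cat", "NN"),
]

print(filter_by_tag_prefix(sample_tagged))
```

With an exact match against ('NN', 'VB'), only ("cat", "NN") in this sample would survive; the prefix match also keeps the plural noun and the past-tense verb.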
You can try this:
import nltk
from nltk.tokenize import word_tokenize

example = "This is a sample sentence, showing off the stop words filtration.!"
word_tokens = word_tokenize(example)
pos = nltk.pos_tag(word_tokens)
# Tags to keep; matching is exact, so 'NN' does not cover 'NNS'.
selective_pos = ['NN', 'VB']
selective_pos_words = []
for word, tag in pos:
    if tag in selective_pos:
        selective_pos_words.append((word, tag))
print(selective_pos_words)
Add whichever parts of speech you want to the list "selective_pos" to select the words tagged with them.
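If you also want the selected words separated by tag, the same membership test can feed a dictionary. This is a minimal sketch under stated assumptions: group_by_tag is a hypothetical name, and the sample list is hand-written in the (word, tag) shape that nltk.pos_tag returns, not real tagger output.

```python
from collections import defaultdict


# Hypothetical helper; not part of NLTK.
def group_by_tag(tagged, wanted_tags):
    """Map each wanted tag to the list of words carrying exactly that tag."""
    wanted = set(wanted_tags)      # set membership is O(1) per lookup
    groups = defaultdict(list)
    for word, tag in tagged:
        if tag in wanted:          # exact match, as with selective_pos
            groups[tag].append(word)
    return dict(groups)


# Hand-written sample (Penn Treebank tags); illustrative data only.
sample_tagged = [
    ("This", "DT"),
    ("is", "VBZ"),
    ("a", "DT"),
    ("sample", "NN"),
    ("sentence", "NN"),
]

print(group_by_tag(sample_tagged, ["NN", "VB"]))
```

Tags with no matching words simply do not appear in the result, so 'VB' is absent here because "is" carries the tag 'VBZ'.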