Unable to eliminate stop words using nltk from a sequence of words
I have a sequence of words and I want to remove all stop words from it using nltk. The code snippet is as follows:
# tokensgenerated holds the sequence of words
for word in tokensgenerated:
    if word not in nltk.corpus.stopwords.words('english'):
        pass  # do something with the word
However, I get a runtime error:
"except LookupError: raise e"
I have already imported nltk.
Is there something I am missing?
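First download the stopwords corpus and make sure the download succeeded (see http://www.nltk.org/data). Importing nltk does not install its corpora, which is why the lookup fails: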
>>> import nltk
>>> from nltk.corpus import stopwords
>>> nltk.download('stopwords')  # one-time download; prints progress, returns True on success
>>>
>>> stop = stopwords.words('english')
>>> sent = 'this is a foobar sentence'.split()
>>> [word for word in sent if word not in stop]
['foobar', 'sentence']
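
If you would rather not download by hand, here is a minimal sketch of the same idea wrapped in functions. It assumes nltk is installed and that the machine can reach the NLTK data server the first time it runs. The helper names load_english_stop_words and remove_stop_words are my own, not part of nltk, and lowercasing the tokens before the comparison is my choice as well. The list is converted to a set once, because the loop in the question reloads the stop word list for every token.

```python
def load_english_stop_words():
    """Return NLTK's English stop words as a set (hypothetical helper, not an nltk API)."""
    import nltk
    from nltk.corpus import stopwords

    try:
        # Raises LookupError when the corpus is not installed locally.
        nltk.data.find('corpora/stopwords')
    except LookupError:
        # Fetch the corpus once; later runs find it on disk.
        nltk.download('stopwords')
    # A set gives constant-time membership tests.
    return set(stopwords.words('english'))


def remove_stop_words(tokens, stop_words):
    """Keep only the tokens that are not in stop_words (compared in lowercase)."""
    return [token for token in tokens if token.lower() not in stop_words]
```

With that in place, the question's loop becomes remove_stop_words(tokensgenerated, load_english_stop_words()), with the stop word set loaded once up front rather than inside the loop.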