Unable to remove stop words from a sequence of words using nltk

Question

I have a sequence of words and want to remove all stop words from it using nltk. The code snippet is:

# tokensgenerated holds the sequence of words
for word in tokensgenerated:
    if word not in nltk.corpus.stopwords.words('english'):
        pass  # do something with the word

However, I get a runtime error:

"except LookupError: raise e"

I have already imported nltk.

Is there something I am missing?

Answer 1

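The LookupError means the stopwords corpus is not installed locally. Download it first and make sure the download succeeds; see http://www.nltk.org/data :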

>>> import nltk
>>> nltk.download('stopwords')  # fetches the corpus into your nltk_data directory
>>> from nltk.corpus import stopwords
>>>
>>> stop = stopwords.words('english')
>>> sent = 'this is a foobar sentence'.split()
>>> [word for word in sent if word not in stop]
['foobar', 'sentence']
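
If you would rather not run the download by hand, the same idea can be wrapped in a small helper. This is only a sketch: the function names `load_english_stopwords` and `remove_stopwords` are made up for illustration, and it assumes that `nltk` is installed and that a missing corpus surfaces as a `LookupError`, as in the question. It also converts the list to a set once instead of calling `stopwords.words('english')` on every iteration, and lowercases tokens before the lookup, since NLTK's English list is lowercase.

```python
def load_english_stopwords():
    """Return NLTK's English stop words as a set.

    Hypothetical helper: downloads the 'stopwords' corpus on first use
    if it is not already present in the local nltk_data directory.
    """
    import nltk
    from nltk.corpus import stopwords

    try:
        words = stopwords.words('english')
    except LookupError:
        # Corpus not installed yet: fetch it once, then retry.
        nltk.download('stopwords')
        words = stopwords.words('english')
    return set(words)


def remove_stopwords(tokens, stop_words):
    """Keep only the tokens whose lowercase form is not in stop_words."""
    return [t for t in tokens if t.lower() not in stop_words]
```

Typical use would be `stop = load_english_stopwords()` once at startup, then `remove_stopwords(tokensgenerated, stop)` for each token sequence.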

Answer 1: 0, accepted, 2014-03-16 05:47:38