
Add and remove words from the NLTK stopwords list

I am trying to add words to and remove words from the NLTK stopwords list:

from nltk.corpus import stopwords

stop_words = set(stopwords.words('french'))

#add words that aren't in the NLTK stopwords list
new_stopwords = ['cette', 'les', 'cet']
new_stopwords_list = set(stop_words.extend(new_stopwords))

#remove words that are in NLTK stopwords list
not_stopwords = {'n', 'pas', 'ne'} 
final_stop_words = set([word for word in new_stopwords_list if word not in not_stopwords])

print(final_stop_words)

Output:

Traceback (most recent call last):
  File "test_stop.py", line 10, in <module>
    new_stopwords_list = set(stop_words.extend(new_stopwords))
AttributeError: 'set' object has no attribute 'extend'

Try this, using union() instead of extend():

from nltk.corpus import stopwords

stop_words = set(stopwords.words('french'))

#add words that aren't in the NLTK stopwords list
new_stopwords = ['cette', 'les', 'cet']
new_stopwords_list = stop_words.union(new_stopwords)

#remove words that are in NLTK stopwords list
not_stopwords = {'n', 'pas', 'ne'} 
final_stop_words = set([word for word in new_stopwords_list if word not in not_stopwords])

print(final_stop_words)
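The same union-then-filter idea can be wrapped in a small helper. This is a minimal sketch: customize_stopwords is a hypothetical name, and a short hard-coded sample set stands in for stopwords.words('french') so the snippet runs without the NLTK corpus being downloaded.

```python
def customize_stopwords(base, to_add, to_remove):
    """Return a new set: base plus to_add, minus to_remove (base is not modified)."""
    # set(base) accepts any iterable, e.g. the list returned by stopwords.words('french');
    # union() and difference() both return new sets and accept any iterable
    return set(base).union(to_add).difference(to_remove)

# Hypothetical stand-in for set(stopwords.words('french'))
sample_french = {'le', 'la', 'de', 'ne', 'pas', 'n'}

final_stop_words = customize_stopwords(sample_french, ['cette', 'les', 'cet'], {'n', 'pas', 'ne'})
print(sorted(final_stop_words))  # ['cet', 'cette', 'de', 'la', 'le', 'les']
```

With NLTK installed and the corpus downloaded, the equivalent call would be customize_stopwords(stopwords.words('french'), new_stopwords, not_stopwords).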

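Use list(set(...)) instead of set(...), because only lists have a method called extend. Note that extend() modifies the list in place and returns None, so call it on its own line before building the set: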

...
stop_words = list(set(stopwords.words('french')))
...
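A minimal sketch of the list-based version, with a hypothetical sample list standing in for the NLTK corpus, showing that extend() has no useful return value:

```python
# Hypothetical stand-in for stopwords.words('french'), which returns a list
stop_words = list(set(['le', 'la', 'ne', 'pas', 'n']))
new_stopwords = ['cette', 'les', 'cet']

result = stop_words.extend(new_stopwords)  # mutates stop_words in place
print(result)  # None, so set(stop_words.extend(...)) would raise a TypeError

new_stopwords_list = set(stop_words)  # build the set only after extending
print(sorted(new_stopwords_list))  # ['cet', 'cette', 'la', 'le', 'les', 'n', 'ne', 'pas']
```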

You can use update instead of extend, replacing the line new_stopwords_list = set(stop_words.extend(new_stopwords)) with:

stop_words.update(new_stopwords)
new_stopwords_list = set(stop_words)
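update() is the in-place counterpart of union() and, like list.extend(), returns None. Removal can also be done in place with difference_update(). A minimal sketch, with a hypothetical sample set standing in for set(stopwords.words('french')):

```python
# Hypothetical stand-in for set(stopwords.words('french'))
stop_words = {'le', 'la', 'ne', 'pas', 'n'}

stop_words.update(['cette', 'les', 'cet'])        # add words in place
stop_words.difference_update({'n', 'pas', 'ne'})  # remove words in place

print(sorted(stop_words))  # ['cet', 'cette', 'la', 'le', 'les']
```

For a single word, discard() removes it without raising an error if it is absent, whereas remove() raises KeyError.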

By the way, giving a set a name that contains the word "list" (such as new_stopwords_list) can be confusing.


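Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM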