Removing stopwords fails when using NLTK stopwords to remove them from a list in a pandas column
I have a dataframe with string entries, and I am using a function to remove stopwords. The cell runs without error, but it does not produce the expected result.
df['column'].iloc[0] = 'BK HE HAS KITCHEN TROUBLE WITH HIS BLENDER'
import string
from nltk.corpus import stopwords

def text_process(text):
    try:
        nopunc = [char for char in text if char not in string.punctuation]
        nopunc = ''.join(nopunc)
        return [word for word in nopunc.split() if word.lower not in stopwords.words('english')]
    except TypeError:
        return []

df['column'].apply(text_process)
The result for the first cell looks like this:
['BK', 'HE', 'HAS', 'KITCHEN', 'TROUBLE', 'WITH', 'HIS', 'BLENDER']
(HE, HAS, WITH, HIS) should have been removed, yet they still appear in the cell. Can anyone explain how this happens, or how to fix it?
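A minimal sketch of why the comprehension in the question keeps every word: `word.lower` without parentheses is the bound method object, not the lowercased string, so the membership test never matches any stopword. (A small stand-in set replaces `stopwords.words('english')` here so the snippet runs without NLTK data.)

```python
# Stand-in for stopwords.words('english'); the real list is much longer.
stop_words = {'he', 'has', 'with', 'his'}

word = 'HE'

# word.lower is a bound method object, never equal to any string,
# so "not in" is always True and every word survives the filter:
print(word.lower not in stop_words)    # True

# Calling the method yields the lowercased string, which does match:
print(word.lower() not in stop_words)  # False
```

Note also that fixing the call only affects the comparison: the returned tokens themselves stay uppercase unless they are lowercased as well.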
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

example_sent = "BK HE HAS KITCHEN TROUBLE WITH HIS BLENDER"
# Lowercase first: the NLTK stopword list is all lowercase
example_sent = example_sent.lower()

stop_words = set(stopwords.words('english'))
word_tokens = word_tokenize(example_sent)

filtered_sentence = [w for w in word_tokens if w not in stop_words]

print(word_tokens)
print(filtered_sentence)
['bk', 'he', 'has', 'kitchen', 'trouble', 'with', 'his', 'blender']
['bk', 'kitchen', 'trouble', 'blender']
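Applied back to the dataframe, the same idea (lowercase, then filter) looks roughly like this; the stand-in stopword set replaces `set(stopwords.words('english'))` so the snippet runs without downloading NLTK data:

```python
import pandas as pd

# Stand-in for set(stopwords.words('english')); the real set is much longer.
stop_words = {'he', 'has', 'with', 'his'}

df = pd.DataFrame({'column': ['BK HE HAS KITCHEN TROUBLE WITH HIS BLENDER']})

def text_process(text):
    # Lowercase before comparing, since the stopword list is lowercase.
    return [w for w in text.lower().split() if w not in stop_words]

print(df['column'].apply(text_process).iloc[0])
# ['bk', 'kitchen', 'trouble', 'blender']
```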