Removing stopwords fails when using NLTK stopwords to remove them from a list in a pandas column
I have a dataframe with string entries, and I am using a function to remove stopwords. The cell runs without error, but it does not produce the expected result.
df['column'].iloc[0] = 'BK HE HAS KITCHEN TROUBLE WITH HIS BLENDER'
import string
from nltk.corpus import stopwords

def text_process(text):
    try:
        nopunc = [char for char in text if char not in string.punctuation]
        nopunc = ''.join(nopunc)
        return [word for word in nopunc.split() if word.lower not in stopwords.words('english')]
    except TypeError:
        return []

df['column'].apply(text_process)
The result for the first cell looks like this:
['BK', 'HE', 'HAS', 'KITCHEN', 'TROUBLE', 'WITH', 'HIS', 'BLENDER']
(HE, HAS, WITH, HIS) should have been removed, yet they still appear in the cell. Can anyone explain how this happens, or how to fix it?
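A minimal sketch of why the comprehension in the question keeps every word: `word.lower` without parentheses is the bound method object, not the lowercased string, so the membership test never matches any stopword. (A small stand-in set replaces `stopwords.words('english')` here so the snippet runs without NLTK data.)

```python
# Stand-in for stopwords.words('english'); the real list is much longer.
stop_words = {'he', 'has', 'with', 'his'}

word = 'HE'

# word.lower is a bound method object, never equal to any string,
# so "not in" is always True and every word survives the filter:
print(word.lower not in stop_words)    # True

# Calling the method yields the lowercased string, which does match:
print(word.lower() not in stop_words)  # False
```

Note also that fixing the call only affects the comparison: the returned tokens themselves stay uppercase unless they are lowercased as well.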
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

example_sent = "BK HE HAS KITCHEN TROUBLE WITH HIS BLENDER"
# Lowercase first: the NLTK stopword list is all lowercase
example_sent = example_sent.lower()

stop_words = set(stopwords.words('english'))
word_tokens = word_tokenize(example_sent)

filtered_sentence = [w for w in word_tokens if w not in stop_words]

print(word_tokens)
print(filtered_sentence)
['bk', 'he', 'has', 'kitchen', 'trouble', 'with', 'his', 'blender']
['bk', 'kitchen', 'trouble', 'blender']
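Applied back to the dataframe, the same idea (lowercase, then filter) looks roughly like this; the stand-in stopword set replaces `set(stopwords.words('english'))` so the snippet runs without downloading NLTK data:

```python
import pandas as pd

# Stand-in for set(stopwords.words('english')); the real set is much longer.
stop_words = {'he', 'has', 'with', 'his'}

df = pd.DataFrame({'column': ['BK HE HAS KITCHEN TROUBLE WITH HIS BLENDER']})

def text_process(text):
    # Lowercase before comparing, since the stopword list is lowercase.
    return [w for w in text.lower().split() if w not in stop_words]

print(df['column'].apply(text_process).iloc[0])
# ['bk', 'kitchen', 'trouble', 'blender']
```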