Remove Stop Words Python

I am reading a CSV file and collecting the words in it, and I am trying to remove the stop words. Here is my code.

import pandas as pd
from nltk.corpus import stopwords as sw

def loadCsv(fileName):
    df = pd.read_csv(fileName, on_bad_lines='skip')  # pandas >= 1.3; replaces the removed error_bad_lines=False
    df.dropna(inplace = True)
    return df

def getWords(dataframe):
    words = []
    for tweet in dataframe['SentimentText'].tolist():
        for word in tweet.split():
            words.append(word.lower())  # append inside the inner loop so every word is kept

    return set(words) #Create a set from the words list

def removeStopWords(words):
    for word in words: # iterate over word_list
        if word in sw.words('english'): 
            words.remove(word) # remove word from filtered_word_list if it is a stopword

    return set(words)

df = loadCsv("train.csv")
words = getWords(df)
words = removeStopWords(words)

On this line

if word in sw.words('english'):

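I get the following error.

Exception: No description

Going forward, I will also try to remove punctuation, so any pointers on that would be great too. Any help is much appreciated.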

Edit

def removeStopWords(words):
    filtered_word_list = words #make a copy of the words
    for word in words: # iterate over words
        if word in sw.words('english'): 
            filtered_word_list.remove(word) # remove word from filtered_word_list if it is a stopword

    return set(filtered_word_list)

Change the removeStopWords function to the following:

def getFilteredStopWords(words):
    stop_words = set(sw.words('english'))  # build once; set membership tests are fast
    filtered_words = [w for w in words if w not in stop_words]  # keep only non-stop words
    return filtered_words

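Here is a simplified version of the problem, without pandas. I believe the problem with the original code is that it modifies the set `words` while iterating over it. By using a conditional list comprehension, we can test each word, build a new list, and finally convert it to a set, as in the original code.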
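The failure can be reproduced without NLTK. The sketch below is only an illustration: `STOP_WORDS` is a hypothetical stand-in for `sw.words('english')`, and the function names are made up for this example. It shows that removing from a set while looping over it raises a `RuntimeError`, and that looping over a real copy (`set(words)`) avoids it. A plain assignment such as `filtered_word_list = words`, as in the question's edit, does not copy anything; both names refer to the same set.

```python
# Hypothetical stand-in for sw.words('english') so the sketch runs without NLTK data.
STOP_WORDS = {"this", "is", "a"}

def remove_in_place(words):
    """Mimics the question's loop: removes from the set being iterated."""
    for word in words:
        if word in STOP_WORDS:
            words.remove(word)  # changes the set's size mid-iteration
    return words

def remove_with_copy(words):
    """Iterates over a snapshot, so removing from the original is safe."""
    for word in set(words):  # set(words) is a real copy; `x = words` is only an alias
        if word in STOP_WORDS:
            words.remove(word)
    return words

sample = {"this", "is", "a", "common", "sentence"}

try:
    remove_in_place(set(sample))
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

print(error_message)                   # the RuntimeError text raised by the in-place loop
print(remove_with_copy(set(sample)))   # only the non-stop words remain
```

Building a new collection with a comprehension, as the answers here do, sidesteps the issue entirely and is usually the simpler choice.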

from nltk.corpus import stopwords as sw

def removeStopWords(words):
    stop_words = set(sw.words('english'))  # load the list once instead of once per word
    return set([w for w in words if w not in stop_words])

sentence = 'this is a very common english sentence with a finite set of words from my imagination'
words = set(sentence.split())
print(removeStopWords(words))

from nltk.corpus import stopwords

def remove_stopwords(sentence):
    stop_words = set(stopwords.words('english'))  # set gives fast membership tests
    words = sentence.split()  # split on any whitespace
    filtered_words = [w for w in words if w not in stop_words]
    return ' '.join(filtered_words)  # rebuild the sentence without stop words
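The question also asks for pointers on removing punctuation. One possible approach, sketched here with only the standard library, is to delete every character in `string.punctuation` with `str.translate` before splitting. The helper names `strip_punctuation` and `tokenize` are made up for this example, and the sketch assumes ASCII punctuation is all that needs removing.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def strip_punctuation(text):
    """Return text with ASCII punctuation characters removed."""
    return text.translate(_PUNCT_TABLE)

def tokenize(text):
    """Strip punctuation, lowercase, and split on whitespace."""
    return strip_punctuation(text).lower().split()

tokens = tokenize("Hello, world! This is a test... isn't it?")
print(tokens)
```

Note that this also drops apostrophes, so "isn't" becomes "isnt" and would no longer match a stop-word entry that contains an apostrophe. Filtering stop words before stripping punctuation, or using a tokenizer, may suit such cases better.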
