
How to remove stop-words list items from a text

I'm trying to remove stop words from a dataframe with the code below. It doesn't raise an error, but it doesn't remove the stop words from the dataframe either.

def stop_words(df):

    stop_words = set(["a", "acaba", "altı","alti", "ama", "ancak","bir"])

    df['text'] = [word for word in df['text'] if word not in stop_words]
    return df.text

df.text = stop_words(df)

For instance, df.text[2] is "gel sen necektigimi bir de bana sor", but the word "bir" is not removed. How can I solve this?

df['text'] is a column of strings, so iterating over it gives you whole sentences, not individual words. No sentence is exactly equal to a stop word, so your list comprehension keeps every row unchanged.
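A quick way to see this is to print what the loop actually visits. This is a minimal sketch: the DataFrame below is hypothetical sample data, and only the second sentence comes from the question.

```python
import pandas as pd

# Hypothetical sample data; the second sentence is the one from the question.
df = pd.DataFrame({"text": ["ama bu da ne", "gel sen necektigimi bir de bana sor"]})
stop_words = {"a", "acaba", "altı", "alti", "ama", "ancak", "bir"}

# Iterating over the column yields whole sentences (strings), not words.
items = [item for item in df["text"]]
print(items)

# No whole sentence equals a stop word, so the original filter keeps every row.
filtered = [s for s in df["text"] if s not in stop_words]
print(filtered == items)
```

The second print shows True: the filter compared each full sentence against the set, so nothing was ever dropped.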

You need to split each sentence into words and filter those instead. You could do that with a list comprehension, or with apply:

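stop_words = set(["a", "acaba", "altı", "alti", "ama", "ancak", "bir"])  # module-level set so f() can see it (replaces the stop_words function above)

def f(x):
    # x is the list of words produced by str.split()
    return [w for w in x if w not in stop_words]

df['text'] = df['text'].str.split().apply(f).str.join(' ')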
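The list-comprehension alternative mentioned above could look like this. It is a sketch, not the answerer's exact code: remove_stop_words is a hypothetical helper name, and the DataFrame is hypothetical sample data.

```python
import pandas as pd

stop_words = {"a", "acaba", "altı", "alti", "ama", "ancak", "bir"}

def remove_stop_words(sentence: str) -> str:
    """Split on whitespace, drop stop words, and rejoin with single spaces."""
    return " ".join(w for w in sentence.split() if w not in stop_words)

# Hypothetical sample data; the first sentence is the one from the question.
df = pd.DataFrame({"text": ["gel sen necektigimi bir de bana sor", "ama bir şey yok"]})

# One cleaned string per row, built with a plain list comprehension.
df["text"] = [remove_stop_words(s) for s in df["text"]]
print(df["text"].tolist())
```

The matching is exact and case-sensitive, so a capitalized "Bir" at the start of a sentence, or "bir," with trailing punctuation, would not be removed without extra normalization.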
