How to remove stop-words list items from a text

I am trying to remove stopwords from a dataframe with the code below. It does not raise an error, but it does not remove the stopwords from the dataframe either.

def stop_words(df):

    stop_words = set(["a", "acaba", "altı","alti", "ama", "ancak","bir"])

    df['text'] = [word for word in df['text'] if word not in stop_words]
    return df.text

df.text = stop_words(df)

For instance, df.text[2] is "gel sen necektigimi bir de bana sor". The word "bir" is not removed. How can I solve this?

df['text'] is a column of strings. Iterate over it and you iterate over each sentence, not each word. A whole sentence is never in stop_words, so nothing gets filtered out. What did you expect?
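To see why nothing is removed, here is a minimal sketch that uses a plain Python list as a stand-in for the column. The second entry ("bir" on its own) is made up for illustration and is not from the question.

```python
stop_words = {"a", "acaba", "altı", "alti", "ama", "ancak", "bir"}

# Plain list standing in for df['text']; each element is one whole sentence.
# The second entry is a hypothetical row added only to show the contrast.
texts = ["gel sen necektigimi bir de bana sor", "bir"]

# Each item is compared against the stop words as a single string,
# so only an entry that is exactly equal to a stop word gets dropped.
kept = [item for item in texts if item not in stop_words]
print(kept)  # ['gel sen necektigimi bir de bana sor']
```

The full sentence survives untouched because the membership test never looks at the individual words inside it.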

You'll need to split each sentence into words and iterate over those. You could use a list comprehension. You could also use apply:

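# stop_words must be a module-level set here, not the function from the question
stop_words = {"a", "acaba", "altı", "alti", "ama", "ancak", "bir"}

def f(words):
    # keep only the tokens that are not stop words
    return [w for w in words if w not in stop_words]

# split each sentence into words, filter them, then join back into one string
df['text'] = df['text'].str.split().apply(f).str.join(' ')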
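The list-comprehension route could look something like the sketch below. The helper name remove_stop_words and the constant STOP_WORDS are hypothetical names chosen for illustration, and a plain list stands in for the column so the snippet runs without pandas.

```python
STOP_WORDS = {"a", "acaba", "altı", "alti", "ama", "ancak", "bir"}

def remove_stop_words(sentence, stop_words=STOP_WORDS):
    """Return the sentence with every whitespace-separated stop word dropped."""
    return " ".join(w for w in sentence.split() if w not in stop_words)

# Plain list standing in for df['text']
texts = ["gel sen necektigimi bir de bana sor"]
cleaned = [remove_stop_words(t) for t in texts]
print(cleaned)  # ['gel sen necektigimi de bana sor']

# With a DataFrame, the same helper could be applied per row:
# df['text'] = df['text'].apply(remove_stop_words)
```

Matching here is exact, so "Bir" or "bir," would not be removed; lowercasing and stripping punctuation first would be needed if the text contains them.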
