How to remove English and Spanish stop words

Question

I am trying to remove stop words for English and Spanish. My code works for English but not for Spanish:

stopword = nltk.corpus.stopwords.words('english', 'spanish')

def remove_stopwords(text):
    text = [word for word in text if word not in stopword]
    return text
    
df['Tweet_nonstop'] = df['Tweet_tokenized'].apply(lambda x: remove_stopwords(x))

Can someone help with this problem? Thank you.

Answer 1

To get English and Spanish stopwords, you can use this:

stopword_en = nltk.corpus.stopwords.words('english')
stopword_es = nltk.corpus.stopwords.words('spanish')
stopword = stopword_en + stopword_es

According to the help output, the second argument to nltk.corpus.stopwords.words is not another language:

>>> help(nltk.corpus.stopwords.words)
Help on method words in module nltk.corpus.reader.wordlist:

words(fileids=None, ignore_lines_startswith='\n') method of nltk.corpus.reader.wordlist.WordListCorpusReader instance

The first argument, fileids, can take multiple values, so a call such as nltk.corpus.stopwords.words(fileids=('english', 'spanish')) also works as intended.
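For what it's worth, here is a minimal sketch of the whole filtering step once the two lists are combined. The short lists below are hand-written stand-ins (an assumption, so the snippet runs without downloading the NLTK corpus); in real use they would come from nltk.corpus.stopwords.words('english') and nltk.corpus.stopwords.words('spanish'). Converting the combined list to a set is my own suggestion, not part of the original code; it just makes each lookup faster.

```python
# Stand-in stopword lists (assumption for illustration only); in practice:
#   stopword_en = nltk.corpus.stopwords.words('english')
#   stopword_es = nltk.corpus.stopwords.words('spanish')
stopword_en = ["the", "is", "and", "a"]
stopword_es = ["el", "la", "es", "y"]

# Combine both languages; a set gives fast membership tests.
stopword = set(stopword_en + stopword_es)


def remove_stopwords(tokens):
    """Return the tokens that are not in the combined stopword set."""
    return [word for word in tokens if word not in stopword]


# Hypothetical tokenized tweets standing in for df['Tweet_tokenized'].
tweets = [
    ["the", "cat", "is", "here"],
    ["el", "gato", "es", "negro"],
]

cleaned = [remove_stopwords(tokens) for tokens in tweets]
print(cleaned)
```

With a DataFrame, the same function should drop into the original call unchanged: df['Tweet_tokenized'].apply(remove_stopwords).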

Answer 2

In addition to the answer above, try:

stopwords.words(['english','spanish'])

Answer 1
1 2021-01-03 17:20:55

Answer 2
0 2021-01-03 17:22:02