How to remove English and Spanish stop words
I am trying to remove stop words for English and Spanish. My code works for English but not for Spanish:
stopword = nltk.corpus.stopwords.words('english', 'spanish')
def remove_stopwords(text):
    text = [word for word in text if word not in stopword]
    return text
df['Tweet_nonstop'] = df['Tweet_tokenized'].apply(lambda x: remove_stopwords(x))
Can someone help with this problem? Thank you.
To get English and Spanish stop words, you can use this:
stopword_en = nltk.corpus.stopwords.words('english')
stopword_es = nltk.corpus.stopwords.words('spanish')
stopword = stopword_en + stopword_es
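As a minimal sketch of how the combined list plugs into the filtering step from the question: the snippet below uses a tiny hand-written sample in place of the NLTK lists so that it runs without the corpus download. The names `sample_en` and `sample_es` are hypothetical stand-ins, and converting to a `set` is an optional choice for faster membership tests, not something the answer requires.

```python
# Hypothetical stand-ins for nltk.corpus.stopwords.words('english') and
# nltk.corpus.stopwords.words('spanish'); with NLTK installed and the
# stopwords corpus downloaded, use those calls instead.
sample_en = ["the", "is", "and", "a"]
sample_es = ["el", "la", "es", "y", "de"]

# Combine both languages; a set makes "word not in stopword" a fast lookup.
stopword = set(sample_en + sample_es)


def remove_stopwords(tokens):
    """Return the tokens that are not in the combined stop word set."""
    return [word for word in tokens if word not in stopword]


tokens = ["the", "cat", "y", "el", "perro", "is", "happy"]
filtered = remove_stopwords(tokens)
print(filtered)
```

NLTK's stop word lists are lowercase, so tokens may need to be lowercased before filtering if the tweets contain capitalized words.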
The second argument to nltk.corpus.stopwords.words, according to its help text, isn't another language:
>>> help(nltk.corpus.stopwords.words)
Help on method words in module nltk.corpus.reader.wordlist:
words(fileids=None, ignore_lines_startswith='\n') method of nltk.corpus.reader.wordlist.WordListCorpusReader instance
The first argument, fileids, can take multiple values, so a call such as nltk.corpus.stopwords.words(fileids=('english', 'spanish')) also works as intended.
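To illustrate both points of this answer, here is a simplified stand-in for the signature shown in the help output. It is not NLTK's code: `fake_words` and the `corpus` dict are hypothetical, written only to show that a second positional argument lands in `ignore_lines_startswith` rather than adding a language, while several file ids passed through the first argument are all read.

```python
# Hypothetical miniature "corpus": file id -> newline-separated words.
corpus = {
    "english": "the\nis\nand",
    "spanish": "el\nla\ny",
}


def fake_words(fileids=None, ignore_lines_startswith="\n"):
    """Simplified stand-in mirroring the signature from help(); not NLTK's implementation."""
    if fileids is None:
        fileids = list(corpus)
    elif isinstance(fileids, str):
        fileids = [fileids]
    lines = []
    for fileid in fileids:
        lines.extend(corpus[fileid].splitlines())
    # Lines beginning with the given prefix are skipped.
    return [line for line in lines if not line.startswith(ignore_lines_startswith)]


# 'spanish' is bound to ignore_lines_startswith, so only English words come back.
wrong = fake_words("english", "spanish")
# Passing both ids through the first argument returns both lists.
right = fake_words(("english", "spanish"))
print(wrong)
print(right)
```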
In addition to the answer above, try:
stopwords.words(['english','spanish'])