Removing stopwords from a TextBlob

Question

I'm processing a TextBlob, and one of the steps is stopword removal. TextBlobs are immutable, so I'm turning one into a list to do the job:

from textblob import TextBlob as tb
from nltk.corpus import stopwords

blob = tb(tekst)
lista = [word for word in blob.words if word not in stopwords.words('english')]
tekst = ' '.join(lista)
blob = tb(tekst)

Is there a simpler or more elegant solution to this problem?

Answer 1

You can try this code:

from textblob import TextBlob
from nltk.corpus import stopwords

b = "Do not purchase these earphones. It will automatically disconnect and reconnect. Worst product to buy."
text = TextBlob(b)

# Tokens
tokens = set(text.words)
print("Tokens: ", tokens)

# Stopwords
stop = set(stopwords.words("english"))

# Remove stopwords using the set difference operation
print("Filtered Tokens: ", tokens - stop)

Output:

Tokens: {'buy', 'disconnect', 'will', 'to', 'purchase', 'reconnect', 'product', 'It', 'Do', 'and', 'Worst', 'earphones', 'not', 'automatically', 'these'}

Filtered Tokens: {'buy', 'disconnect', 'purchase', 'reconnect', 'product', 'It', 'Do', 'Worst', 'earphones', 'automatically'}
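
Note that the set difference drops word order and duplicates, and the comparison is case-sensitive, which is why 'It' and 'Do' survive in the output above. If the filtered text needs to be joined back into a string, as in the question, an order-preserving and case-insensitive variant may be a better fit. The following is only a minimal sketch: `remove_stopwords` is a hypothetical helper name, and the small `STOP` set is an assumed stand-in for `set(stopwords.words("english"))` so that the example runs without NLTK data.

```python
# Hypothetical stand-in for set(stopwords.words("english"));
# the real NLTK list is much longer.
STOP = {"do", "not", "these", "it", "will", "and", "to"}


def remove_stopwords(words, stop=STOP):
    """Return the words not in `stop`, keeping order and duplicates.

    Matching is case-insensitive: each word is lowercased before the lookup.
    """
    return [w for w in words if w.lower() not in stop]


# With TextBlob installed, `words` could be TextBlob(text).words; a plain
# list is used here so the sketch runs without extra packages.
words = ["Do", "not", "purchase", "these", "earphones", "It", "will",
         "automatically", "disconnect", "and", "reconnect",
         "Worst", "product", "to", "buy"]
filtered = remove_stopwords(words)
print(" ".join(filtered))
# purchase earphones automatically disconnect reconnect Worst product buy
```

The joined string can then be passed back to TextBlob(...) as in the question. Building the stopword set once, rather than calling stopwords.words('english') inside the comprehension, also avoids rebuilding the list for every word.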
