df_clean['message'] = df_clean['message'].apply(lambda x: gensim.parsing.preprocessing.remove_stopwords(x))
I tried this on a dataframe's column 'message' but I get the error:
TypeError: decoding to str: need a bytes-like object, list found
Apparently, the df_clean["message"] column contains lists of words, not strings, hence the error message "need a bytes-like object, list found". To fix this, convert each list back to a string with the join() method, like so:
df_clean['message'] = df_clean['message'].apply(lambda x: gensim.parsing.preprocessing.remove_stopwords(" ".join(x)))
Note that df_clean["message"] will contain string objects after applying the previous code.
This is not a gensim bug. The error comes from the data in your DataFrame: there is a value in your column message that is of type list instead of string. Here's a minimal pandas example:
import pandas as pd
from gensim.parsing.preprocessing import remove_stopwords
df = pd.DataFrame([['one', 'two'], ['three', ['four']]], columns=['A', 'B'])
df.A.apply(remove_stopwords) # works fine
df.B.apply(remove_stopwords) # fails: the second value in B is a list
TypeError: decoding to str: need a bytes-like object, list found
What the error is saying is that remove_stopwords needs a string object and you are passing a list. So before removing stop words, check that all the values in the column are of string type. See the docs.
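One possible way to run that check with pandas alone is sketched below. The DataFrame contents are invented for illustration; substitute your own column name:

```python
import pandas as pd

# Made-up sample data: the second row holds a list instead of a string.
df = pd.DataFrame({"message": ["plain text", ["already", "tokenized"], "more text"]})

# Count how many values of each Python type the column holds.
type_counts = df["message"].map(type).value_counts()
print(type_counts)

# Boolean mask marking the rows whose value is not a string.
not_str = ~df["message"].map(lambda v: isinstance(v, str))

# The offending rows, to inspect before deciding how to convert them.
bad_rows = df[not_str]
print(bad_rows)
```

If bad_rows is empty, every value is a string and remove_stopwords can be applied to the column directly.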