
I use the word_tokenize function on my dataframe to build word_dict, but after executing it the error message 'expected string or bytes-like object' appears

I want to build word_dict by tokenizing the more_clean column of the dataframe, but the error 'expected string or bytes-like object' appears.

This is my dataframe: [image of the dataframe]

And this is my code:

from nltk.tokenize import word_tokenize

# count how often each token appears in the more_clean column
word_dict = {}
for i in range(0, len(df['more_clean'])):
    sentence = df['more_clean'][i]
    word_token = word_tokenize(sentence)
    for j in word_token:
        if j not in word_dict:
            word_dict[j] = 1
        else:
            word_dict[j] += 1

and this error message appears:

TypeError: expected string or bytes-like object

You need to make sure the sentence variable is of type str; this error usually means that some rows of the column hold a non-string value such as NaN (which pandas stores as a float):

word_token = word_tokenize(str(sentence))
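
For context, here is a minimal sketch of the counting loop with that cast applied. The sample dataframe below is hypothetical and only stands in for your more_clean column; the None row becomes NaN (a float), which is exactly the kind of value that triggers this TypeError:

import pandas as pd
from nltk.tokenize import word_tokenize  # requires the Punkt data: nltk.download('punkt')

# hypothetical data standing in for the real dataframe
df = pd.DataFrame({'more_clean': ['good phone', None, 'good battery life']})

word_dict = {}
for sentence in df['more_clean']:
    # str() guards against NaN/float rows, so word_tokenize always receives a string
    for token in word_tokenize(str(sentence)):
        word_dict[token] = word_dict.get(token, 0) + 1

print(word_dict)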

See the nltk.tokenize.word_tokenize documentation:

nltk.tokenize.word_tokenize(text, language='english', preserve_line=False)

Parameters

  • text (str) – text to split into words
  • language (str) – the model name in the Punkt corpus
  • preserve_line (bool) – A flag to decide whether to sentence tokenize the text or not.
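
As a small illustration of those parameters (a standalone sketch, assuming the Punkt tokenizer data has already been downloaded):

from nltk.tokenize import word_tokenize

text = "Good phone. The battery lasts all day."

# default behaviour: sentence-tokenize first, then word-tokenize each sentence
print(word_tokenize(text))

# preserve_line=True skips the sentence-splitting step and tokenizes the text as one line
print(word_tokenize(text, language='english', preserve_line=True))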
