
I use the word_tokenize function on my dataframe to build word_dict, but after executing it the error message 'expected string or bytes-like object' appears

I want to build word_dict by tokenizing the more_clean column of the dataframe, but the error 'expected string or bytes-like object' appears.

This is my dataframe: [image of the dataframe]

And this is my code:

from nltk.tokenize import word_tokenize

# count how often each token appears in the more_clean column
word_dict = {}
for i in range(0, len(df['more_clean'])):
    sentence = df['more_clean'][i]
    word_token = word_tokenize(sentence)
    for j in word_token:
        if j not in word_dict:
            word_dict[j] = 1
        else:
            word_dict[j] += 1

and this error message appears:

TypeError: expected string or bytes-like object

You need to make sure the sentence variable is of type str; this error usually means that some rows of the column hold a non-string value such as NaN (which pandas stores as a float):

word_token = word_tokenize(str(sentence))
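
For context, here is a minimal sketch of the counting loop with that cast applied. The sample dataframe below is hypothetical and only stands in for your more_clean column; the None row becomes NaN (a float), which is exactly the kind of value that triggers this TypeError:

import pandas as pd
from nltk.tokenize import word_tokenize  # requires the Punkt data: nltk.download('punkt')

# hypothetical data standing in for the real dataframe
df = pd.DataFrame({'more_clean': ['good phone', None, 'good battery life']})

word_dict = {}
for sentence in df['more_clean']:
    # str() guards against NaN/float rows, so word_tokenize always receives a string
    for token in word_tokenize(str(sentence)):
        word_dict[token] = word_dict.get(token, 0) + 1

print(word_dict)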

See the nltk.tokenize.word_tokenize documentation:

nltk.tokenize.word_tokenize(text, language='english', preserve_line=False)

Parameters

  • text (str) – text to split into words
  • language (str) – the model name in the Punkt corpus
  • preserve_line (bool) – A flag to decide whether to sentence tokenize the text or not.
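
As a small illustration of those parameters (a standalone sketch, assuming the Punkt tokenizer data has already been downloaded):

from nltk.tokenize import word_tokenize

text = "Good phone. The battery lasts all day."

# default behaviour: sentence-tokenize first, then word-tokenize each sentence
print(word_tokenize(text))

# preserve_line=True skips the sentence-splitting step and tokenizes the text as one line
print(word_tokenize(text, language='english', preserve_line=True))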
