
I use NLTK's word_tokenize on my DataFrame to build word_dict, but running it raises 'expected string or bytes-like object'

I want to build word_dict by tokenizing the more_clean column of my DataFrame, but the error "expected string or bytes-like object" appears.

This is my DataFrame: [image of the DataFrame]

And this is my code:

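from nltk.tokenize import word_tokenize

word_dict = {}
for i in range(0, len(df['more_clean'])):
    sentence = df['more_clean'][i]
    # This call raises "TypeError: expected string or bytes-like object"
    # whenever the cell is not a string (e.g. a NaN missing value)
    word_token = word_tokenize(sentence)
    for j in word_token:
        if j not in word_dict:
            word_dict[j] = 1
        else:
            word_dict[j] += 1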

and this error message appears:

TypeError: expected string or bytes-like object
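The message most likely comes from Python's re module, which NLTK's tokenizers rely on internally; re only accepts str or bytes. In a DataFrame column, a common culprit is a missing value, which pandas often stores as NaN (a float) or None. The minimal reproduction below uses only the standard library; simple_tokenize is a hypothetical stand-in for word_tokenize, not NLTK's actual implementation.

```python
import re


def simple_tokenize(text):
    # Hypothetical stand-in for word_tokenize: it passes `text` straight
    # to the re module, which raises TypeError for anything but str/bytes.
    return re.findall(r"\w+|[^\w\s]", text)


# A column with one missing value, stored the way pandas often does (NaN is a float).
column = ["the cat sat", float("nan"), "a dog ran"]

for value in column:
    try:
        simple_tokenize(value)
    except TypeError as exc:
        # Report which value type triggered the error
        print(f"{type(value).__name__}: {exc}")
```

Only the float entry fails; the two real strings tokenize without complaint.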

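You need to make sure the sentence variable is a str. The error means at least one value in more_clean is not a string (for example a missing value, which pandas stores as NaN, a float), so convert it before tokenizing: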

word_token = word_tokenize(str(sentence))
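One side effect of str() is that a missing value becomes the literal text "nan" (or "None"), which is then counted as a word. If that is not wanted, an alternative is to skip non-string cells. The sketch below assumes a hypothetical simple_tokenize stand-in for word_tokenize and uses collections.Counter (a dict subclass) in place of the manual word_dict bookkeeping; with pandas, df['more_clean'].dropna() would remove the missing values in a similar way.

```python
import re
from collections import Counter


def simple_tokenize(text):
    # Hypothetical stand-in for nltk.tokenize.word_tokenize.
    return re.findall(r"\w+|[^\w\s]", text)


def count_words(sentences, skip_non_strings=True):
    """Count tokens across all sentences.

    skip_non_strings=True drops non-str values such as NaN;
    False converts them with str(), which turns NaN into "nan".
    """
    counts = Counter()
    for sentence in sentences:
        if not isinstance(sentence, str):
            if skip_non_strings:
                continue
            sentence = str(sentence)
        counts.update(simple_tokenize(sentence))
    return counts


column = ["the cat sat", float("nan"), "the dog ran"]
print(count_words(column))                          # no "nan" token
print(count_words(column, skip_non_strings=False))  # includes "nan"
```

With the DataFrame from the question this would be called as count_words(df['more_clean']). Iterating over a Series yields its values directly, so no positional index is needed.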

See the nltk.tokenize.word_tokenize documentation:

nltk.tokenize.word_tokenize(text, language='english', preserve_line=False)

Parameters

  • text (str) – text to split into words
  • language (str) – the model name in the Punkt corpus
  • preserve_line (bool) – A flag to decide whether to sentence tokenize the text or not.
