
TypeError: doc2bow expects an array of unicode tokens on input, not a single string when using gensim.corpora.Dictionary()

Original post: 2017-06-04 09:22:06 · Tags: python, dictionary, gensim

I have a DataFrame like this:

  index  terms   
  1345  ['jays', 'place', 'great', 'subway']    
  1543  ['described', 'communicative', 'friendly']    
  9874  ['great', 'sarahs', 'apartament', 'back']    
  2456  ['great', 'sarahs', 'apartament', 'back']

I tried to create a dictionary from the corpus in comments['terms'], but I get this error message:

from gensim import corpora, models
dictionary = corpora.Dictionary( comments['terms'] )

TypeError: doc2bow expects an array of unicode tokens on input, not a single string
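Why this happens: `corpora.Dictionary` iterates over its input and hands each item to `doc2bow`, which raises this `TypeError` whenever an item is a plain `str`. A likely cause here (an assumption, since the post doesn't show how the DataFrame was built) is that each cell holds a string that only *looks* like a list, e.g. after saving to and reloading from CSV. The sketch below uses a hypothetical helper, `looks_like_token_list` (not part of gensim), to show the difference between the two kinds of cell without needing gensim installed.

```python
# Hypothetical helper (not part of gensim). gensim's doc2bow rejects a document
# that is a str; this helper also checks that every token is a string.
def looks_like_token_list(document):
    if isinstance(document, str):
        # A single string, even one like "['jays', 'place']", is not a token list
        return False
    return all(isinstance(token, str) for token in document)


# A cell that really holds a list of tokens
real_list = ['jays', 'place', 'great', 'subway']
# A cell that holds the *text* of a list (assumed, e.g. after a CSV round trip)
stringified = "['jays', 'place', 'great', 'subway']"

print(looks_like_token_list(real_list))    # True
print(looks_like_token_list(stringified))  # False
```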

2 answers

Each index needs its terms in a sublist, and all of those sublists must be nested inside one larger list:

from gensim import corpora
# One inner list of tokens per document, all nested inside one outer list
theterms = [['jays', 'place', 'great', 'subway'], ['described', 'communicative', 'friendly'], ['great', 'sarahs', 'apartament', 'back'], ['great', 'sarahs', 'apartament', 'back']]

dictionary = corpora.Dictionary(theterms)
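To see why the nested shape matters, here is a toy, pure-Python stand-in for what `corpora.Dictionary` builds: an integer id for each distinct token, plus per-document `(token_id, count)` pairs like those `doc2bow` returns. It is a simplified illustration under that assumption, not gensim's implementation; in particular, the ids gensim assigns may differ from the ones below. `build_vocab` and `to_bow` are hypothetical names.

```python
from collections import Counter


def build_vocab(documents):
    """Toy stand-in for corpora.Dictionary: map each distinct token to an int id."""
    token2id = {}
    for doc in documents:
        if isinstance(doc, str):
            # Same failure mode as gensim: a document must be a list of tokens
            raise TypeError("expected a list of tokens, not a single string")
        for token in doc:
            # Assign ids in order of first appearance (gensim's order may differ)
            token2id.setdefault(token, len(token2id))
    return token2id


def to_bow(doc, token2id):
    """Toy stand-in for doc2bow: sorted (token_id, count) pairs for known tokens."""
    counts = Counter(token2id[t] for t in doc if t in token2id)
    return sorted(counts.items())


theterms = [['jays', 'place', 'great', 'subway'],
            ['described', 'communicative', 'friendly'],
            ['great', 'sarahs', 'apartament', 'back'],
            ['great', 'sarahs', 'apartament', 'back']]

token2id = build_vocab(theterms)
print(len(token2id))                  # 10
print(to_bow(theterms[2], token2id))  # [(2, 1), (7, 1), (8, 1), (9, 1)]
```

With gensim installed, the real mapping for the answer's code is available as `dictionary.token2id`.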

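First convert comments['terms'] to a list with comments['terms'].tolist(), then pass that to corpora.Dictionary(); it should work. Before building the dictionary, you can also apply other preprocessing, such as stemming or stopword removal.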
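A caveat on the `tolist()` route: it only helps if each cell already holds a Python list. If the cells are strings such as `"['jays', 'place']"` (an assumption; the question doesn't show the column's contents' type), `tolist()` still yields strings and the same error comes back. A minimal sketch of one fix, parsing each string with the standard library's `ast.literal_eval`, follows; `to_token_lists` is a hypothetical helper name, and its input would be the output of `comments['terms'].tolist()`.

```python
import ast


def to_token_lists(cells):
    """Hypothetical helper: turn a column of token lists (or their string
    representations) into the list of lists that corpora.Dictionary expects."""
    docs = []
    for cell in cells:
        if isinstance(cell, str):
            # Parse text like "['a', 'b']" back into a real list; literal_eval
            # only evaluates Python literals, unlike eval()
            cell = ast.literal_eval(cell)
        docs.append([str(token) for token in cell])
    return docs


cells = ["['jays', 'place', 'great', 'subway']",     # stringified list
         ['described', 'communicative', 'friendly']]  # already a list
print(to_token_lists(cells))
# [['jays', 'place', 'great', 'subway'], ['described', 'communicative', 'friendly']]
```

The result could then be passed on as `corpora.Dictionary(to_token_lists(comments['terms'].tolist()))`.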

Related questions

Gensim: TypeError: doc2bow expects an array of unicode tokens on input, not a single string

Getting "doc2bow expects an array of unicode tokens on input, not a single string" as a try to do nlp using gensim. Is there a solution?

topic modeling error (doc2bow expects an array of unicode tokens on input, not a single string)

Does gensim.corpora.Dictionary have term frequency saved?

Error while converting corpora of saved tokens in a dataframe column into a gensim dictionary

gensim.corpora Dictionary type error interprets tokenized column as single string

How to input a series/list consisting of different tokens in a Gensim Dictionary?

python gensim TypeError: coercing to Unicode: need string or buffer, list found

how to add tokens to gensim dictionary

Sentence returns empty dictionary from gensim.corpora


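Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original post's URL. For any questions, contact: yoyou2525@163.com.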


Guangdong ICP registration No. 18138465 © 2020-2024 STACKOOM.COM