
TypeError: doc2bow expects an array of unicode tokens on input, not a single string when using gensim.corpora.Dictionary()

Original question, 2017-06-04 09:22:06 · tags: python, dictionary, gensim

I have a DataFrame like this:

  index  terms
  1345   ['jays', 'place', 'great', 'subway']
  1543   ['described', 'communicative', 'friendly']
  9874   ['great', 'sarahs', 'apartament', 'back']
  2456   ['great', 'sarahs', 'apartament', 'back']

I am trying to create a dictionary from the corpus in comments['terms'], but I get this error message:

from gensim import corpora, models
# comments is the DataFrame shown above; each row of 'terms' is meant to be one document
dictionary = corpora.Dictionary(comments['terms'])

TypeError: doc2bow expects an array of unicode tokens on input, not a single string
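The message comes from gensim's `doc2bow`, which `Dictionary` calls on every element of the iterable it is given; `doc2bow` raises this `TypeError` whenever one of those elements is a plain `str`. That happens if you pass a single flat list of tokens, or if the DataFrame cells hold the *text* of a list (for example after saving to and reloading from CSV) rather than a real list. Whether the latter applies to this DataFrame is an assumption. The helper `find_string_documents` below is hypothetical, not part of gensim; it is a minimal sketch that mirrors the same type check so you can find the offending rows before calling gensim.

```python
from typing import Iterable, List


def find_string_documents(documents: Iterable) -> List[int]:
    """Return the positions of documents that are plain strings.

    Hypothetical helper: gensim's doc2bow rejects a str document, so every
    index returned here would trigger the TypeError when building a Dictionary.
    """
    return [i for i, doc in enumerate(documents) if isinstance(doc, str)]


# Real lists of tokens: nothing to report
good = [['jays', 'place', 'great', 'subway'], ['described', 'communicative', 'friendly']]
# First cell is the string form of a list, as often seen after a CSV round trip
bad = ["['jays', 'place', 'great', 'subway']", ['described', 'communicative', 'friendly']]

print(find_string_documents(good))                # []
print(find_string_documents(bad))                 # [0]
print(find_string_documents(['jays', 'place']))   # [0, 1]: a flat list is all strings
```

With pandas, `find_string_documents(comments['terms'])` works the same way, because iterating a Series yields its cell values.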

2 answers

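Each index's terms need to be in their own sub-list, and all of those sub-lists must be nested inside one larger list (a list of lists, one inner list per document):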

from gensim import corpora

theterms = [['jays', 'place', 'great', 'subway'], ['described', 'communicative', 'friendly'], ['great', 'sarahs', 'apartament', 'back'], ['great', 'sarahs', 'apartament', 'back']]

dictionary = corpora.Dictionary(theterms)
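To make the required input shape concrete, here is a toy stand-in for `corpora.Dictionary`. The class `ToyDictionary` is hypothetical and heavily simplified (gensim's real class also tracks document frequencies, supports filtering, and may number tokens differently); it only illustrates the idea: each inner list is one document, new tokens get integer ids, and `doc2bow` turns a token list into `(id, count)` pairs while refusing a bare string the way gensim does.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


class ToyDictionary:
    """Hypothetical, simplified stand-in for gensim.corpora.Dictionary."""

    def __init__(self, documents: Iterable[List[str]]) -> None:
        self.token2id: Dict[str, int] = {}
        for document in documents:
            self.doc2bow(document, allow_update=True)

    def doc2bow(self, document: List[str], allow_update: bool = False) -> List[Tuple[int, int]]:
        # Same guard and message as gensim: one string is not a list of tokens
        if isinstance(document, str):
            raise TypeError("doc2bow expects an array of unicode tokens on input, not a single string")
        counts = Counter(document)
        if allow_update:
            # New tokens get the next free id, in order of first appearance
            for token in counts:
                self.token2id.setdefault(token, len(self.token2id))
        # Unknown tokens are skipped; the result is sorted by id
        return sorted((self.token2id[t], n) for t, n in counts.items() if t in self.token2id)


theterms = [['jays', 'place', 'great', 'subway'],
            ['described', 'communicative', 'friendly'],
            ['great', 'sarahs', 'apartament', 'back'],
            ['great', 'sarahs', 'apartament', 'back']]

toy = ToyDictionary(theterms)
print(len(toy.token2id))                          # 10 unique tokens
print(toy.doc2bow(['great', 'great', 'subway']))  # [(2, 2), (3, 1)]
# ToyDictionary(['jays', 'place']) raises TypeError: each "document" is a str
```

The repeated fourth document adds no new ids, which is why ten unique tokens come out of four documents.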

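First convert comments['terms'] to a list with comments['terms'].tolist(), then build the dictionary from that list; it should work. Before creating the dictionary you can also apply other preprocessing, such as stemming or stopword removal.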
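If the cells of `comments['terms']` are real Python lists, `tolist()` yields exactly the list of lists gensim wants. If instead they hold the string form of a list (an assumption about the asker's data, typical after a CSV round trip), each cell has to be parsed first. The sketch below does that with `ast.literal_eval`, which only evaluates Python literals, and then drops tokens found in an optional stopword set. The function name `normalize_terms` is hypothetical, and the `{'great'}` set in the example is purely illustrative; a real pipeline would use a proper stopword list and could apply a stemmer in the same loop.

```python
import ast
from typing import AbstractSet, Iterable, List, Union


def normalize_terms(cells: Iterable[Union[str, List[str]]],
                    stopwords: AbstractSet[str] = frozenset()) -> List[List[str]]:
    """Return one token list per cell, parsing string cells and dropping stopwords."""
    docs: List[List[str]] = []
    for cell in cells:
        # "['a', 'b']" (text) -> ['a', 'b'] (list); real lists are copied as-is
        tokens = ast.literal_eval(cell) if isinstance(cell, str) else list(cell)
        docs.append([t for t in tokens if t not in stopwords])
    return docs


cells = ["['jays', 'place', 'great', 'subway']",
         ['described', 'communicative', 'friendly']]
print(normalize_terms(cells))
# [['jays', 'place', 'great', 'subway'], ['described', 'communicative', 'friendly']]
print(normalize_terms(cells, stopwords={'great'}))
# [['jays', 'place', 'subway'], ['described', 'communicative', 'friendly']]
```

With pandas, `normalize_terms(comments['terms'])` returns a plain list of token lists that can be passed directly to `corpora.Dictionary(...)`.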

Related questions

Gensim: TypeError: doc2bow expects an array of unicode tokens on input, not a single string

Getting "doc2bow expects an array of unicode tokens on input, not a single string" when trying to do NLP using gensim. Is there a solution?

topic modeling error (doc2bow expects an array of unicode tokens on input, not a single string)

Does gensim.corpora.Dictionary have term frequency saved?

Error while converting corpora of saved tokens in a dataframe column into a gensim dictionary

gensim.corpora Dictionary type error interprets tokenized column as single string

How to input a series/list consisting of different tokens in a Gensim Dictionary?

python gensim TypeError: coercing to Unicode: need string or buffer, list found

how to add tokens to gensim dictionary

Sentence returns empty dictionary from gensim.corpora


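Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.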


Guangdong ICP filing No. 18138465 · © 2020-2024 STACKOOM.COM