
Getting "doc2bow expects an array of unicode tokens on input, not a single string" when trying to do NLP with gensim. Is there a solution?

Original post 2021-01-23 23:33:46 6 1 python/ dictionary/ nlp/ gensim/ lda

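import gensim
from gensim import corpora  # needed for corpora.Dictionary

LDA = gensim.models.ldamodel.LdaModel
# docCleaned is the cleaned text prepared earlier (not shown)
dictionary = corpora.Dictionary(docCleaned)  # Error message appears here!!!
doc_term_matrix = [dictionary.doc2bow(doc) for doc in docCleaned]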

Error message ->

TypeError: doc2bow expects an array of unicode tokens on input, not a single string

1 answer

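corpora.Dictionary expects a list of documents, where each document is itself a list of string tokens. You are passing it plain strings where it expects token lists.

You probably want to split your string into "documents" and then split each document into tokens. How to do that depends on the nature of the text you have. In the worst case, where each "document" is a sentence, you can split on punctuation and then on whitespace:

import re
import string
# Split on punctuation into "documents", then split each one into a list of tokens
segments = re.split('[' + re.escape(string.punctuation) + ']', docCleaned)
dictionary = corpora.Dictionary([s.split() for s in segments if s.strip()])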
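One way to guard against this error is to normalize the input into a list of token lists before building the dictionary. The following is a minimal sketch, assuming docCleaned is either one long string, a list of strings, or already a list of token lists. The helper names to_documents and build_corpus are hypothetical and not part of gensim. Whitespace tokenization is also an assumption, so substitute your own tokenizer if you have one.

```python
import re
import string


def to_documents(raw):
    """Return a list of token lists, whatever shape `raw` arrives in."""
    if isinstance(raw, str):
        # One big string: cut it into sentence-like segments at punctuation.
        raw = re.split("[" + re.escape(string.punctuation) + "]", raw)
    documents = []
    for doc in raw:
        # A string document is split on whitespace; a token list is kept as-is.
        tokens = doc.split() if isinstance(doc, str) else list(doc)
        if tokens:  # skip empty segments
            documents.append(tokens)
    return documents


def build_corpus(raw):
    """Build the gensim dictionary and the bag-of-words corpus."""
    # Imported here so to_documents can be used without gensim installed.
    from gensim import corpora

    documents = to_documents(raw)
    dictionary = corpora.Dictionary(documents)
    doc_term_matrix = [dictionary.doc2bow(doc) for doc in documents]
    return dictionary, doc_term_matrix
```

With this, build_corpus(docCleaned) would replace the two failing lines in the question. Note that splitting on punctuation also splits contractions such as "don't", which may or may not be what you want.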

