import gensim
from gensim import corpora
LDA = gensim.models.ldamodel.LdaModel
dictionary = corpora.Dictionary(docCleaned)  # the TypeError is raised on this line
doc_term_matrix = [dictionary.doc2bow(doc) for doc in docCleaned]
Error message:
TypeError: doc2bow expects an array of unicode tokens on input, not a single string
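corpora.Dictionary expects a corpus, that is, an iterable of documents where each document is a list of string tokens (for example [["topic", "model"], ["model", "corpus"]]), whereas you are passing a single string to the constructor. Iterating over a string yields its individual characters, so every "document" the constructor sees is itself a string, and doc2bow rejects it with the TypeError above.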
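To make the expected input shape concrete, here is a toy, pure-Python stand-in for what the Dictionary/doc2bow pair computes: each distinct token gets an integer id, and each document becomes a list of (id, count) pairs. This is only an illustration, not gensim's implementation; the helper names build_vocab and to_bow are hypothetical, and ids here are assigned in first-appearance order, which may differ from how gensim numbers tokens.

```python
from collections import Counter


def build_vocab(texts):
    """Assign an integer id to each distinct token, in order of first appearance.

    `texts` must be a list of documents, each document a list of string tokens.
    """
    vocab = {}
    for doc in texts:
        if isinstance(doc, str):
            # A document must be a list of tokens; a bare string is rejected,
            # similar in spirit to the TypeError gensim raises.
            raise TypeError("each document must be a list of tokens, not a single string")
        for token in doc:
            vocab.setdefault(token, len(vocab))
    return vocab


def to_bow(doc, vocab):
    """Return sorted (token_id, count) pairs for one tokenized document."""
    counts = Counter(vocab[token] for token in doc if token in vocab)
    return sorted(counts.items())


texts = [["topic", "model", "topic"], ["model", "corpus"]]
vocab = build_vocab(texts)
print(vocab)                    # {'topic': 0, 'model': 1, 'corpus': 2}
print(to_bow(texts[0], vocab))  # [(0, 2), (1, 1)]

try:
    # Iterating a str yields characters, each of which is itself a str
    build_vocab("a single string")
except TypeError as err:
    print(err)
```

Passing the whole string fails for the same structural reason as in the question: the outer iteration produces strings instead of token lists.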
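You need to split the string into "documents" and then split each document into tokens. How to choose the document boundaries depends on the nature of your text. In the worst case, when the whole text is one string, you can split it on punctuation and then split each fragment on whitespace: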
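import re
import string
from gensim import corpora
fragments = re.split('[' + re.escape(string.punctuation) + ']', docCleaned)  # split the text at punctuation
texts = [f.split() for f in fragments if f.strip()]  # one token list per non-empty fragment
dictionary = corpora.Dictionary(texts)  # pass texts (not docCleaned) to doc2bow as well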
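The splitting step can be checked on its own without gensim. The sketch below wraps it in a hypothetical helper, split_into_token_lists, and runs it on a made-up sample string. Treating each punctuation-delimited fragment as a document is only a fallback; real sentence or paragraph boundaries, if your data has them, usually make better documents.

```python
import re
import string

# Character class matching any single ASCII punctuation character
PUNCT_RE = re.compile('[' + re.escape(string.punctuation) + ']')


def split_into_token_lists(text):
    """Split `text` at punctuation, then split each fragment on whitespace.

    Returns a list of documents, each a list of string tokens, which is the
    shape corpora.Dictionary expects. Empty fragments are dropped.
    """
    fragments = PUNCT_RE.split(text)
    return [fragment.split() for fragment in fragments if fragment.strip()]


sample = "topic models need tokens. one string is not enough!"
print(split_into_token_lists(sample))
# [['topic', 'models', 'need', 'tokens'], ['one', 'string', 'is', 'not', 'enough']]
```

Note that splitting on every punctuation character also breaks up contractions and hyphenated words (e.g. "a-b" becomes two documents), which may or may not be acceptable for your text.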