Set CountVectorizer result to pandas.DataFrame
I need to populate a pandas.DataFrame with the feature matrix produced by CountVectorizer.
count_vect = CountVectorizer()
count_vect.fit(text)
xtrain_count = count_vect.transform(train_x)
SaveTxt = pandas.DataFrame()
SaveTxt['text']=xtrain_count
But the last line, SaveTxt['text']=xtrain_count, raises the following error:
raise ValueError('Cannot set a frame with no defined index '
ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series
How should I put the result matrix of CountVectorizer into a DataFrame? The CountVectorizer result is a csr_matrix with about 20000 rows and 200000 columns, and its contents are integers (1 to 6).
pd.DataFrame(my_csr_matrix.todense())
Here is a proof of concept:
import random
import lorem
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
m = 10
random.seed(0)
data = [lorem.paragraph() for _ in range(m)]
cv = CountVectorizer()
cv.fit(data)
df = pd.DataFrame(data=cv.transform(data).todense())
print(df.shape)
print(df.head())
Result:
(10, 27)
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
0 1 2 2 3 3 0 2 0 3 1 2 2 2 1 1 5 3 2 1 3 1 0 2 2 1 4 4
1 0 0 4 1 0 0 1 3 0 3 2 0 1 0 1 1 1 5 3 2 0 0 1 0 0 3 1
2 0 2 3 1 1 1 2 0 2 0 1 1 1 1 1 3 2 0 1 2 1 4 3 0 1 2 5
3 3 3 4 7 1 2 4 2 2 0 1 2 1 1 0 0 0 2 1 3 2 2 2 2 0 3 4
4 2 3 1 2 3 4 1 1 4 3 2 4 2 2 3 3 2 0 2 3 2 5 4 3 2 1 2