简体   繁体   English

将稀疏矩阵(csc_matrix)转换为pandas数据帧

[英]Convert sparse matrix (csc_matrix) to pandas dataframe

I want to convert this matrix into a pandas dataframe. 我想将此矩阵转换为pandas数据帧。 csc_matrix csc_matrix

The first number in the bracket should be the index , the second number being columns and the number in the end being the data . 括号中的第一个数字应该是索引第二个数字是 ,最后的数字数据

I want to do this to do feature selection in text analysis, the first number represents the document, the second being the feature of word and the last number being the TFIDF score. 我想这样做在文本分析中进行特征选择,第一个数字代表文档,第二个数字代表单词,最后一个数字代表TFIDF分数。

Getting a dataframe helps me to transform the text analysis problem into data analysis. 获取数据框有助于我将文本分析问题转换为数据分析。

from scipy.sparse import csc_matrix

csc = csc_matrix(np.array(
    [[0, 0, 4, 0, 0, 0],
     [1, 0, 0, 0, 2, 0],
     [2, 0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0, 1],
     [4, 0, 3, 2, 0, 0]]))

# Return a Coordinate (coo) representation of the Compresses-Sparse-Column (csc) matrix.
coo = csc.tocoo(copy=False)

# Access `row`, `col` and `data` properties of coo matrix.
>>> pd.DataFrame({'index': coo.row, 'col': coo.col, 'data': coo.data}
                 )[['index', 'col', 'data']].sort_values(['index', 'col']
                 ).reset_index(drop=True)
   index  col  data
0      0    2     4
1      1    0     1
2      1    4     2
3      2    0     2
4      2    3     1
5      3    5     1
6      4    0     4
7      4    2     3
8      4    3     2

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM