
Convert a numpy float64 sparse matrix to a pandas data frame

I have an n x n numpy float64 sparse matrix (data, where n = 44), where the rows and columns are graph nodes and the values are edge weights:

>>> data
<44x44 sparse matrix of type '<class 'numpy.float64'>'
    with 668 stored elements in Compressed Sparse Row format>

>>> type(data)
<class 'scipy.sparse.csr.csr_matrix'>

>>> print(data)
  (0, 7)    0.11793236293516568
  (0, 9)    0.10992000939300195
  (0, 21)   0.7422196678913772
  (0, 23)   0.0630039712667936
  (0, 24)   0.027037442463504143
  (0, 27)   0.16908845414214152
  (0, 28)   0.6109227233402952
  (0, 32)   0.0514765253537568
  (0, 33)   0.016341754080557713
  (1, 6)    0.015070325434709386
  (1, 10)   9.346673769086203e-05
  (1, 11)   0.2471018034781923
  (1, 14)   0.0020684269551621776
  (1, 18)   0.015258704502643251
  (1, 20)   0.021798149289490358
  (1, 22)   0.0087026831764125
  (1, 24)   0.1454235884185166
  (1, 25)   0.022060777594183015
  (1, 29)   0.9117391202819067
  (1, 30)   0.018557883854566116
  (1, 31)   0.001876070225734826
  (1, 32)   0.025841354399637764
  (1, 33)   0.014766488228364438
  (1, 39)   0.002791226433410351
  (1, 43)   1.0
  : :
  (41, 7)   0.8922099840113696
  (41, 10)  0.015776226631920767
  (41, 12)  1.0
  (41, 15)  0.1839408706622038
  (41, 18)  0.5151025641025642
  (41, 20)  0.4599130036630037
  (41, 22)  0.29378473237788827
  (41, 33)  0.47474890700697153
  (41, 39)  1.0
  (42, 2)   1.0
  (42, 10)  0.023305789342610222
  (42, 11)  0.011349136164776494
  (42, 12)  1.0
  (42, 17)  0.886081346522542
  (42, 18)  1.0
  (42, 30)  1.0
  (42, 40)  1.0
  (43, 1)   1.0
  (43, 6)   1.0
  (43, 11)  0.039948959300013256
  (43, 13)  1.0
  (43, 14)  0.02669811947637717
  (43, 29)  1.0
  (43, 30)  1.0
  (43, 36)  0.3381986531986532

I'd like to convert it to a pandas data frame, in order to write it to a file, with the columns node1, node2, edge_weight, which would give:

node1, node2, edge_weight
0, 7, 0.11793236293516568
0, 9, 0.10992000939300195
:, :, :
43, 36, 0.3381986531986532

Any idea how to do that?

Note that:

>>> pandas.DataFrame(data)

gives:

                                                    0
0     (0, 7)\t0.11793236293516568\n  (0, 9)\t0.109...
1     (0, 6)\t0.015070325434709386\n  (0, 10)\t9.3...

And

>>> pandas.DataFrame(print(data))

gives:

  (0, 7)    0.11793236293516568
  (0, 9)    0.10992000939300195

So I guess pandas.DataFrame(print(data)) is close to what I'm looking for.

You can try toarray, which gives a dense n x n DataFrame (zeros included):

pd.DataFrame(A.toarray())
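
For comparison with the three-column layout asked for in the question, here is a minimal sketch of what the toarray route produces and one way to reshape it afterwards. The 3x3 matrix and the names dense, wide and long_form are made up for illustration. Note one assumption that matters: going through the dense array drops any zeros that were explicitly stored in the sparse matrix.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Illustrative 3x3 matrix; the values are invented, not the asker's data.
A = sparse.csr_matrix(np.array([
    [0.0, 0.5, 0.0],
    [0.0, 0.0, 1.0],
    [0.25, 0.0, 0.0],
]))

dense = A.toarray()
wide = pd.DataFrame(dense)  # one row and one column per node, zeros included

# Recover an edge list from the dense array: keep only the nonzero cells.
rows, cols = np.nonzero(dense)
long_form = pd.DataFrame({
    "node1": rows,
    "node2": cols,
    "edge_weight": dense[rows, cols],
})
print(wide.shape)
print(long_form)
```

This is fine for a 44x44 matrix, but the dense array grows as n squared, so for large graphs it is probably better to stay in a sparse format.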

This ipython session shows one way you could do it. The two steps are: convert the sparse matrix to COO format, and then create the Pandas DataFrame using the .row, .col and .data attributes of the COO matrix.

In [50]: data                                                                                                    
Out[50]: 
<15x15 sparse matrix of type '<class 'numpy.float64'>'
    with 11 stored elements in Compressed Sparse Row format>

In [51]: print(data)                                                                                             
  (1, 12)   0.8581958095588134
  (6, 12)   0.03828052946099181
  (6, 14)   0.7908634838351427
  (7, 1)    0.7995008873930302
  (7, 11)   0.48477191537121145
  (7, 13)   0.6226526443518743
  (9, 4)    0.37242576669669103
  (11, 1)   0.9604278557580955
  (11, 5)   0.13285436036287313
  (12, 11)  0.5631419223609928
  (13, 8)   0.16481624650723847

In [52]: import pandas as pd                                                                                     

In [53]: c = data.tocoo()                                                                                        

In [54]: df = pd.DataFrame({'node1': c.row, 'node2': c.col, 'edge_weight': c.data})

In [55]: df                                                                                                      
Out[55]: 
    node1  node2  edge_weight
0       1     12     0.858196
1       6     12     0.038281
2       6     14     0.790863
3       7      1     0.799501
4       7     11     0.484772
5       7     13     0.622653
6       9      4     0.372426
7      11      1     0.960428
8      11      5     0.132854
9      12     11     0.563142
10     13      8     0.164816
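
The same two steps can be put into a standalone script that also covers the "write it to a file" part of the question. This is only a sketch: the 3x3 matrix is invented, the helper name sparse_to_edge_frame is hypothetical, and the CSV is written to an in-memory buffer so the example has no side effects.

```python
import io

import numpy as np
import pandas as pd
from scipy import sparse

# Illustrative 3x3 matrix; the values are invented, not the asker's data.
data = sparse.csr_matrix(np.array([
    [0.0, 0.5, 0.0],
    [0.0, 0.0, 1.0],
    [0.25, 0.0, 0.0],
]))


def sparse_to_edge_frame(matrix):
    """Return a DataFrame with one row per stored element of `matrix`."""
    coo = matrix.tocoo()  # COO exposes parallel .row, .col and .data arrays
    return pd.DataFrame({
        "node1": coo.row,
        "node2": coo.col,
        "edge_weight": coo.data,
    })


edges = sparse_to_edge_frame(data)

# Write without the index column so the output has exactly three columns.
buffer = io.StringIO()
edges.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a path to to_csv, for example edges.to_csv("edges.csv", index=False), where the file name is just a placeholder.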

I ran into a similar problem when using OneHotEncoder. I fixed it by changing sparse to False (in scikit-learn 1.2 and later, the argument is named sparse_output):

enc = OneHotEncoder(sparse=False)
