Convert a numpy float64 sparse matrix to a pandas DataFrame

Question

I have an n x n SciPy sparse matrix of numpy float64 values (data, where n = 44), in which the rows and columns are graph nodes and the values are edge weights:

>>> data
<44x44 sparse matrix of type '<class 'numpy.float64'>'
    with 668 stored elements in Compressed Sparse Row format>

>>> type(data)
<class 'scipy.sparse.csr.csr_matrix'>

>>> print(data)
  (0, 7)    0.11793236293516568
  (0, 9)    0.10992000939300195
  (0, 21)   0.7422196678913772
  (0, 23)   0.0630039712667936
  (0, 24)   0.027037442463504143
  (0, 27)   0.16908845414214152
  (0, 28)   0.6109227233402952
  (0, 32)   0.0514765253537568
  (0, 33)   0.016341754080557713
  (1, 6)    0.015070325434709386
  (1, 10)   9.346673769086203e-05
  (1, 11)   0.2471018034781923
  (1, 14)   0.0020684269551621776
  (1, 18)   0.015258704502643251
  (1, 20)   0.021798149289490358
  (1, 22)   0.0087026831764125
  (1, 24)   0.1454235884185166
  (1, 25)   0.022060777594183015
  (1, 29)   0.9117391202819067
  (1, 30)   0.018557883854566116
  (1, 31)   0.001876070225734826
  (1, 32)   0.025841354399637764
  (1, 33)   0.014766488228364438
  (1, 39)   0.002791226433410351
  (1, 43)   1.0
  : :
  (41, 7)   0.8922099840113696
  (41, 10)  0.015776226631920767
  (41, 12)  1.0
  (41, 15)  0.1839408706622038
  (41, 18)  0.5151025641025642
  (41, 20)  0.4599130036630037
  (41, 22)  0.29378473237788827
  (41, 33)  0.47474890700697153
  (41, 39)  1.0
  (42, 2)   1.0
  (42, 10)  0.023305789342610222
  (42, 11)  0.011349136164776494
  (42, 12)  1.0
  (42, 17)  0.886081346522542
  (42, 18)  1.0
  (42, 30)  1.0
  (42, 40)  1.0
  (43, 1)   1.0
  (43, 6)   1.0
  (43, 11)  0.039948959300013256
  (43, 13)  1.0
  (43, 14)  0.02669811947637717
  (43, 29)  1.0
  (43, 30)  1.0
  (43, 36)  0.3381986531986532

I'd like to convert it to a pandas DataFrame with the columns node1, node2, edge_weight, so that I can write it to a file. The result should look like this:

node1, node2, edge_weight
0, 7, 0.11793236293516568
0, 9, 0.10992000939300195
:, :, :
43, 36, 0.3381986531986532

Any idea how to do that?

Note that:

>>> pandas.DataFrame(data)

gives:

                                                    0
0     (0, 7)\t0.11793236293516568\n  (0, 9)\t0.109...
1     (0, 6)\t0.015070325434709386\n  (0, 10)\t9.3...

And

>>> pandas.DataFrame(print(data))

gives:

  (0, 7)    0.11793236293516568
  (0, 9)    0.10992000939300195

So I guess pandas.DataFrame(print(data)) is close to what I'm looking for.
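A quick check, sketched below with a small hypothetical 2 x 2 matrix standing in for data, suggests why that only looks close: print() writes the matrix text to the console and returns None, so pandas.DataFrame(print(data)) is the same as pandas.DataFrame(None), which is an empty DataFrame. The lines shown are print's own output, not the contents of the DataFrame.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical 2 x 2 matrix standing in for the 44 x 44 `data`.
data = csr_matrix(np.array([[0.0, 0.5],
                            [0.25, 0.0]]))

# print() sends the "(row, col) value" text to stdout and returns None.
returned = print(data)

# So this is really pd.DataFrame(None): an empty frame with no rows or columns.
df = pd.DataFrame(returned)
print(returned is None, df.empty)
```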

Answer 1

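You can try toarray, which converts the sparse matrix to a dense NumPy array (so the result is an n x n adjacency-matrix DataFrame rather than an edge list):

pd.DataFrame(data.toarray())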
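A minimal sketch of this answer, assuming a small hypothetical 3 x 3 matrix in place of data. The dense DataFrame has one cell per node pair; to get from it to the node1, node2, edge_weight layout, one option (not part of the original answer) is np.nonzero on the dense array. This drops any edge whose weight is exactly 0, and the dense copy uses n² memory, which is trivial for n = 44 but can matter for large graphs.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical 3 x 3 weighted adjacency matrix standing in for `data`.
data = csr_matrix(np.array([[0.0, 0.5, 0.0],
                            [0.0, 0.0, 1.0],
                            [0.25, 0.0, 0.0]]))

# toarray() densifies the matrix: every node pair, zeros included, gets a cell.
dense = data.toarray()
dense_df = pd.DataFrame(dense)
print(dense_df.shape)  # (3, 3)

# Optional extra step (not in the answer above): recover an edge list from the
# dense array by keeping only the nonzero cells, in row-major order.
rows, cols = np.nonzero(dense)
edges = pd.DataFrame({"node1": rows, "node2": cols, "edge_weight": dense[rows, cols]})
print(edges)
```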

Answer 2

This IPython session shows one way you could do it. The two steps are: convert the sparse matrix to COO format, then create the pandas DataFrame from the .row, .col and .data attributes of the COO matrix.

In [50]: data                                                                                                    
Out[50]: 
<15x15 sparse matrix of type '<class 'numpy.float64'>'
    with 11 stored elements in Compressed Sparse Row format>

In [51]: print(data)                                                                                             
  (1, 12)   0.8581958095588134
  (6, 12)   0.03828052946099181
  (6, 14)   0.7908634838351427
  (7, 1)    0.7995008873930302
  (7, 11)   0.48477191537121145
  (7, 13)   0.6226526443518743
  (9, 4)    0.37242576669669103
  (11, 1)   0.9604278557580955
  (11, 5)   0.13285436036287313
  (12, 11)  0.5631419223609928
  (13, 8)   0.16481624650723847

In [52]: import pandas as pd

In [53]: c = data.tocoo()

In [54]: df = pd.DataFrame({'node1': c.row, 'node2': c.col, 'edge_weight': c.data})

In [55]: df
Out[55]: 
    node1  node2  edge_weight
0       1     12     0.858196
1       6     12     0.038281
2       6     14     0.790863
3       7      1     0.799501
4       7     11     0.484772
5       7     13     0.622653
6       9      4     0.372426
7      11      1     0.960428
8      11      5     0.132854
9      12     11     0.563142
10     13      8     0.164816
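The same two steps as a self-contained script, sketched with a small hypothetical 3 x 3 matrix in place of data and ending with the file write the question asks for. Here to_csv is called without a path so the CSV text comes back as a string; index=False keeps pandas' row index out of the output.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical 3 x 3 weighted adjacency matrix standing in for `data`.
data = csr_matrix(np.array([[0.0, 0.5, 0.0],
                            [0.0, 0.0, 1.0],
                            [0.25, 0.0, 0.0]]))

# Step 1: COO format exposes parallel arrays of row indices, column indices and values.
c = data.tocoo()

# Step 2: build the edge list directly from those arrays; no dense copy is made.
df = pd.DataFrame({"node1": c.row, "node2": c.col, "edge_weight": c.data})

# Serialize to CSV; with no path argument, to_csv returns the text.
csv_text = df.to_csv(index=False)
print(csv_text)
```

To write to disk instead, pass a path, e.g. df.to_csv("edges.csv", index=False), where edges.csv is a placeholder name. Unlike toarray(), tocoo() never materializes the zero entries, so the work scales with the number of stored elements (668 in the question) rather than with n².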

Answer 3

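I ran into a similar problem when using OneHotEncoder. I fixed it by setting sparse to False, so the encoder returns a dense array instead of a sparse matrix (in scikit-learn 1.2 and later this parameter is named sparse_output):

from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder(sparse_output=False)  # scikit-learn >= 1.2; use sparse=False on older versions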

3 answers

Answer 1: score 8, 2019-12-14 23:32:25

Answer 2: score 2, accepted, 2019-12-14 23:30:50

Answer 3: score 0, 2022-09-17 06:24:29
