Convert a numpy float64 sparse matrix to a pandas data frame
I have an n x n numpy float64 sparse matrix (data, where n = 44), where the rows and columns are graph nodes and the values are edge weights:
>>> data
<44x44 sparse matrix of type '<class 'numpy.float64'>'
with 668 stored elements in Compressed Sparse Row format>
>>> type(data)
<class 'scipy.sparse.csr.csr_matrix'>
>>> print(data)
(0, 7) 0.11793236293516568
(0, 9) 0.10992000939300195
(0, 21) 0.7422196678913772
(0, 23) 0.0630039712667936
(0, 24) 0.027037442463504143
(0, 27) 0.16908845414214152
(0, 28) 0.6109227233402952
(0, 32) 0.0514765253537568
(0, 33) 0.016341754080557713
(1, 6) 0.015070325434709386
(1, 10) 9.346673769086203e-05
(1, 11) 0.2471018034781923
(1, 14) 0.0020684269551621776
(1, 18) 0.015258704502643251
(1, 20) 0.021798149289490358
(1, 22) 0.0087026831764125
(1, 24) 0.1454235884185166
(1, 25) 0.022060777594183015
(1, 29) 0.9117391202819067
(1, 30) 0.018557883854566116
(1, 31) 0.001876070225734826
(1, 32) 0.025841354399637764
(1, 33) 0.014766488228364438
(1, 39) 0.002791226433410351
(1, 43) 1.0
: :
(41, 7) 0.8922099840113696
(41, 10) 0.015776226631920767
(41, 12) 1.0
(41, 15) 0.1839408706622038
(41, 18) 0.5151025641025642
(41, 20) 0.4599130036630037
(41, 22) 0.29378473237788827
(41, 33) 0.47474890700697153
(41, 39) 1.0
(42, 2) 1.0
(42, 10) 0.023305789342610222
(42, 11) 0.011349136164776494
(42, 12) 1.0
(42, 17) 0.886081346522542
(42, 18) 1.0
(42, 30) 1.0
(42, 40) 1.0
(43, 1) 1.0
(43, 6) 1.0
(43, 11) 0.039948959300013256
(43, 13) 1.0
(43, 14) 0.02669811947637717
(43, 29) 1.0
(43, 30) 1.0
(43, 36) 0.3381986531986532
I would like to convert it to a pandas data frame so that I can write it to a file with the following columns: node1, node2, edge_weight, which would give:
node1, node2, edge_weight
0, 7, 0.11793236293516568
0, 9, 0.10992000939300195
:, :, :
43, 36, 0.3381986531986532
Any idea how to do this?
Note:
>>> pandas.DataFrame(data)
gives:
0
0 (0, 7)\t0.11793236293516568\n (0, 9)\t0.109...
1 (0, 6)\t0.015070325434709386\n (0, 10)\t9.3...
and
>>> pandas.DataFrame(print(data))
gives:
(0, 7) 0.11793236293516568
(0, 9) 0.10992000939300195
So I guess pandas.DataFrame(print(data)) is close to what I am looking for.
You can try toarray:
pd.DataFrame(A.toarray())
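Note that toarray produces a dense n x n frame (one column per node, zeros included) rather than the three-column edge list asked for. A minimal sketch of how that dense frame could be reshaped into node1, node2, edge_weight rows follows; the 3x3 matrix A is made up for illustration, and dropping zero entries assumes that a zero weight means "no edge":

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical 3x3 sparse matrix standing in for `A` from the answer above.
A = sparse.csr_matrix(np.array([
    [0.0, 0.5, 0.0],
    [0.0, 0.0, 1.0],
    [0.25, 0.0, 0.0],
]))

# Dense conversion: a 3x3 frame that includes all the zeros.
wide = pd.DataFrame(A.toarray())

# Reshape to long form: one row per (row label, column label, value).
edges = wide.stack().reset_index()
edges.columns = ["node1", "node2", "edge_weight"]

# Assumption: zero entries are non-edges, so drop them.
edges = edges[edges["edge_weight"] != 0].reset_index(drop=True)
print(edges)
```

For large, very sparse matrices the dense intermediate can be expensive in memory, so this is mainly practical for small graphs such as the 44-node one in the question.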
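This ipython session shows one way you could do it. The two steps are: convert the sparse matrix to COO format, then create the Pandas DataFrame using the .row, .col and .data attributes of the COO matrix.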
In [50]: data
Out[50]:
<15x15 sparse matrix of type '<class 'numpy.float64'>'
with 11 stored elements in Compressed Sparse Row format>
In [51]: print(data)
(1, 12) 0.8581958095588134
(6, 12) 0.03828052946099181
(6, 14) 0.7908634838351427
(7, 1) 0.7995008873930302
(7, 11) 0.48477191537121145
(7, 13) 0.6226526443518743
(9, 4) 0.37242576669669103
(11, 1) 0.9604278557580955
(11, 5) 0.13285436036287313
(12, 11) 0.5631419223609928
(13, 8) 0.16481624650723847
In [52]: import pandas as pd
In [53]: c = data.tocoo()
In [54]: df = pd.DataFrame({'node1': c.row, 'node2': c.col, 'edge_weight': c.data})
In [55]: df
Out[55]:
node1 node2 edge_weight
0 1 12 0.858196
1 6 12 0.038281
2 6 14 0.790863
3 7 1 0.799501
4 7 11 0.484772
5 7 13 0.622653
6 9 4 0.372426
7 11 1 0.960428
8 11 5 0.132854
9 12 11 0.563142
10 13 8 0.164816
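The same two steps, plus the file-writing step from the question, can be sketched as a standalone script. The 4x4 matrix below is invented for illustration, and an in-memory buffer stands in for a real output file; passing a file path to to_csv instead would write to disk:

```python
import io

import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical 4x4 weighted adjacency matrix standing in for `data`.
dense = np.array([
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.25, 1.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.75, 0.0, 0.0, 0.0],
])
data = sparse.csr_matrix(dense)

# Step 1: COO format exposes parallel row, col and data arrays.
c = data.tocoo()

# Step 2: build one DataFrame row per stored element.
df = pd.DataFrame({"node1": c.row, "node2": c.col, "edge_weight": c.data})

# Write as CSV without the index column. The buffer is a stand-in for
# a file; df.to_csv("some_path.csv", index=False) would write to disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Unlike the dense route, this touches only the stored elements, so its cost scales with the number of edges rather than with n squared.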
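I ran into a similar problem when using OneHotEncoder, and I fixed it by changing sparse to False (in scikit-learn 1.2 and later this parameter is named sparse_output):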
enc = OneHotEncoder(sparse=False)