Python: scipy.sparse / pandas Null values in sparse matrix are being converted to a large negative integer

Question

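I'm trying to work with a scipy sparse COO matrix, but I'm running into a strange error where null values get converted to a large negative integer. Here is what I'm doing: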

import pickle5 as pk5
from scipy import sparse
import pandas as pd

with open('some_file.pickle', 'rb') as f:
    df = pk5.load(f)

The original sparse df looks correct:

df.iloc[0:5, 0:4]:

 1028799.3_nuc_coding  1156994.3_nuc_coding  1156995.3_nuc_coding
0                   1.0                   NaN                   NaN
1                   NaN                   1.0                   NaN
2                   NaN                   NaN                   NaN
3                   NaN                   NaN                   NaN
4                   NaN                   NaN                   NaN

Running dropna works fine, so these really are null values.

df.iloc[0].dropna().index[:3]

Index(['1028799.3_nuc_coding', '1280.11650_nuc_coding',
       '1280.11655_nuc_coding'],
      dtype='object')

But running any operation on it changes the NaN values to -9223372036854775808. For example, here is df.T:

                                      0                    1  \
1028799.3_nuc_coding                    1 -9223372036854775808   
1156994.3_nuc_coding -9223372036854775808                    1   
1156995.3_nuc_coding -9223372036854775808 -9223372036854775808   

                                        2                    3  \
1028799.3_nuc_coding -9223372036854775808 -9223372036854775808   
1156994.3_nuc_coding -9223372036854775808 -9223372036854775808   
1156995.3_nuc_coding -9223372036854775808 -9223372036854775808   

                                        4  
1028799.3_nuc_coding -9223372036854775808  
1156994.3_nuc_coding -9223372036854775808  
1156995.3_nuc_coding -9223372036854775808

I get a similar error with df.iterrows() and when converting the frame to a COO matrix in scipy:

coo_mat = sparse.coo_matrix(df.values, shape=df.shape)
print(coo_mat)

  (0, 0)    1
  (0, 1)    -9223372036854775808
  (0, 2)    -9223372036854775808
  (0, 3)    -9223372036854775808
  (0, 4)    -9223372036854775808
  (0, 5)    -9223372036854775808
  (0, 6)    -9223372036854775808
  (0, 7)    -9223372036854775808
  (0, 8)    -9223372036854775808
  (0, 9)    -9223372036854775808
  (0, 10)   -9223372036854775808
  (0, 11)   -9223372036854775808
  (0, 12)   -9223372036854775808
  (0, 13)   -9223372036854775808
  (0, 14)   -9223372036854775808
  (0, 15)   -9223372036854775808
  (0, 16)   -9223372036854775808
  (0, 17)   -9223372036854775808
  (0, 18)   -9223372036854775808
  (0, 19)   -9223372036854775808
  (0, 20)   -9223372036854775808
  (0, 21)   -9223372036854775808
  (0, 22)   -9223372036854775808
  (0, 23)   -9223372036854775808
  (0, 24)   -9223372036854775808
  : :

Answer 1

Thanks to @hpaulj for the hint. The problem was that my dtype was an int, which cannot represent NaN. Recasting it to float solved the problem. Example:

df.iloc[0:5, 0:4].astype(float).T

                          0   1   2    3  4
1028799.3_nuc_coding    1.0 NaN NaN NaN NaN
1156994.3_nuc_coding    NaN 1.0 NaN NaN NaN
1156995.3_nuc_coding    NaN NaN NaN NaN NaN
1156996.3_nuc_coding    NaN NaN NaN NaN NaN

Likewise, once the dtype is changed to float, other operations (such as iterrows and casting to coo_matrix) work as expected.
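
For anyone who wants to try the fix without the original pickle, here is a minimal sketch. The small frame and its column names are made up for illustration and are not the asker's data. It also notes that -9223372036854775808 is the smallest int64 value, which is what NaN typically turns into when forced into an int64 array on common platforms. The fillna(0.0) step is an extra assumption on my part: coo_matrix stores NaN as an explicit entry, so it only makes sense if missing values should be treated as zeros.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical stand-in for the pickled frame: mostly-missing columns.
df = pd.DataFrame(
    {
        "a_nuc_coding": [1.0, np.nan, np.nan],
        "b_nuc_coding": [np.nan, 1.0, np.nan],
    }
)

# The value seen in the question is the smallest representable int64.
int64_min = np.iinfo(np.int64).min

# Keeping the data as float lets NaN survive operations such as transpose.
transposed = df.astype(float).T

# coo_matrix keeps every non-zero entry, and NaN counts as non-zero.
# Assumption: missing values should become implicit zeros, so fill them first.
dense = df.astype(float).fillna(0.0).to_numpy()
coo_mat = sparse.coo_matrix(dense)

print(int64_min)
print(transposed)
print(coo_mat)
```

With this toy frame, coo_mat holds only the two 1.0 entries. Skip the fillna step if the NaN positions need to stay distinguishable from zeros.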
