Python: scipy.sparse / pandas Null values in sparse matrix are being converted to large negative integer

Question

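I'm trying to use a scipy sparse COO matrix, but I'm running into a strange error where null values are converted to a large negative integer. Here's what I'm doing: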

import pickle5 as pk5
from scipy import sparse
import pandas as pd

with open('some_file.pickle', 'rb') as f:
    df = pk5.load(f)

The original sparse df looks correct:

df.iloc[0:5, 0:4]:

 1028799.3_nuc_coding  1156994.3_nuc_coding  1156995.3_nuc_coding
0                   1.0                   NaN                   NaN
1                   NaN                   1.0                   NaN
2                   NaN                   NaN                   NaN
3                   NaN                   NaN                   NaN
4                   NaN                   NaN                   NaN

Running dropna works fine, so these really are null values:

df.iloc[0].dropna().index[:3]

Index(['1028799.3_nuc_coding', '1280.11650_nuc_coding',
       '1280.11655_nuc_coding'],
      dtype='object')

But running any operation on it changes the NaN values to -9223372036854775808. For example, here is df.T:

                                      0                    1  \
1028799.3_nuc_coding                    1 -9223372036854775808   
1156994.3_nuc_coding -9223372036854775808                    1   
1156995.3_nuc_coding -9223372036854775808 -9223372036854775808   

                                        2                    3  \
1028799.3_nuc_coding -9223372036854775808 -9223372036854775808   
1156994.3_nuc_coding -9223372036854775808 -9223372036854775808   
1156995.3_nuc_coding -9223372036854775808 -9223372036854775808   

                                        4  
1028799.3_nuc_coding -9223372036854775808  
1156994.3_nuc_coding -9223372036854775808  
1156995.3_nuc_coding -9223372036854775808
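A quick NumPy check of where that number comes from, as a sketch: -9223372036854775808 is exactly the minimum value of int64 (-2**63), which is consistent with NaN being forced into an integer dtype. The result of casting NaN to an integer is not well defined and can differ between platforms, so the last printed array is illustrative only.

```python
import numpy as np

# The number in the output is the smallest signed 64-bit integer, -2**63.
INT64_MIN = np.iinfo(np.int64).min
print(INT64_MIN)            # -9223372036854775808
print(INT64_MIN == -2**63)  # True

# Integer dtypes have no NaN. Casting NaN to int64 is not well defined:
# on x86-64 it typically produces INT64_MIN, but other platforms may differ,
# so the second element below is platform-dependent.
with np.errstate(invalid="ignore"):  # silence the "invalid value in cast" warning
    nan_as_int = np.array([1.0, np.nan]).astype(np.int64)
print(nan_as_int)
```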

I get a similar error with df.iterrows() and when converting to a scipy coo matrix with the following code:

coo_mat = sparse.coo_matrix(df.values, shape=df.shape)
print(coo_mat)

  (0, 0)    1
  (0, 1)    -9223372036854775808
  (0, 2)    -9223372036854775808
  (0, 3)    -9223372036854775808
  (0, 4)    -9223372036854775808
  (0, 5)    -9223372036854775808
  (0, 6)    -9223372036854775808
  (0, 7)    -9223372036854775808
  (0, 8)    -9223372036854775808
  (0, 9)    -9223372036854775808
  (0, 10)   -9223372036854775808
  (0, 11)   -9223372036854775808
  (0, 12)   -9223372036854775808
  (0, 13)   -9223372036854775808
  (0, 14)   -9223372036854775808
  (0, 15)   -9223372036854775808
  (0, 16)   -9223372036854775808
  (0, 17)   -9223372036854775808
  (0, 18)   -9223372036854775808
  (0, 19)   -9223372036854775808
  (0, 20)   -9223372036854775808
  (0, 21)   -9223372036854775808
  (0, 22)   -9223372036854775808
  (0, 23)   -9223372036854775808
  (0, 24)   -9223372036854775808
  : :
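The COO output follows from how coo_matrix treats a dense input: it stores every nonzero element. Once each NaN has become the int64 minimum, it is a nonzero value and is stored as an explicit entry. A minimal sketch, using a small hypothetical array as a stand-in for df.values:

```python
import numpy as np
from scipy import sparse

INT64_MIN = np.iinfo(np.int64).min

# Hypothetical stand-in for df.values after NaN has turned into INT64_MIN.
dense = np.array([[1, INT64_MIN, INT64_MIN],
                  [INT64_MIN, 1, INT64_MIN]], dtype=np.int64)

coo = sparse.coo_matrix(dense)
# Every element is nonzero (the sentinel included), so all six are stored.
print(coo.nnz)  # 6
print(coo)
```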

Answer 1

Thanks to @hpaulj for the hint. The problem was that my dtype was an int, which cannot hold NaN, so recasting it to float fixed the issue. Example:

df.iloc[0:5, 0:4].astype(float).T

                          0   1   2    3  4
1028799.3_nuc_coding    1.0 NaN NaN NaN NaN
1156994.3_nuc_coding    NaN 1.0 NaN NaN NaN
1156995.3_nuc_coding    NaN NaN NaN NaN NaN
1156996.3_nuc_coding    NaN NaN NaN NaN NaN

Likewise, once the type is changed to float, other operations (such as iterrows and converting to a coo_matrix) also work as expected.
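One caveat worth checking: casting to float restores NaN, but NaN is also nonzero, so sparse.coo_matrix(df.values) will still store every NaN as an explicit entry. If the intent is for missing values to be implicit (unstored) zeros, filling them first avoids that. A sketch on a small hypothetical frame (the column names are made up to mirror the question):

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical frame mirroring the question's layout (1.0 = present, NaN = missing).
df = pd.DataFrame(
    {"a_nuc_coding": [1.0, np.nan, np.nan],
     "b_nuc_coding": [np.nan, 1.0, np.nan]}
)

as_float = df.astype(float)

# NaN is nonzero, so coo_matrix keeps every NaN as a stored entry.
coo_with_nan = sparse.coo_matrix(as_float.to_numpy())

# Filling missing values with 0 first leaves only the real entries stored.
coo_clean = sparse.coo_matrix(as_float.fillna(0).to_numpy())

print(coo_with_nan.nnz, coo_clean.nnz)  # 6 2
```

scipy.sparse treats every unstored entry as 0, so fillna(0) is only appropriate when 0 genuinely means "absent" in the data.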

Solution 1 (score 0, accepted, 2021-04-29 22:26:24)
