Python: scipy.sparse / pandas: null values in a sparse matrix are being converted to a large negative integer

Question

I am trying to work with a scipy sparse COO matrix, but I am running into weird errors where null values are converted to large negative integers. Here is what I am doing:

import pickle5 as pk5
from scipy import sparse
import pandas as pd

with open('some_file.pickle', 'rb') as f:
    df = pk5.load(f)

The original sparse df looks correct (df.iloc[0:5, 0:4]):

 1028799.3_nuc_coding  1156994.3_nuc_coding  1156995.3_nuc_coding
0                   1.0                   NaN                   NaN
1                   NaN                   1.0                   NaN
2                   NaN                   NaN                   NaN
3                   NaN                   NaN                   NaN
4                   NaN                   NaN                   NaN

Running dropna works fine, so these really are null values.

df.iloc[0].dropna().index[:3]

Index(['1028799.3_nuc_coding', '1280.11650_nuc_coding',
       '1280.11655_nuc_coding'],
      dtype='object')

But running any operation on it changes the NaN values to -9223372036854775808. For example, here is df.T:

                                      0                    1  \
1028799.3_nuc_coding                    1 -9223372036854775808   
1156994.3_nuc_coding -9223372036854775808                    1   
1156995.3_nuc_coding -9223372036854775808 -9223372036854775808   

                                        2                    3  \
1028799.3_nuc_coding -9223372036854775808 -9223372036854775808   
1156994.3_nuc_coding -9223372036854775808 -9223372036854775808   
1156995.3_nuc_coding -9223372036854775808 -9223372036854775808   

                                        4  
1028799.3_nuc_coding -9223372036854775808  
1156994.3_nuc_coding -9223372036854775808  
1156995.3_nuc_coding -9223372036854775808

I see the same problem with df.iterrows() and when converting to a COO matrix in scipy using the code below.

coo_mat = sparse.coo_matrix(df.values, shape=df.shape)
print(coo_mat)

  (0, 0)    1
  (0, 1)    -9223372036854775808
  (0, 2)    -9223372036854775808
  (0, 3)    -9223372036854775808
  (0, 4)    -9223372036854775808
  (0, 5)    -9223372036854775808
  (0, 6)    -9223372036854775808
  (0, 7)    -9223372036854775808
  (0, 8)    -9223372036854775808
  (0, 9)    -9223372036854775808
  (0, 10)   -9223372036854775808
  (0, 11)   -9223372036854775808
  (0, 12)   -9223372036854775808
  (0, 13)   -9223372036854775808
  (0, 14)   -9223372036854775808
  (0, 15)   -9223372036854775808
  (0, 16)   -9223372036854775808
  (0, 17)   -9223372036854775808
  (0, 18)   -9223372036854775808
  (0, 19)   -9223372036854775808
  (0, 20)   -9223372036854775808
  (0, 21)   -9223372036854775808
  (0, 22)   -9223372036854775808
  (0, 23)   -9223372036854775808
  (0, 24)   -9223372036854775808
  : :

Answer 1

Thanks to @hpaulj for the hint. The problem was that my dtype was an int. An integer dtype cannot represent NaN, which is why the missing entries show up as -9223372036854775808, the smallest int64 value. Recasting to float solves the issue. Example:

df.iloc[0:5, 0:4].astype(float).T

                          0   1   2    3  4
1028799.3_nuc_coding    1.0 NaN NaN NaN NaN
1156994.3_nuc_coding    NaN 1.0 NaN NaN NaN
1156995.3_nuc_coding    NaN NaN NaN NaN NaN
1156996.3_nuc_coding    NaN NaN NaN NaN NaN

Similarly, other operations such as iterrows and conversion to coo_matrix also work as expected once the dtype is changed to float.
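For anyone who wants to try the float cast end to end, here is a minimal sketch. It is not the original data: the small DataFrame and its column names "a" and "b" are hypothetical stand-ins for the pickled frame, and the fillna(0.0) step is an extra assumption of mine so that scipy stores only the real values (a NaN is nonzero, so without it every NaN would be stored as an explicit entry).

```python
import numpy as np
import pandas as pd
from scipy import sparse

# The value seen in the question is simply the smallest int64.
INT64_MIN = np.iinfo(np.int64).min
print(INT64_MIN)

# Hypothetical stand-in for the pickled frame (names are made up).
df = pd.DataFrame(
    {"a": [1.0, np.nan, np.nan], "b": [np.nan, 1.0, np.nan]}
)

# Cast to float first so NaN stays representable in later operations.
as_float = df.astype(float)
print(as_float.T)

# Assumption: missing entries should be treated as zeros in the matrix.
dense = as_float.fillna(0.0).to_numpy()
coo_mat = sparse.coo_matrix(dense)
print(coo_mat)
```

With the NaNs replaced by zeros, the COO matrix should hold only the two 1.0 entries instead of one stored entry per cell. If the NaNs need to stay distinguishable from real zeros, skip the fillna step and keep the float dtype.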
