
Working Around the Windows-numpy astype(int) Bug in Pandas

I have a codebase that I've been developing on a Mac (and running on Linux machines), built largely on pandas (and therefore numpy). I very commonly type-cast with astype(int).

Recently a Windows-based developer joined our team. In an effort to make the codebase more platform-independent, we're trying to gracefully tackle a tricky issue: on Windows, numpy maps int to a 32-bit type instead of the 64-bit type, which breaks larger integers.

On a Mac, we see:

ipdb> ids.astype(int)
id
1818726176      1818726176  
1881879486      1881879486  
2590366906      2590366906  
284399109       284399109   
299981685       299981685   
370708200       370708200   
387277023371    387277023371
387343898032    387343898032
406885699892    406885699892
5262665206      5262665206  
544687374       544687374   
6978317806      6978317806  

Whereas on a Windows machine (in PowerShell), we see:

ipdb> ids.astype(int)
id
1818726176      1818726176
1881879486      1881879486
2590366906     -1704600390
284399109       284399109 
299981685       299981685 
370708200       370708200 
387277023371    729966731 
387343898032    796841392 
406885699892   -1136193228
5262665206      967697910 
544687374       544687374 
6978317806     -1611616786
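
The negative values in the Windows output are consistent with 64-bit integers being truncated to 32 bits. The following is a minimal sketch, assuming the ids fit in int64 and using an explicit np.int32 cast to stand in for what astype(int) did on that Windows machine. The variable names are illustrative and not taken from the original code.

```python
import numpy as np

# A few of the ids from the output above, stored as 64-bit integers.
ids = np.array(
    [1818726176, 2590366906, 387277023371, 406885699892, 5262665206],
    dtype=np.int64,
)

# Casting to a 32-bit integer keeps only the low 32 bits of each value.
wrapped = ids.astype(np.int32)

# The same result computed by hand: reduce modulo 2**32,
# then reinterpret the result as a signed 32-bit value.
by_hand = (ids + 2**31) % 2**32 - 2**31

print(wrapped)
print(by_hand)
```

Both lines print the same values as the Windows output for these ids; for example, 2590366906 becomes -1704600390, while 1818726176 fits in 32 bits and is unchanged.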

Other than using a sed call to change every astype(int) to astype(np.int64) (which would also require adding import numpy as np at the top of every module that doesn't currently have it), is there a way to do this?

In particular, I was hoping to map int to numpy.int64 somehow, via a pandas option or something similar.

Thank you!

I'm not saying that this is a really good idea, but you can simply redefine int to whatever you want:

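import numpy as np

x = 2384351503.0  # too large for a signed 32-bit integer
print(np.array(x).astype(int))
# -2147483648 (on Windows, where int maps to a 32-bit type)

old_int = int   # keep a reference to the built-in int
int = np.int64  # shadow the built-in name in this module
print(np.array(x).astype(int))
# 2384351503

int = old_int   # restore the built-in int
print(np.array(x).astype(int))
# -2147483648 (on Windows again)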
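
To see which width astype(int) resolves to on a given machine, one option is to inspect np.dtype(int). This is a minimal sketch rather than part of the original answer. The output depends on the platform and the NumPy version: NumPy releases before 2.0 used a 32-bit default integer on Windows.

```python
import numpy as np

# np.dtype(int) is the dtype that astype(int) resolves to on this machine.
default_int = np.dtype(int)
print(default_int, default_int.itemsize * 8, "bits")

# An explicit 64-bit dtype does not depend on the platform default.
x = 2384351503.0
as_int64 = np.array(x).astype(np.int64)
print(as_int64)
```

Note that rebinding int as shown above only affects the module in which the assignment runs; other modules still see the built-in.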

In the case you described, however, I'd strongly prefer to fix the source code instead of redefining standard data types. It's a one-time effort, and any IDE can do it easily.

Numpy is already implicitly imported by pandas, so importing it explicitly doesn't cost any additional time or resources. If you really want to avoid it (for whatever reason), you can use pd.Int64Dtype.type instead of np.int64 (see source).
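
As a sketch of what the fixed call sites might look like, the following assumes a pandas Series of ids; the sample values and variable names are illustrative. The dtype string "int64" is an additional spelling not mentioned in the answer above, included here because it needs no numpy import and does not depend on the platform's default int.

```python
import numpy as np
import pandas as pd

# Illustrative ids stored as floats, including values too large for 32 bits.
ids = pd.Series([2590366906.0, 387277023371.0, 544687374.0])

# Three ways to request a 64-bit integer explicitly.
via_numpy = ids.astype(np.int64)
via_pandas = ids.astype(pd.Int64Dtype.type)  # the attribute named in the answer
via_string = ids.astype("int64")             # no numpy import needed

print(via_string.tolist())
```

All three calls produce a Series with dtype int64 regardless of platform, so the large ids survive the cast.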


 