Working Around the Windows-numpy astype(int) Bug in Pandas
I have a codebase I've been developing on a Mac (and running on Linux machines) based largely on pandas (and therefore numpy). Very commonly I type-cast with astype(int).
Recently a Windows-based developer joined our team. In an effort to make the code base more platform-independent, we're trying to gracefully tackle this tricky issue whereby numpy on Windows maps int to a 32-bit type instead of a 64-bit type, which makes larger integers overflow.
On a Mac, we see:
ipdb> ids.astype(int)
id
1818726176 1818726176
1881879486 1881879486
2590366906 2590366906
284399109 284399109
299981685 299981685
370708200 370708200
387277023371 387277023371
387343898032 387343898032
406885699892 406885699892
5262665206 5262665206
544687374 544687374
6978317806 6978317806
Whereas on a Windows machine (in PowerShell), we see:
ipdb> ids.astype(int)
id
1818726176 1818726176
1881879486 1881879486
2590366906 -1704600390
284399109 284399109
299981685 299981685
370708200 370708200
387277023371 729966731
387343898032 796841392
406885699892 -1136193228
5262665206 967697910
544687374 544687374
6978317806 -1611616786
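The Windows values are consistent with the true 64-bit values being truncated to 32 bits and wrapping around. A minimal sketch that reproduces the same numbers on any platform by casting explicitly to np.int32 (the array name true_ids is only illustrative; the values are taken from the output above). To my knowledge, NumPy 2.0 and later default to a 64-bit integer on 64-bit Windows, so the platform difference applies to older versions.

```python
import numpy as np

# Values from the Mac output above, stored explicitly as 64-bit integers.
true_ids = np.array(
    [2590366906, 387277023371, 406885699892, 5262665206, 6978317806, 544687374],
    dtype=np.int64,
)

# Casting to a 32-bit integer keeps only the low 32 bits, so large values wrap.
wrapped = true_ids.astype(np.int32)

for original, truncated in zip(true_ids.tolist(), wrapped.tolist()):
    print(original, truncated)
```

Values that fit in 32 bits (such as 544687374) come through unchanged, which matches the rows that agree on both platforms.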
Other than using a sed call to change every astype(int) to astype(np.int64) (which would also require an import numpy as np at the top of every module where currently that doesn't exist), is there a way to do this?
In particular, I was hoping to map int to numpy.int64 somehow in a pandas option or something.
Thank you!
I'm not saying that this is a really good idea, but you can simply redefine int to whatever you want:
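import numpy as np
x = 2384351503.0
print(np.array(x).astype(int))
# -2147483648 (where the default integer is 32-bit, as on the Windows machine above)
old_int = int  # keep a reference to the built-in
int = np.int64  # shadow the built-in name in this module
print(np.array(x).astype(int))
# 2384351503
int = old_int  # restore the built-in
print(np.array(x).astype(int))
# -2147483648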
In the case you described, however, I'd strongly prefer to fix the source code instead of redefining standard data types. It's a one-time effort and any IDE can do it easily.
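If an IDE or sed is not convenient, the one-time fix could also be scripted. The following is only a sketch under my own assumptions: the helper names fix_source and fix_tree are hypothetical, and it rewrites to the string dtype "int64" (which pandas and numpy both accept) so that no extra import is needed in the edited modules. Review the resulting diff before committing, since a regex cannot tell whether int was deliberately rebound.

```python
import re
from pathlib import Path

# Matches .astype(int), allowing whitespace inside the parentheses.
# It does not match .astype(np.int64), .astype(int64) or .astype("int").
_PATTERN = re.compile(r"\.astype\(\s*int\s*\)")


def fix_source(text: str) -> str:
    """Return text with every .astype(int) rewritten to .astype("int64")."""
    return _PATTERN.sub('.astype("int64")', text)


def fix_tree(root: str) -> list[Path]:
    """Rewrite all .py files under root in place; return the files that changed."""
    changed = []
    for path in Path(root).rglob("*.py"):
        original = path.read_text(encoding="utf-8")
        updated = fix_source(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed


print(fix_source("df['id'] = df['id'].astype(int)"))
```

Calling fix_tree on a project directory applies the same substitution to every Python file below it.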
Numpy is already implicitly imported by pandas, so it doesn't cost any additional time or resources. If you really want to avoid it (for whatever reason), you can use pd.Int64Dtype.type instead of np.int64 (see source).
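For illustration, here is a small sketch of the explicit casts discussed above. The sample Series is made up (string IDs borrowed from the question's output), and the string spelling "int64" is an additional option, not something mentioned earlier in this answer; it needs no numpy import at all.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: IDs stored as strings, some too large for 32 bits.
ids = pd.Series(["2590366906", "387277023371", "544687374"], dtype=object)

as_np = ids.astype(np.int64)             # explicit numpy type
as_str = ids.astype("int64")             # dtype given by name, no numpy import needed
as_pd = ids.astype(pd.Int64Dtype.type)   # the numpy scalar type reached through pandas

print(as_np.dtype, as_str.dtype, as_pd.dtype)
print(as_np.tolist())
```

All three request a fixed 64-bit integer, so the result does not depend on what the platform maps int to.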