
Error: cannot convert float NaN to integer in pandas

I have the following DataFrame:

   a            b     c      d
0 nan           Y     nan   nan
1  1.27838e+06  N      3     96
2 nan           N      2    nan
3  284633       Y     nan    44

I am trying to convert the non-null values to integer type to avoid exponential notation (1.27838e+06):

f=lambda x : int(x)
df['a']=np.where(df['a']==None,np.nan,df['a'].apply(f))

But I get an error, even though I only want to change the dtype of the non-null values. Can anyone point out my mistake? Thanks.

With its default NumPy-backed integer dtypes, pandas cannot store NaN values in an integer column. Strictly speaking, you could have a column with mixed data types, but this can be computationally inefficient. So if you insist, you can do

df['a'] = df['a'].astype('O')  # object dtype can hold ints and NaN together
mask = df['a'].notnull()       # rows that actually hold a value
df.loc[mask, 'a'] = df.loc[mask, 'a'].astype(int)
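
For anyone who wants to try this end to end, here is a minimal sketch. The DataFrame is rebuilt by hand from the question, so the exact numbers (for example 1278380.0 standing in for the displayed 1.27838e+06) and the `failed` flag are my own assumptions, not the asker's real data. It also shows why the original attempt raises: `apply(int)` is evaluated on every row, NaN included, before `np.where` gets to choose anything.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the question (values are approximations).
df = pd.DataFrame({
    "a": [np.nan, 1278380.0, np.nan, 284633.0],
    "b": ["Y", "N", "N", "Y"],
})

# The original attempt fails because int() is called on the NaN rows too.
try:
    df["a"].apply(int)
    failed = False
except ValueError:
    failed = True

# Object-dtype workaround: convert only the rows that hold a value.
df["a"] = df["a"].astype("O")
mask = df["a"].notnull()
df.loc[mask, "a"] = df.loc[mask, "a"].astype(int)
```

After this, column `a` has dtype `object`: the filled rows hold integers and the empty rows are still NaN. Arithmetic on such a column will be slower than on a plain numeric one.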

As far as I have read in the pandas documentation, it is not possible to represent an integer NaN:

"In the absence of high performance NA support being built into NumPy from the ground up, the primary casualty is the ability to represent NAs in integer arrays."

As explained later in the documentation, this is for memory and performance reasons, and also so that the resulting Series continues to be "numeric". One possibility is to use dtype=object arrays instead.
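
A small sketch of what this looks like in practice. The variable names are made up for illustration. The last line assumes pandas 0.24 or later, which, as far as I know, added a nullable "Int64" extension dtype that postdates the documentation passage quoted above.

```python
import numpy as np
import pandas as pd

# An integer Series is silently upcast to float64 once a NaN is introduced.
ints = pd.Series([1, 2, 3])
with_gap = ints.reindex([0, 1, 2, 3])  # label 3 is missing, so it becomes NaN
upcast_dtype = with_gap.dtype

# An object array can hold Python ints and NaN side by side.
mixed = pd.Series([1278380, np.nan, 284633], dtype=object)

# Assumption: pandas >= 0.24, which provides the nullable "Int64" dtype.
nullable = pd.Series([1278380.0, np.nan, 284633.0]).astype("Int64")
```

The object Series keeps each value as an ordinary Python object, which is flexible but slow. If your pandas version has it, the nullable dtype keeps the column numeric while still marking the missing entry.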
