Query regarding .astype() function in pandas

I am currently learning from an online course, where I was told that no NaN (null) values may be present for the .astype() function to be used. However, while writing a program I was careless: I did not check for NaN values before calling .astype(). The column was of object dtype and I converted it to boolean, only later realizing that it contained NaN values. No error was raised, and calling .info() on the pandas object reported no null values in that column. Please explain. I have attached images of this behavior.

.astype can be dangerous. I suggest you only use it for str or 'O' conversions. For numeric, datetime and timedelta data there are the dedicated pd.to_numeric, pd.to_datetime and pd.to_timedelta functions. Sadly, bools don't have an equivalent.
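As a rough sketch of what the dedicated converters offer over .astype, pd.to_numeric with errors="coerce" turns entries it cannot parse into NaN instead of raising. The sample values below are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: numeric strings mixed with one non-numeric entry
raw = pd.Series(["1", "2.5", "abc"])

# errors="coerce" replaces values that cannot be parsed with NaN
converted = pd.to_numeric(raw, errors="coerce")

print(converted.dtype)            # float64
print(converted.isna().tolist())  # [False, False, True]
```

The missing entry stays visible as NaN afterwards, so it can still be found with isna().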

.astype raises an error if you try to convert something that cannot be converted. Here NaN is a float, which cannot fit into an integer container.

import numpy as np
import pandas as pd
pd.Series(np.nan).astype(int)
#ValueError: Cannot convert non-finite values (NA or inf) to integer

But then there's bool, and while .astype is doing nothing incorrect, it's probably not doing what you want. The issue is that bool(np.nan) is perfectly well defined: NaN is a non-zero float, so it is truthy.

bool(np.nan)
#True

So .astype has no trouble converting np.nan to True.

pd.Series([True, np.nan, False]).astype(bool)
#0     True
#1     True  <- NaN became True. Did you really want that?
#2    False
#dtype: bool 
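This also accounts for what .info() showed in the question: once NaN has become True, there is nothing left to count as null. A minimal sketch comparing the missing-value count before and after the conversion, using the same example Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([True, np.nan, False])   # object dtype, one missing value
missing_before = int(s.isna().sum())

as_bool = s.astype(bool)               # NaN silently becomes True
missing_after = int(as_bool.isna().sum())

print(missing_before, missing_after)   # 1 0
```

Checking isna().sum() before calling .astype(bool) is one way to catch this early.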

The plain bool dtype is not nullable, so you can't have a bool column that contains NaN. You either need to go back to an object column and use where after the .astype to restore the missing values:

s = pd.Series([True, np.nan, False])
s.astype(bool).astype('O').where(s.notnull())
#0     True
#1      NaN
#2    False
#dtype: object

Or you could try the Int64 dtype:

s = pd.Series([True, np.nan, False])
s.astype(bool).astype('Int64').where(s.notnull())
#0       1
#1    <NA>  <- displayed as NaN in pandas versions before 1.0
#2       0
#dtype: Int64
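A third option, which is not part of the original answer, is the nullable "boolean" extension dtype available in pandas 1.0 and later. The sketch below assumes such a version is installed; it keeps the missing entry as pd.NA instead of turning it into True.

```python
import numpy as np
import pandas as pd

s = pd.Series([True, np.nan, False])

# "boolean" (capital-B BooleanDtype) is the nullable extension dtype,
# distinct from the NumPy-backed "bool"; assumes pandas >= 1.0
nullable = s.astype("boolean")

print(nullable.dtype)            # boolean
print(nullable.isna().tolist())  # [False, True, False]
```

Because the missing value survives the conversion, .info() and isna() keep reporting it.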
