Query regarding .astype() function in pandas
I am currently taking an online course where I was told that .astype() can only be used when no NaN (null) values are present. However, while writing a program I was careless and did not check for NaN values before calling astype(). The column was object dtype and I converted it to boolean, only later realizing that it contained NaN values. No error was raised, and calling .info() on the pandas object reported no null values in the column. Please explain. I have attached images of this behavior.
.astype can be dangerous. I suggest you only use it for str or 'O' conversions. For numerics, datetimes, and timedeltas there are the dedicated pd.to_numeric, pd.to_datetime, and pd.to_timedelta functions. Sadly, bools don't have an equivalent.
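As an illustration of why the dedicated converters are safer, here is a minimal sketch using pd.to_numeric with errors="coerce". The sample data and variable names are made up for this example; the point is that unparseable entries stay visible as NaN rather than being silently turned into something else.

```python
import pandas as pd

# Hypothetical sample data: numbers stored as strings, with one bad entry.
raw = pd.Series(["1", "2.5", "oops"])

# errors="coerce" turns entries that cannot be parsed into NaN instead of raising.
numbers = pd.to_numeric(raw, errors="coerce")

print(numbers.tolist())
print(int(numbers.isna().sum()))  # number of entries that failed to parse
```

Because the failures remain NaN, a later isna() check can still find them.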
.astype throws an error if you try to convert something that cannot be converted. Here NaN is a float, which cannot fit into an integer container.
import numpy as np
import pandas as pd
pd.Series(np.nan).astype(int)
# ValueError: Cannot convert non-finite values (NA or inf) to integer
But then there's bool, and while .astype is doing nothing incorrect, it's probably not doing what you want it to do. The issue is that bool(np.nan) is perfectly well defined.
bool(np.nan)
# True
So .astype has no issues converting np.nan to True when you use it.
pd.Series([True, np.nan, False]).astype(bool)
# 0     True
# 1     True   <- NaN became True. Did you really want that?
# 2    False
# dtype: bool
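This is also why .info() reports no null values after the conversion: the NaN has been replaced by True, so nothing is missing any more. A small sketch that counts missing values before and after the cast (variable names are illustrative only):

```python
import numpy as np
import pandas as pd

s = pd.Series([True, np.nan, False])  # object dtype with one missing value

n_missing_before = int(s.isna().sum())

# The cast succeeds silently; the NaN is evaluated as truthy.
converted = s.astype(bool)
n_missing_after = int(converted.isna().sum())

print(n_missing_before, n_missing_after)
```

Checking s.isna().any() before casting is a cheap way to catch this.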
At the time of writing there is no nullable bool type, so you can't have a bool dtype with NaN (pandas 1.0 and later add a nullable "boolean" extension dtype). Without it, you either need to use an object column and where after the .astype:
s = pd.Series([True, np.nan, False])
s.astype(bool).astype('O').where(s.notnull())
# 0     True
# 1      NaN
# 2    False
# dtype: object
Or you could try the Int64 dtype:
s = pd.Series([True, np.nan, False])
s.astype(bool).astype('Int64').where(s.notnull())
# 0       1
# 1    <NA>   (displayed as NaN in pandas versions before 1.0)
# 2       0
# dtype: Int64
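Newer pandas releases (1.0 and later) also ship a nullable "boolean" extension dtype, which postdates the workarounds above. A minimal sketch, assuming pandas 1.0 or newer is installed:

```python
import numpy as np
import pandas as pd

s = pd.Series([True, np.nan, False])

# The lowercase string "boolean" selects the nullable extension dtype;
# plain bool would select NumPy's non-nullable bool instead.
nullable = s.astype("boolean")

print(nullable.dtype)
print(nullable.isna().tolist())
```

With this dtype the missing entry stays missing, so .info() and isna() keep reporting it.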