Casting NaN into int in a pandas Series
I have missing values in a column of a DataFrame, so the command dataframe.colname.astype("int64") yields an error. Any workarounds?
The datatype, or dtype, of a pd.Series has very little impact on the actual way it is used.
You can have a pd.Series with integers and set the dtype to be object. You can still do the same things with the pd.Series.
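As a small illustration of the point above, here is a sketch with made-up values showing that an object-dtype Series of integers still supports the usual operations:

```python
import pandas as pd

# Integers stored with dtype=object; the values are illustrative only.
obj_ints = pd.Series([1, 2, 3], dtype=object)

total = obj_ints.sum()     # summing the Python ints still works
shifted = obj_ints + 1     # element-wise arithmetic still works
largest = obj_ints.max()   # so do reductions such as max

print(obj_ints.dtype, total, shifted.tolist(), largest)
```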
However, if you manually set the dtype of a pd.Series, pandas will start to cast the entries inside the pd.Series. In my experience, this only leads to confusion.
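A minimal sketch of that casting, using illustrative values: the same integers end up as floats or as strings depending on the dtype that is requested.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])                        # dtype inferred from the data
as_float = pd.Series([1, 2, 3], dtype="float64")   # entries are cast to 1.0, 2.0, 3.0
as_text = ints.astype(str)                         # entries are cast to "1", "2", "3"

print(as_float.tolist())
print(as_text.tolist())
```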
Do not try to use dtypes the way you would use field types in a relational database. They are not the same thing.
If you want to have integers and NaNs/Nones mixed in a pd.Series, just set the dtype to object.
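For example, a sketch with made-up values: with dtype=object the integers stay Python ints and the missing entry stays missing.

```python
import pandas as pd

# An integer column with one missing entry, kept as Python objects.
mixed = pd.Series([1, None, 3], dtype=object)

missing_mask = mixed.isna()   # True where the value is missing
present = mixed.dropna()      # only the real integers

print(mixed.dtype)
print(missing_mask.tolist())
print(present.tolist())
```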
Setting the dtype to float will let you have float representations of ints and NaNs mixed. But remember that floats are prone to be inexact in their representation.
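A short sketch of both halves of that remark, with values chosen for illustration: small integers survive the trip through float64, but an integer above 2**53 is silently rounded.

```python
import numpy as np
import pandas as pd

# Integers mixed with NaN are stored as float64.
floats = pd.Series([1, np.nan, 3])

# 2**53 + 1 has no exact float64 representation, so it gets rounded.
big = 2**53 + 1
rounded = pd.Series([big, np.nan])

print(floats.dtype)
print(big, int(rounded.iloc[0]))
```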
One common pitfall with dtypes that I should mention is the pd.merge operation, which will silently refuse to join when the keys used have different dtypes, for example int vs object, even if the object column only contains ints.
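One defensive pattern, sketched here with hypothetical frames and column names, is to compare the key dtypes before merging and align them explicitly. How pd.merge treats mismatched key dtypes can differ between pandas versions, so this sketch only demonstrates the aligned case.

```python
import pandas as pd

# Hypothetical frames: same key values, but different key dtypes.
left = pd.DataFrame({"key": [1, 2, 3], "a": ["x", "y", "z"]})
right = pd.DataFrame({"key": pd.Series([1, 2, 3], dtype=object), "b": [10, 20, 30]})

dtypes_before = (left["key"].dtype, right["key"].dtype)

# Align the key dtypes explicitly before joining.
if left["key"].dtype != right["key"].dtype:
    right = right.assign(key=right["key"].astype(left["key"].dtype))

merged = pd.merge(left, right, on="key")
print(dtypes_before)
print(merged)
```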
Other workarounds
- Use the Series.fillna method to fill your NaN values with something unlikely, such as 0 or -1.
- Copy the NaNs to a new column, df['was_nan'] = pd.isnull(df['floatcol']), then use the Series.fillna method. This way you do not lose any information.
- When calling the Series.astype() method, give it the keyword argument raise_on_error=False (deprecated since pandas 0.20 in favour of errors='ignore'), and just use the current dtype if it fails. Because dtypes do not matter that much.
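The workarounds above can be sketched as follows. The column name floatcol comes from the answer; the sample data, the intcol column and the astype_or_keep helper are assumptions made for illustration. The helper uses a plain try/except rather than a keyword argument, so it does not depend on which pandas version is installed.

```python
import numpy as np
import pandas as pd

# Illustrative data: a float column with one missing value.
df = pd.DataFrame({"floatcol": [1.0, np.nan, 3.0]})

# Remember where the NaNs were before overwriting them.
df["was_nan"] = pd.isnull(df["floatcol"])

# Fill with an unlikely sentinel; after that the integer cast succeeds.
df["intcol"] = df["floatcol"].fillna(-1).astype("int64")


def astype_or_keep(series, dtype):
    """Hypothetical helper: try the cast, keep the current dtype if it fails."""
    try:
        return series.astype(dtype)
    except (ValueError, TypeError):
        return series


# The cast fails because of the NaN, so the float column comes back unchanged.
unchanged = astype_or_keep(df["floatcol"], "int64")
print(df)
print(unchanged.dtype)
```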
TLDR;
Don't focus on having the 'right dtype'; dtypes are strange. Focus on what you want the column to actually do. dtype=object is fine.