
Casting NaN into int in a pandas Series

I have missing values in a column of a DataFrame, so the command dataframe.colname.astype("int64") yields an error.
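A minimal sketch of the failure described above, assuming a small made-up DataFrame with a float column named colname that contains one NaN. The data is purely illustrative and not from the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a float column with one missing value.
dataframe = pd.DataFrame({"colname": [1.0, 2.0, np.nan]})

try:
    dataframe["colname"].astype("int64")
    cast_failed = False
except ValueError as exc:
    # NaN has no int64 representation, so pandas refuses the cast.
    cast_failed = True
    print(f"cast failed: {exc}")
```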

Any workarounds?

The datatype, or dtype, of a pd.Series has very little impact on how it is actually used.

You can have a pd.Series of integers and set its dtype to object. You can still do the same things with that pd.Series.
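As a rough illustration of that point, the sketch below builds an integer Series with dtype object and runs a few everyday operations on it. The values are made up for the example.

```python
import pandas as pd

# Integers stored in an object-dtype Series.
s = pd.Series([1, 2, 3], dtype=object)

total = s.sum()      # aggregation still works
doubled = s * 2      # element-wise arithmetic still works
large = s[s > 1]     # boolean filtering still works

print(s.dtype, total, doubled.tolist(), large.tolist())
```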

However, if you manually set the dtype of a pd.Series, pandas will start to cast the entries inside it. In my experience, this only leads to confusion.
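To make the casting concrete, here is a small sketch with illustrative values: the same integers end up as floats or strings depending on the dtype requested.

```python
import pandas as pd

# Requesting float64 turns every integer into a float.
as_float = pd.Series([1, 2, 3], dtype="float64")

# Casting to str turns every integer into a string.
as_str = pd.Series([1, 2, 3]).astype(str)

print(as_float.tolist())
print(as_str.tolist())
```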

Do not try to use dtypes the way you would use field types in a relational database. They are not the same thing.

If you want to have integers and NaNs/Nones mixed in a pd.Series, just set the dtype to object.
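A short sketch of this suggestion, using made-up values: with dtype object the integers stay Python ints next to a None, whereas the default inference would turn the column into floats.

```python
import pandas as pd

# Explicit object dtype: ints stay ints, None stays a missing value.
mixed = pd.Series([1, None, 3], dtype=object)

# Without the explicit dtype, pandas infers float64 for the same data.
inferred = pd.Series([1, None, 3])

missing = mixed.isna()     # missing-value detection still works
present = mixed.dropna()   # the remaining entries are plain ints

print(mixed.dtype, inferred.dtype, present.tolist())
```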

Setting the dtype to float will let you have float representations of ints mixed with NaNs. But remember that floats are prone to being inexact in their representation.
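The inexactness can be seen with large integers. The sketch below uses 2**53 + 1, chosen only because it is the first positive integer that a 64-bit float cannot represent exactly.

```python
import numpy as np
import pandas as pd

big = 2**53 + 1  # not exactly representable as a 64-bit float

# A float64 Series can hold the NaN, but the integer gets rounded.
s = pd.Series([big, np.nan], dtype="float64")

round_tripped = int(s.iloc[0])
print(big, round_tripped, round_tripped == big)
```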

One common pitfall with dtypes that I should mention is the pd.merge operation, which will silently refuse to join when the keys used have different dtypes, for example int vs. object, even if the object column only contains ints.
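One defensive pattern, sketched here with hypothetical frames left and right, is to compare the key dtypes and align them explicitly before merging. How pandas treats mismatched keys has varied between versions, so this example only shows the join after alignment.

```python
import pandas as pd

# Hypothetical frames: the same key values, stored with different dtypes.
left = pd.DataFrame({"key": [1, 2, 3], "a": ["x", "y", "z"]})
right = pd.DataFrame({"key": pd.Series([1, 2, 3], dtype=object), "b": [10, 20, 30]})

dtypes_differ = left["key"].dtype != right["key"].dtype

# Align the key dtype explicitly so the join does not depend on
# how a given pandas version handles mismatched keys.
if dtypes_differ:
    right["key"] = right["key"].astype(left["key"].dtype)

merged = left.merge(right, on="key")
print(merged)
```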

Other workarounds

  1. You can use the Series.fillna method to fill your NaN values with something unlikely, such as 0 or -1.
  2. Record where the NaNs are in a new column, df['was_nan'] = pd.isnull(df['floatcol']), then use the Series.fillna method. This way you do not lose any information.
  3. When calling the Series.astype() method, give it the keyword argument raise_on_error=False (later pandas versions replaced this with errors='ignore'), and just use the current dtype if it fails. Because dtypes do not matter that much.
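The three workarounds above can be sketched roughly as follows. The column names floatcol and was_nan come from the answer. The intcol column, the -1 sentinel, and the astype_or_keep helper are assumptions made for illustration. The helper uses a plain try/except instead of a keyword argument, since the name of that argument has changed across pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"floatcol": [1.0, np.nan, 3.0]})

# Workaround 2: remember which rows were missing before filling them.
df["was_nan"] = pd.isnull(df["floatcol"])

# Workaround 1: fill NaN with an unlikely sentinel, then cast to int.
df["intcol"] = df["floatcol"].fillna(-1).astype("int64")


def astype_or_keep(series, dtype):
    """Hypothetical helper for workaround 3: keep the current dtype if the cast fails."""
    try:
        return series.astype(dtype)
    except (ValueError, TypeError):
        return series


kept = astype_or_keep(df["floatcol"], "int64")
print(df)
print(kept.dtype)
```

Keeping the was_nan column means a sentinel such as -1 can later be told apart from a genuine -1 in the data.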

TL;DR

Don't focus on having the 'right dtype'; dtypes are strange. Focus on what you want the column to actually do. dtype=object is fine.


 