Pandas: ValueError: cannot convert float NaN to integer
I get ValueError: cannot convert float NaN to integer for the following:
import pandas as pd
df = pd.read_csv('zoom11.csv')
df[['x']] = df[['x']].astype(int)
Update: Using the hints in the comments/answers, I got my data clean with this:
# x contained NaN
df = df[~df['x'].isnull()]
# Y contained some other garbage, so null check was not enough
df = df[df['y'].str.isnumeric()]
# final conversion now worked
df[['x']] = df[['x']].astype(int)
df[['y']] = df[['y']].astype(int)
To identify NaN values, use boolean indexing:
print(df[df['x'].isnull()])
Then, to remove all non-numeric values, use to_numeric with the parameter errors='coerce', which replaces non-numeric values with NaNs:
df['x'] = pd.to_numeric(df['x'], errors='coerce')
To remove all rows with NaNs in column x, use dropna:
df = df.dropna(subset=['x'])
Finally, convert the values to ints:
df['x'] = df['x'].astype(int)
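Put together, the steps above can be sketched as follows. This is a minimal example on made-up inline data; the values in x are an assumption for illustration, not the contents of zoom11.csv.

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV column
df = pd.DataFrame({"x": ["1", "2", None, "abc", "5"]})

# Coerce anything non-numeric (including "abc") to NaN
df["x"] = pd.to_numeric(df["x"], errors="coerce")

# Drop rows where x is NaN, then cast the remaining values to int
df = df.dropna(subset=["x"])
df["x"] = df["x"].astype(int)

print(df["x"].tolist())  # [1, 2, 5]
```

Note that the rows with missing or non-numeric values are dropped entirely, so the resulting frame is shorter than the input.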
ValueError: cannot convert float NaN to integer
From v0.24, you actually can. Pandas introduces Nullable Integer Data Types, which allow integers to coexist with NaNs.
Given a series of whole float numbers with missing data,
import numpy as np
import pandas as pd
s = pd.Series([1.0, 2.0, np.nan, 4.0])
s
0 1.0
1 2.0
2 NaN
3 4.0
dtype: float64
s.dtype
# dtype('float64')
You can convert it to a nullable int type (choose from one of Int16, Int32, or Int64) with,
s2 = s.astype('Int32') # note the 'I' is uppercase
s2
0 1
1 2
2 NaN
3 4
dtype: Int32
s2.dtype
# Int32Dtype()
Your column needs to have whole numbers for the cast to happen. Anything else will raise a TypeError:
s = pd.Series([1.1, 2.0, np.nan, 4.0])
s.astype('Int32')
# TypeError: cannot safely cast non-equivalent float64 to int32
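One way to check up front whether a float column can be cast is to test that every non-missing value is a whole number. A small sketch; the helper name is_whole is made up for illustration and is not part of pandas.

```python
import numpy as np
import pandas as pd

def is_whole(s: pd.Series) -> bool:
    """Return True if every non-missing value in s has no fractional part."""
    values = s.dropna()
    return bool((values == values.round()).all())

ok = pd.Series([1.0, 2.0, np.nan, 4.0])
bad = pd.Series([1.1, 2.0, np.nan, 4.0])

print(is_whole(ok))   # True
print(is_whole(bad))  # False

# Only cast when the check passes
if is_whole(ok):
    converted = ok.astype("Int32")
```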
Also, even in the latest versions of pandas, if the column is of object type you have to convert it to float first, something like:
df['column_name'].astype(float).astype("Int32")  # np.float was removed in NumPy 1.24; use the builtin float
NB: For some reason, you have to go through a float dtype first and then to the nullable Int32.
Whether to use a 32-bit or 64-bit int depends on your data; be aware that you may lose precision if your numbers are too big for the format.
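A sketch of that two-step conversion on an object column. The column contents are hypothetical, and Int64 is chosen here only to leave more headroom for large values.

```python
import numpy as np
import pandas as pd

# Hypothetical object-dtype column with numeric strings and a missing value
df = pd.DataFrame({"column_name": ["1", "2", np.nan, "4"]}, dtype=object)

# Step 1: object -> float64 (the missing value stays NaN)
as_float = df["column_name"].astype(float)

# Step 2: float64 -> nullable integer
as_int = as_float.astype("Int64")

print(as_int.dtype)  # Int64
```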
I know this has been answered, but I wanted to provide an alternate solution for anyone in the future:
You can use .loc to subset the dataframe to only the rows where the value is notnull(), and then select only the 'x' column. Take that same vector, and apply(int) to it.
If column x is float:
df.loc[df['x'].notnull(), 'x'] = df.loc[df['x'].notnull(), 'x'].apply(int)
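If you have null values, you will get this error when doing mathematical operations. To resolve it, use df[~df['x'].isnull()][['x']].astype(int) if you want your original dataset to stay unchanged.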
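A short sketch of filtering and casting into a new object while leaving the original frame as it was. The sample data and the name clean are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, np.nan, 3.0]})

# Filter out null rows and cast; the result is a new DataFrame
clean = df[~df["x"].isnull()][["x"]].astype(int)

print(clean["x"].tolist())  # [1, 3]
print(len(df))              # 3, the original still has its NaN row
```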