I get ValueError: cannot convert float NaN to integer for following:
df = pandas.read_csv('zoom11.csv')
df[['x']] = df[['x']].astype(int)
Update: Using the hints in comments/answers I got my data clean with this:
# x contained NaN
df = df[~df['x'].isnull()]
# Y contained some other garbage, so null check was not enough
df = df[df['y'].str.isnumeric()]
# final conversion now worked
df[['x']] = df[['x']].astype(int)
df[['y']] = df[['y']].astype(int)
For identifying NaN
values use boolean indexing
:
print(df[df['x'].isnull()])
Then for removing all non-numeric values use to_numeric
with parameter errors='coerce'
- to replace non-numeric values to NaN
s:
df['x'] = pd.to_numeric(df['x'], errors='coerce')
And for remove all rows with NaN
s in column x
use dropna
:
df = df.dropna(subset=['x'])
Last convert values to int
s:
df['x'] = df['x'].astype(int)
ValueError: cannot convert float NaN to integer
From v0.24, you actually can. Pandas introduces Nullable Integer Data Types which allows integers to coexist with NaNs.
Given a series of whole float numbers with missing data,
s = pd.Series([1.0, 2.0, np.nan, 4.0])
s
0 1.0
1 2.0
2 NaN
3 4.0
dtype: float64
s.dtype
# dtype('float64')
You can convert it to a nullable int type (choose from one of Int16
, Int32
, or Int64
) with,
s2 = s.astype('Int32') # note the 'I' is uppercase
s2
0 1
1 2
2 NaN
3 4
dtype: Int32
s2.dtype
# Int32Dtype()
Your column needs to have whole numbers for the cast to happen. Anything else will raise a TypeError:
s = pd.Series([1.1, 2.0, np.nan, 4.0])
s.astype('Int32')
# TypeError: cannot safely cast non-equivalent float64 to int32
Also, even at the lastest versions of pandas if the column is object type you would have to convert into float first, something like:
df['column_name'].astype(np.float).astype("Int32")
NB: You have to go through numpy float first and then to nullable Int32, for some reason.
The size of the int if it's 32 or 64 depends on your variable, be aware you may loose some precision if your numbers are to big for the format.
I know this has been answered but wanted to provide alternate solution for anyone in the future:
You can use .loc
to subset the dataframe by only values that are notnull()
, and then subset out the 'x'
column only. Take that same vector, and apply(int)
to it.
If column x is float:
df.loc[df['x'].notnull(), 'x'] = df.loc[df['x'].notnull(), 'x'].apply(int)
如果你有空值那么在做数学运算你会得到这个错误来解决它使用df[~df['x'].isnull()]df[['x']].astype(int)
如果你想要你的数据集不可更改。
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.