When using df.fillna(), which value or function does it use to determine whether a value is NaN? NaT, for instance, does not get filled, but pd.isnull() captures it.
Furthermore, is there a way to pass a function to fillna that determines whether a value is NaN or not, e.g.
df.fillna(na_function=pd.isnull, value=np.nan)
EDIT (added example):
import numpy as np
import pandas as pd
df = pd.DataFrame(
    [[0, "2018-02-10", np.nan],
     [None, pd.NaT, 0]])
df.isnull()
# [[False, False, True],
#  [True,  True,  False]]
df.fillna(np.nan, inplace=True)
# [[0,      "2018-02-10", np.nan],
#  [np.nan, NaT,          0]]
I want it to fill all NaN/null values where pd.isnull() == True, including NaT.
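At first sight there is a slight inconsistency here: isna tests for any null value (None, NaN or NaT), yet the NaT in the example survives fillna. In fact fillna fills the same positions that isna flags, NaT included. The NaT only seems untouched because the fill value np.nan is itself a null value, and a datetime64 column stores np.nan as NaT.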
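A quick check of which values isna flags and what fillna does with NaT for two different fill values. This is a sketch that assumes a recent pandas version and builds the datetime64 column explicitly with pd.to_datetime:

```python
import numpy as np
import pandas as pd

# None, NaN and NaT are all reported as missing by pd.isna
flags = [pd.isna(v) for v in (None, np.nan, pd.NaT)]

# A datetime64 column whose second entry is NaT
s = pd.Series(pd.to_datetime(["2018-02-10", None]))

# Filling with a real timestamp replaces the NaT
filled = s.fillna(pd.Timestamp("2000-01-01"))

# Filling with np.nan leaves a missing value: a datetime column stores NaN as NaT
still_missing = s.fillna(np.nan)

print(flags)                          # [True, True, True]
print(filled.tolist())
print(still_missing.isna().tolist())  # [False, True]
```

The last result is why NaT can look untouched after fillna(np.nan): the missing entry is filled, but with another missing value.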
BTW, filling all null values is easy with isna:
df[df.isna()] = replacement_value
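The same boolean-mask idea also covers the second part of the question. fillna has no parameter for a custom missing-value test (the na_function keyword in the question is hypothetical), but DataFrame.mask replaces values wherever a boolean frame is True, so any predicate returning such a frame can be plugged in. A sketch; fill_where and isna_or_sentinel are illustrative names, not pandas APIs, and the -999 sentinel is an assumed example:

```python
import numpy as np
import pandas as pd


def fill_where(df, value, na_function=pd.isna):
    """Return a copy of df with value placed wherever na_function(df) is True.

    Hypothetical helper: DataFrame.mask does the work by substituting
    `other` at every position where the condition frame is True.
    The default predicate pd.isna flags NaN, None and NaT.
    """
    cond = na_function(df)
    return df.mask(cond, value)


def isna_or_sentinel(frame, sentinel=-999):
    # Hypothetical predicate: treat the sentinel value as missing too
    return frame.isna() | frame.eq(sentinel)


df = pd.DataFrame({"x": [1.0, np.nan, -999.0], "y": [-999.0, 2.0, 3.0]})

print(fill_where(df, 0.0))                    # only the NaN is replaced
print(fill_where(df, 0.0, isna_or_sentinel))  # NaN and -999 are replaced
```

Unlike df[df.isna()] = replacement_value, this returns a new frame and leaves df unchanged.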
Note that isna is simply an alias for isnull, so the two detect exactly the same values.
Assuming you have NaN and NaT values in the DataFrame, you can always check the dtypes and fill each kind separately, like this:
# fill every non-datetime column with 99
x = df.select_dtypes(exclude=['datetime'])
df[x.columns] = x.fillna(99)
# fill every datetime column with the current timestamp
x = df.select_dtypes(include=['datetime'])
df[x.columns] = x.fillna(pd.to_datetime('today'))
Taking your sample df as an example:
In [1997]: df
Out[1997]:
      0           1     2
0  0.00  2018-02-10   nan
1   nan         NaT  0.00
In [1998]: df.dtypes
Out[1998]:
0           float64
1    datetime64[ns]
2           float64
In [1999]: x = df.select_dtypes(exclude=['datetime'])
In [2000]: df[x.columns] = x.fillna(99)
In [2001]: df
Out[2001]:
       0           1      2
0   0.00  2018-02-10  99.00
1  99.00         NaT   0.00
In [2002]: x = df.select_dtypes(include=['datetime'])
In [2003]: df[x.columns] = x.fillna(pd.to_datetime('today'))
In [2004]: df
Out[2004]:
       0                           1      2
0   0.00  2018-02-10 00:00:00.000000  99.00
1  99.00  2020-06-08 12:42:18.819089   0.00
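pd.to_datetime('today') includes the current time of day, which is why the filled date in Out[2004] carries a time component. If a plain date at midnight is preferred, Timestamp.normalize() drops the time part. A sketch of the same two-step fill wrapped in a function; fill_by_kind and its default values are hypothetical, and it works on a copy instead of modifying df in place:

```python
import numpy as np
import pandas as pd


def fill_by_kind(df, other_value=99, date_value=None):
    """Fill non-datetime columns with other_value and datetime columns with date_value.

    Illustrative helper; date_value defaults to today's date at midnight.
    """
    if date_value is None:
        # normalize() sets the time component to 00:00:00
        date_value = pd.Timestamp("today").normalize()
    out = df.copy()
    others = out.select_dtypes(exclude=["datetime"]).columns
    dates = out.select_dtypes(include=["datetime"]).columns
    out[others] = out[others].fillna(other_value)
    out[dates] = out[dates].fillna(date_value)
    return out


df = pd.DataFrame({
    0: [0.0, np.nan],
    1: pd.to_datetime(["2018-02-10", None]),
    2: [np.nan, 0.0],
})
print(fill_by_kind(df))
print(fill_by_kind(df, date_value=pd.Timestamp("2000-01-01")))
```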
Create a dictionary that maps each column to its replacement value (here: one value for datetimes, one for strings and one for all other columns) and pass it to DataFrame.fillna:
import numpy as np
import pandas as pd
df = pd.DataFrame(
    [[0, "2018-02-10", np.nan, 'a'],
     [None, pd.NaT, 0, None]])
print (df)
0 1 2 3
0 0.0 2018-02-10 NaN a
1 NaN NaT 0.0 None
dates = df.select_dtypes(['datetime']).columns
strings = df.select_dtypes(['object']).columns
d1 = dict.fromkeys(dates, pd.Timestamp('2000-01-01'))
d2 = dict.fromkeys(strings, 'b')
d3 = dict.fromkeys(df.columns.difference(dates.union(strings)), 1)
# merge the three dicts, see https://stackoverflow.com/a/26853961
d = {**d1, **d2, **d3}
df = df.fillna(d)
print (df)
0 1 2 3
0 0.0 2018-02-10 1.0 a
1 1.0 2000-01-01 0.0 b
Detail:
print (d)
{1: Timestamp('2000-01-01 00:00:00'), 3: 'b', 0: 1, 2: 1}
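The dictionary can also be built from a list of (dtype selector, fill value) pairs, so adding another dtype group only needs one more entry. A sketch; fill_map and its rules argument are hypothetical names. It creates the datetime column explicitly with pd.to_datetime and the string column with dtype=object, rather than relying on pandas to infer those dtypes:

```python
import numpy as np
import pandas as pd


def fill_map(df, rules, default):
    """Build a {column: fill value} dict for DataFrame.fillna.

    rules is a list of (dtype selector, value) pairs checked in order;
    a column takes the value of the first rule whose selector matches it,
    and columns matched by no rule get `default`.
    """
    mapping = {}
    for selector, value in rules:
        for col in df.select_dtypes(include=[selector]).columns:
            mapping.setdefault(col, value)
    for col in df.columns:
        mapping.setdefault(col, default)
    return mapping


df = pd.DataFrame({
    0: [0.0, np.nan],
    1: pd.to_datetime(["2018-02-10", None]),
    2: [np.nan, 0.0],
    3: pd.Series(["a", None], dtype=object),
})

d = fill_map(df, [("datetime", pd.Timestamp("2000-01-01")), ("object", "b")], default=1)
print(d)
print(df.fillna(d))
```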