I've been trying to do something that I thought would be simple, but I've run into an issue I don't understand. I have two columns, date_published and date_obtained. I have data for every row of date_obtained, but not for date_published. My approach was to fill the missing date_published values with date_obtained minus 1 day (the median difference might be a better choice, but I'll ignore that here).
df looks like this:
date_published date_obtained
2017-12-20 2017-12-22
NaT 2017-12-23
And should look like this afterwards:
date_published date_obtained
2017-12-20 2017-12-22
2017-12-22 2017-12-23
I tried the following:
date_delta=(Df.date_obtained-datetime.timedelta(days=1))
Df.loc['date_published']=Df.date_published.fillna((date_delta))
But, to my surprise, that didn't fill any NaT, and it also introduced missing values across my data frame. I also tried filling with just Df.date_obtained, but the result was the same. What am I missing?
You were almost there. You should have done either:
u = df.date_obtained - pd.Timedelta(days=1)
df['date_published'] = df.date_published.fillna(u)
Or,
df.loc[:, 'date_published'] = df.date_published.fillna(u)
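Here loc[:, 'date_published'] selects the column. With a single label, as in Df.loc['date_published'], loc looks that label up in the row index rather than in the columns, so the assignment targets a row labelled date_published instead of the column, which would explain the extra missing values.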
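For reference, here is a minimal, self-contained sketch of the column-assignment fix. It assumes pandas is imported as pd and rebuilds the two-row frame from the question; the variable name fallback is introduced here only for illustration.

```python
import pandas as pd

# Rebuild the example frame from the question; None becomes NaT.
df = pd.DataFrame({
    "date_published": pd.to_datetime(["2017-12-20", None]),
    "date_obtained": pd.to_datetime(["2017-12-22", "2017-12-23"]),
})

# Fallback value for each row: the obtained date minus one day.
fallback = df["date_obtained"] - pd.Timedelta(days=1)

# fillna aligns on the index, so only the NaT rows take the fallback value.
# Assigning with df[...] targets the column, not a row label.
df["date_published"] = df["date_published"].fillna(fallback)

print(df)
```

Rows that already have a date_published value are left untouched, since fillna only replaces missing entries.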