Fill missing dates with another column value

I've been trying to do something I thought would be simple, but I've run into an issue I don't understand. I have two columns, date_published and date_obtained. Every row has a value for date_obtained, but not for date_published. My approach was to fill each missing date_published with date_obtained minus 1 day (the median difference might also work, but I'll ignore that here).

df looks like this:

date_published    date_obtained
 2017-12-20        2017-12-22
    NaT            2017-12-23

And should look like this afterwards:

date_published    date_obtained
 2017-12-20        2017-12-22
 2017-12-22        2017-12-23

I tried the following:

date_delta=(Df.date_obtained-datetime.timedelta(days=1))
Df.loc['date_published']=Df.date_published.fillna((date_delta))

But, to my surprise, that didn't fill any NaT, and it also introduced missing values throughout my data frame. I also tried filling with just Df.date_obtained, but the result was the same. What am I missing?

You were almost there. You should have done either:

u = df.date_obtained - pd.Timedelta(days=1)
df['date_published'] = df.date_published.fillna(u)

Or,

df.loc[:, 'date_published'] = df.date_published.fillna(u)

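With loc, the first indexer selects rows, so df.loc['date_published'] looks for that label in the index rather than in the columns. To refer to a column through loc, pass a row selector first, as in df.loc[:, 'date_published'].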
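For reference, here is a minimal, self-contained sketch of both variants on a toy frame that mirrors the one in the question. The names fallback, by_column and by_loc are only illustrative, and the sample values are assumed from the question's printout.

```python
import pandas as pd

# Toy frame mirroring the question; the second row has no date_published.
df = pd.DataFrame({
    "date_published": pd.to_datetime(["2017-12-20", None]),
    "date_obtained": pd.to_datetime(["2017-12-22", "2017-12-23"]),
})

# Candidate value for every row: the day before date_obtained.
fallback = df["date_obtained"] - pd.Timedelta(days=1)

# Variant 1: plain column assignment.
# fillna aligns on the index, so only the NaT rows take the fallback value.
by_column = df.copy()
by_column["date_published"] = by_column["date_published"].fillna(fallback)

# Variant 2: loc with an explicit row selector (:) followed by the column label.
by_loc = df.copy()
by_loc.loc[:, "date_published"] = by_loc["date_published"].fillna(fallback)

print(by_column)
```

Both variants leave the existing 2017-12-20 value untouched and only fill the row where date_published was NaT.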
