
Reindexing a pandas DataFrame is creating NaN values

I have a data frame that looks like this, with monthly data points:

   Date        Value
1  2010-01-01  18.45
2  2010-02-01  18.13
3  2010-03-01  18.25
4  2010-04-01  17.92
5  2010-05-01  18.85 

I want to convert it to daily data, filling each new date with that month's value. For example:

   Date        Value
1  2010-01-01  18.45
2  2010-01-02  18.45
3  2010-01-03  18.45
4  2010-01-04  18.45
5  2010-01-05  18.45 
....

This is the code I'm using to add the interim dates and fill the values:

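import pandas as pd

# get_datetime is not a pandas function; it comes from the asker's environment
today = get_datetime('US/Eastern')
enddate = '1881-01-01'  # despite the name, this is the start of the range
idx = pd.date_range(enddate, today.strftime('%Y-%m-%d'), freq='D')
df = df.reindex(idx)  # df still has its original integer index at this point
df = df.fillna(method='ffill')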

The output is as follows:

                     Date   Value
2010-01-01 00:00:00  NaN    NaN
2010-01-02 00:00:00  NaN    NaN
2010-01-03 00:00:00  NaN    NaN
2010-01-04 00:00:00  NaN    NaN
2010-01-05 00:00:00  NaN    NaN 

Logging shows that the NaN values are already there before .fillna is called, so the forward fill is not the culprit.
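A minimal reproduction of the symptom, assuming a hypothetical frame rebuilt from the sample above with the dates in an ordinary Date column and the default integer index. A fixed date range stands in for the 1881-to-today range, since get_datetime is not available outside the asker's environment:

```python
import pandas as pd

# Hypothetical rebuild of the sample frame: integer index 1..5,
# dates stored in an ordinary "Date" column (not the index).
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2010-01-01", "2010-02-01", "2010-03-01", "2010-04-01", "2010-05-01"]
        ),
        "Value": [18.45, 18.13, 18.25, 17.92, 18.85],
    },
    index=range(1, 6),
)

# A fixed daily range stands in for the question's 1881-to-today range.
idx = pd.date_range("2010-01-01", "2010-05-01", freq="D")

# reindex looks each date up in df.index (the integers 1..5), not in df["Date"].
# No label matches, so every cell of the result is missing.
reindexed = df.reindex(idx)
print(reindexed.head())
```

Every cell is NaN (or NaT in the Date column) straight after reindex, before any fill step runs.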

Any ideas why this is happening?

option 1
safest approach, very general
up-sample to daily, then group monthly with a transform

This matters because your data point may not fall on the first of the month. To make sure that day's value is broadcast to every other day of the same month, do this:

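# assumes df['Date'] is already datetime64 (convert with pd.to_datetime otherwise)
# pd.TimeGrouper was deprecated and later removed; pd.Grouper(freq='MS')
# groups the daily rows by calendar month in current pandas
df.set_index('Date').asfreq('D') \
    .groupby(pd.Grouper(freq='MS'))['Value'] \
    .transform('first').reset_index()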
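A small sketch of the case option 1 is meant for, using hypothetical data in which February's only observation falls on the 10th. A plain forward fill would carry January's value into February 1 to 9, while the monthly transform gives those days February's value:

```python
import pandas as pd

# Hypothetical sample: February's only observation falls on the 10th.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2010-01-01", "2010-02-10", "2010-03-01"]),
    "Value": [18.45, 18.13, 18.25],
})

daily = df.set_index("Date").asfreq("D")  # one row per day, NaN between observations

# Option 1: every day in a calendar month gets that month's first non-null value.
by_month = (
    daily.groupby(pd.Grouper(freq="MS"))["Value"]
    .transform("first")
    .reset_index()
)

# For comparison: a plain forward fill carries January's value into Feb 1-9.
ffilled = daily.ffill().reset_index()

feb5 = pd.Timestamp("2010-02-05")
print(by_month.loc[by_month["Date"] == feb5, "Value"].item())
print(ffilled.loc[ffilled["Date"] == feb5, "Value"].item())
```

The first print shows February's value and the second shows January's, which is the difference the option describes.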

option 2
asfreq

df.set_index('Date').asfreq('D').ffill().reset_index()
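Option 2 can be read as two steps. This sketch splits them apart on a hypothetical rebuild of the question's sample, so the intermediate NaN rows are visible before ffill fills them:

```python
import pandas as pd

# Hypothetical rebuild of the question's monthly sample.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2010-01-01", "2010-02-01", "2010-03-01",
                            "2010-04-01", "2010-05-01"]),
    "Value": [18.45, 18.13, 18.25, 17.92, 18.85],
})

# Step 1: asfreq inserts every missing calendar day between the first and last
# date; the new rows hold NaN.
daily = df.set_index("Date").asfreq("D")

# Step 2: ffill copies each month's value forward until the next observation.
result = daily.ffill().reset_index()
print(result.head())
```

Because asfreq stops at the last observed date, May only gets its first day here; the index would have to be extended (for example with reindex) if the final month must be filled out.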

option 3
resample

df.set_index('Date').resample('D').first().ffill().reset_index()
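One practical difference between options 2 and 3, shown on a hypothetical sample with two rows on the same day: resample aggregates each daily bin, so .first() keeps the first of the duplicates, while asfreq reindexes and rejects a duplicated index with a ValueError. This is a sketch of that behavior, not something the answer itself claims:

```python
import pandas as pd

# Hypothetical sample with two observations on the same day.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2010-01-01", "2010-01-01", "2010-02-01"]),
    "Value": [18.45, 18.50, 18.13],
})
indexed = df.set_index("Date")

# resample groups rows into daily bins, so .first() keeps the first value of the
# duplicated day; empty days come out as NaN and are then forward-filled.
daily = indexed.resample("D").first().ffill().reset_index()

# asfreq reindexes instead of aggregating, so a duplicated index raises.
try:
    indexed.asfreq("D")
    asfreq_failed = False
except ValueError:
    asfreq_failed = True

print(daily.head(3))
print("asfreq raised ValueError:", asfreq_failed)
```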

For pandas 0.16.1:

df.set_index('Date').resample('D').ffill().reset_index()

All of these produce the same result on this sample data set.
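That claim can be checked directly. This sketch runs options 1 to 3 on a hypothetical rebuild of the question's sample (using pd.Grouper in place of the removed pd.TimeGrouper) and compares the results with pandas' own testing helper:

```python
import pandas as pd

# Hypothetical rebuild of the question's monthly sample.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2010-01-01", "2010-02-01", "2010-03-01",
                            "2010-04-01", "2010-05-01"]),
    "Value": [18.45, 18.13, 18.25, 17.92, 18.85],
})
indexed = df.set_index("Date")

# Option 1: upsample, then broadcast each calendar month's first value.
opt1 = (indexed.asfreq("D")
        .groupby(pd.Grouper(freq="MS"))["Value"]
        .transform("first")
        .reset_index())
# Option 2: upsample with asfreq, then forward-fill.
opt2 = indexed.asfreq("D").ffill().reset_index()
# Option 3: resample to daily bins, take the first value, then forward-fill.
opt3 = indexed.resample("D").first().ffill().reset_index()

# Raises AssertionError if any two results differ in values, dtypes or labels.
pd.testing.assert_frame_equal(opt1, opt2)
pd.testing.assert_frame_equal(opt2, opt3)
print("all three options agree")
```

They only diverge when an observation falls after the first of its month, as in the option 1 case.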


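You need to set the dates as the index of the original DataFrame before calling reindex. reindex matches the new labels against the existing index, not against the Date column, so with the default integer index none of the new dates match and every row comes out NaN.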

import numpy as np
import pandas as pd
test = pd.DataFrame(np.random.randn(4), index=pd.date_range('2017-01-01', '2017-01-04'), columns=['test'])
test.reindex(pd.date_range('2017-01-01', '2017-01-05'), method='ffill')
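Applied to the question's own code, the fix is a single set_index('Date') before reindex. The sketch below assumes a hypothetical rebuild of the sample and a fixed date range in place of the 1881-to-today range; the range starts two days before the first observation to show that dates earlier than any data have nothing to forward-fill from and stay NaN (which is what would happen to every date from 1881 to 2009 in the original range):

```python
import pandas as pd

# Hypothetical rebuild of the question's frame (dates in a "Date" column).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2010-01-01", "2010-02-01", "2010-03-01",
                            "2010-04-01", "2010-05-01"]),
    "Value": [18.45, 18.13, 18.25, 17.92, 18.85],
})

# A fixed range stands in for the question's 1881-to-today range; it starts two
# days before the first observation to show what happens to those dates.
idx = pd.date_range("2009-12-30", "2010-05-03", freq="D")

# Move Date into the index first, so reindex can match the new dates against it.
filled = df.set_index("Date").reindex(idx).ffill()

# Dates before the first observation have nothing to carry forward and stay NaN.
print(filled.loc["2009-12-30":"2010-01-03"])
```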

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. Any question please contact: yoyou2525@163.com.

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM