
How to fill missing values with the value from exactly 24 intervals earlier in pandas

Hi, I have a full year of solar radiation sensor data at hourly resolution. The data for some hours is missing and needs to be filled with the value from exactly 24 hours earlier, since solar radiation is almost the same at the same time on the next day.

A sample of the data:

[image]

The missing data looks like this:

[image]

The code I found for filling missing values looks something like this:

import pandas as pd

df = pd.read_excel('ffill_test.xlsx')
# Forward fill: each NaN takes the value from the row directly above it
df['Solar Power'] = df['Solar Power'].ffill()

print(df)

How can I fill each gap with the value from 24 rows earlier instead of the previous row? The other option I am considering is to convert the column to a list and replace the values in a loop.

Thanks.

Using a simple dataset:

df
      a
0   1.0
1   3.0
2   5.0
3   7.0
4   7.0
5   NaN
6   3.0
7  24.0

To fill each NaN with the value from 3 rows earlier, pass the shifted column to fillna:

df.a.fillna(df.a.shift(3))

0     1.0
1     3.0
2     5.0
3     7.0
4     7.0
5     5.0
6     3.0
7    24.0
Name: a, dtype: float64

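In your case, with one row per hour, shift by 24 and assign the result back, since fillna returns a new Series:

df['Solar Power'] = df['Solar Power'].fillna(df['Solar Power'].shift(24))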
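One caveat with the positional shift: it assumes exactly one row per hour, and if the reading 24 rows earlier is itself missing, the gap is still NaN after a single pass. The following is a minimal sketch of one way to handle that, using made-up hourly data; the helper name fill_from_previous_day is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd

# Made-up data: three days of hourly readings (72 rows), one row per hour.
idx = pd.date_range("2016-06-15", periods=72, freq=pd.Timedelta(hours=1))
solar = pd.Series(np.arange(72, dtype=float), index=idx, name="Solar Power")

# Remove a few readings; rows 31 and 55 are the same hour on consecutive days.
solar.iloc[[30, 31, 55]] = np.nan

# Single pass: row 55 stays NaN because row 31 (24 rows earlier) is NaN too.
filled = solar.fillna(solar.shift(24))


def fill_from_previous_day(s, periods=24):
    """Repeat the shift-and-fill until the number of NaNs stops decreasing."""
    out = s.copy()
    while True:
        candidate = out.fillna(out.shift(periods))
        if candidate.isna().sum() == out.isna().sum():
            return candidate
        out = candidate


# Repeated passes: row 55 is filled from row 31 once row 31 has a value.
filled_all = fill_from_previous_day(solar)
print(filled_all.iloc[[30, 31, 55]])
```

Gaps in the first 24 rows have no earlier day to copy from, so they remain NaN and the loop simply stops once a pass fills nothing new.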

I think you need to create a DatetimeIndex with to_datetime and to_timedelta, then call fillna with the values shifted by 24 hours:

print (df)
        Date     Time  System Power  Solar Power
0   6/15/2016  0:00:00           1.0         10.0
1   6/15/2016  0:00:01           2.0         20.0
2   6/15/2016  0:00:02           3.0         30.0
3   6/15/2016  0:00:03           4.0         40.0
4   6/15/2016  0:00:04           5.0         50.0
5   6/15/2016  0:00:05           6.0         60.0
6   6/15/2016  0:00:06           7.0         70.0
7   6/15/2016  0:00:07           8.0         80.0
8   6/15/2016  0:00:08           9.0         90.0
9   6/15/2016  0:00:09          10.0        100.0
10  6/15/2016  0:00:10          11.0        110.0
11  6/16/2016  0:00:04           NaN          NaN
12  6/16/2016  0:00:06           NaN          NaN

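# Build a DatetimeIndex from the Date and Time columns
df.index = pd.to_datetime(df['Date']) + pd.to_timedelta(df['Time'].astype(str))
cols = ['System Power', 'Solar Power']
# Move every timestamp forward by 24 hours, then fill NaNs from the aligned values
# (a Timedelta is used because the 'H' frequency alias is deprecated in recent pandas)
df[cols] = df[cols].fillna(df[cols].shift(freq=pd.Timedelta(hours=24)))
df = df.reset_index(drop=True)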
print (df)
         Date     Time  System Power  Solar Power
0   6/15/2016  0:00:00           1.0         10.0
1   6/15/2016  0:00:01           2.0         20.0
2   6/15/2016  0:00:02           3.0         30.0
3   6/15/2016  0:00:03           4.0         40.0
4   6/15/2016  0:00:04           5.0         50.0
5   6/15/2016  0:00:05           6.0         60.0
6   6/15/2016  0:00:06           7.0         70.0
7   6/15/2016  0:00:07           8.0         80.0
8   6/15/2016  0:00:08           9.0         90.0
9   6/15/2016  0:00:09          10.0        100.0
10  6/15/2016  0:00:10          11.0        110.0
11  6/16/2016  0:00:04           5.0         50.0
12  6/16/2016  0:00:06           7.0         70.0
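Because the time-based shift aligns on timestamps rather than row positions, it can also cope with hours whose rows are absent from the file altogether. Here is a rough sketch of that case, assuming made-up hourly data and a single column; reindexing onto a complete hourly grid first is an addition for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Made-up data: two days of hourly readings with values 0..47.
full = pd.date_range("2016-06-15 00:00", periods=48, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"Solar Power": np.arange(48, dtype=float)}, index=full)

# The 06:00 reading on day 2 is present but NaN ...
df.loc[pd.Timestamp("2016-06-16 06:00"), "Solar Power"] = np.nan
# ... and the 05:00 row on day 2 is missing from the data entirely.
df = df.drop(pd.Timestamp("2016-06-16 05:00"))

# Put the data back on a complete hourly grid; absent rows become NaN rows.
grid = pd.date_range(df.index.min(), df.index.max(), freq=pd.Timedelta(hours=1))
df = df.reindex(grid)

# Shift the index forward by 24 hours so each timestamp lines up with the
# reading taken one day before it, then fill the gaps from those values.
previous_day = df.shift(freq=pd.Timedelta(hours=24))
df = df.fillna(previous_day)

print(df.loc[pd.Timestamp("2016-06-16 04:00"):pd.Timestamp("2016-06-16 07:00")])
```

A positional shift(24) on the un-reindexed frame would be off by one row after the dropped hour, which is why the grid is rebuilt before filling.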
