Pandas: how to fill missing diff value after groupby?

I group a set of data with DataFrame.groupby.

Then I want to calculate the delta given by the difference between the first element of each group and the first element of the next one:

import pandas as pd
import numpy as np
import datetime

# dummy set of data
np.random.seed(0)
SIZE = 25
dt = pd.date_range(datetime.datetime(2018,1,1,10,14,30), freq="1Min", periods=SIZE)
wa = (np.random.random(size=SIZE)-.4)*1000
wa = np.cumsum(wa) + 10000
df = pd.DataFrame({"dtime":dt, "value":wa}).set_index("dtime")

grouped = df.groupby(pd.Grouper(freq='15Min'))

print("\n************\nDataframe:")
print(df)

print("\n************\nFirst value of each group:")
print(grouped.first())

print("\n************\nDelta value:")
print(-grouped.first().diff(periods=-1))  

This code returns:

************
Dataframe:
                            value
dtime                            
2018-01-01 10:14:30  10148.813504
2018-01-01 10:15:30  10464.002870
2018-01-01 10:16:30  10666.766246
2018-01-01 10:17:30  10811.649429
2018-01-01 10:18:30  10835.304229
....
2018-01-01 10:34:30  14209.714833
2018-01-01 10:35:30  14608.873397
2018-01-01 10:36:30  14670.352759
2018-01-01 10:37:30  15050.881935
2018-01-01 10:38:30  14769.156361

************
First value of each group:
                            value
dtime                            
2018-01-01 10:00:00  10148.813504
2018-01-01 10:15:00  10464.002870
2018-01-01 10:30:00  12350.307746

************
Delta value:
                           value
dtime                           
2018-01-01 10:00:00   315.189366
2018-01-01 10:15:00  1886.304875
2018-01-01 10:30:00          NaN   <===== Here is my problem

Of course the last delta is NaN because there is no next group to calculate the diff against.
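
As a small illustration of why the final row is empty, here is a sketch on a hypothetical three-element Series (the values are made up for the example and are not from the data above). diff(periods=-1) subtracts the next element from the current one, so the last element has nothing to subtract and yields NaN:

```python
import pandas as pd

# Hypothetical values, used only to illustrate the behaviour
s = pd.Series([10.0, 13.0, 20.0])

# diff(periods=-1) computes s[i] - s[i+1]; negating gives s[i+1] - s[i]
forward_delta = -s.diff(periods=-1)

# Equivalent formulation: shift the next value up and subtract
forward_delta_alt = s.shift(-1) - s

print(forward_delta)
```

Both formulations leave the last position as NaN, which is the value that needs filling.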

Is there a way to fill this NaN with the difference between the last and the first values of the last group?

Since the last row is 2018-01-01 10:38:30 14769.156361, and the first value of the last group is 12350.307746, it should return 14769.156361 - 12350.307746 = 2418.848615:

************
Delta value:
                           value
dtime                           
2018-01-01 10:00:00   315.189366
2018-01-01 10:15:00  1886.304875
2018-01-01 10:30:00  2418.848615

Thank you

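Of course, as soon as I asked, I found the answer. I am leaving it here for other users. It is quite obvious: compute the delta as before, then fill the remaining NaN with the difference between the last and the first value of each group. fillna aligns on the index, so only the missing row (the last group) is replaced.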

delta = -grouped.first().diff(periods=-1)
delta = delta.fillna(grouped.last() - grouped.first())

print(delta)
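
For anyone who wants to check the approach on predictable numbers, here is a minimal self-contained sketch. It assumes the same layout as the question (one row per minute starting at 10:14:30, 15-minute groups) but uses made-up values 100..124 instead of the random data, and writes the frequencies in lowercase ("1min", "15min"):

```python
import pandas as pd

# Hypothetical deterministic data: 25 rows, one per minute, values 100..124
idx = pd.date_range("2018-01-01 10:14:30", freq="1min", periods=25)
df = pd.DataFrame({"value": [float(v) for v in range(100, 125)]}, index=idx)

grouped = df.groupby(pd.Grouper(freq="15min"))
first = grouped.first()
last = grouped.last()

# First value of the next group minus first value of this group;
# the last group has no successor, so its delta is NaN
delta = -first.diff(periods=-1)

# Fill only the missing entry with (last - first) of the same group
delta = delta.fillna(last - first)

print(delta)
```

With these values the groups start at 100, 101 and 116, and the last group ends at 124, so the deltas are 1, 15 and 8.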
