How to perform weighted average every 2 rows in Pandas?

Question

My data looks like this:

...
                     A         B      C
2017-09-18 12:00:00  1.000010  18000  100
2017-09-18 17:00:00  1.000029  13500  400
2017-09-19 12:00:00  1.000025  18000  300
2017-09-19 17:00:00  1.000037  13500  300

...

Measures A, B, and C are taken at 2 distinct times on each day.

I need to collapse the 2 measurements for each day into a single row. For example, for the first 2 rows:

a weighted average of column A, using column B as the weights:
((A1 * B1) + (A2 * B2)) / (B1 + B2)
a plain average of column C:
(C1 + C2) / 2
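
As a quick sanity check of the two formulas, here is the arithmetic for the first day written out in plain Python. The variable names are only for illustration; the numbers are copied from the first two rows of the sample data.

```python
# Values copied from the first two rows of the sample data.
a1, b1, c1 = 1.000010, 18000, 100
a2, b2, c2 = 1.000029, 13500, 400

weighted_a = (a1 * b1 + a2 * b2) / (b1 + b2)  # A weighted by B
mean_c = (c1 + c2) / 2                        # plain average of C

print(round(weighted_a, 9), mean_c)
```

This should agree with the first row of the expected output shown further down.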

My difficulty is grouping these adjacent rows with df.groupby, given that they have distinct times, and applying a custom operation to columns A and B that is different from the one for C.

My expected output would be:

                     A            C
2017-09-18 12:00:00  1.000018143  250
2017-09-19 12:00:00  1.000030143  300

Any pointers would be greatly appreciated.

Answer 1

Try grouping by date and using apply:

df.groupby(df.index.date).apply(lambda x : pd.Series({'A':sum(x['A']*x['B'])/sum(x['B']),'C':(x['C']).mean()}))
                   A      C
2017-09-18  1.000018  250.0
2017-09-19  1.000030  300.0
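
For anyone who wants to run this, here is a self-contained sketch that rebuilds the sample frame from the question and applies the same idea. The helper name collapse_day is made up for this example, and the frame construction assumes the index is a DatetimeIndex, which the question implies but does not state.

```python
import pandas as pd

# Sample data from the question, assuming a DatetimeIndex.
df = pd.DataFrame(
    {
        "A": [1.000010, 1.000029, 1.000025, 1.000037],
        "B": [18000, 13500, 18000, 13500],
        "C": [100, 400, 300, 300],
    },
    index=pd.to_datetime(
        ["2017-09-18 12:00:00", "2017-09-18 17:00:00",
         "2017-09-19 12:00:00", "2017-09-19 17:00:00"]
    ),
)

def collapse_day(x):
    # Hypothetical helper: one output row per calendar day.
    return pd.Series({
        "A": (x["A"] * x["B"]).sum() / x["B"].sum(),  # A weighted by B
        "C": x["C"].mean(),                           # plain mean of C
    })

out = df.groupby(df.index.date).apply(collapse_day)
print(out)
```

A named function is used instead of the lambda only to make the two calculations easier to read; the grouping key is the same.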

Or, without using apply:

t1=df.eval('A*B').groupby(df.index.date).sum()/df.groupby(df.index.date).B.sum()
t2=df.groupby(df.index.date).C.mean()

pd.concat([t1,t2],axis=1)
                   0    C
2017-09-18  1.000018  250
2017-09-19  1.000030  300
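
Note that the weighted column comes out labelled 0 because the ratio Series has no name. A minimal sketch of one way to get the label A instead, assuming the same sample frame as in the question, is to rename the Series before concatenating:

```python
import pandas as pd

# Sample data from the question, assuming a DatetimeIndex.
df = pd.DataFrame(
    {
        "A": [1.000010, 1.000029, 1.000025, 1.000037],
        "B": [18000, 13500, 18000, 13500],
        "C": [100, 400, 300, 300],
    },
    index=pd.to_datetime(
        ["2017-09-18 12:00:00", "2017-09-18 17:00:00",
         "2017-09-19 12:00:00", "2017-09-19 17:00:00"]
    ),
)

day = df.index.date  # grouping key: calendar date of each row

# Sum of A*B per day divided by sum of B per day.
t1 = (df["A"] * df["B"]).groupby(day).sum() / df["B"].groupby(day).sum()
t2 = df["C"].groupby(day).mean()

# Naming the unnamed Series gives the column the label "A" rather than 0.
out = pd.concat([t1.rename("A"), t2], axis=1)
print(out)
```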

Answer 2

You can vectorize this with groupby, apply, and mean:

def AB_weighted(g):
   return (g['A'] * g['B']).sum() / g['B'].sum()

g = df.groupby(df.index.date)
pd.concat([g.apply(AB_weighted), g['C'].mean()], keys=['A', 'C'], axis=1)

                   A    C
2017-09-18  1.000018  250
2017-09-19  1.000030  300

We need apply for the first calculation, since it uses multiple columns, "A" and "B".
For the mean of "C", only "C" is needed, so we can shorten things with mean().
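
As a variation on the apply step, the weighted average can also be delegated to numpy.average, which takes a weights argument. This is a sketch under the same assumptions as before (the sample frame from the question with a DatetimeIndex), not a replacement for the function above.

```python
import numpy as np
import pandas as pd

# Sample data from the question, assuming a DatetimeIndex.
df = pd.DataFrame(
    {
        "A": [1.000010, 1.000029, 1.000025, 1.000037],
        "B": [18000, 13500, 18000, 13500],
        "C": [100, 400, 300, 300],
    },
    index=pd.to_datetime(
        ["2017-09-18 12:00:00", "2017-09-18 17:00:00",
         "2017-09-19 12:00:00", "2017-09-19 17:00:00"]
    ),
)

g = df.groupby(df.index.date)

# np.average computes sum(A * B) / sum(B) when B is passed as weights.
weighted = g.apply(lambda grp: np.average(grp["A"], weights=grp["B"]))

out = pd.concat([weighted, g["C"].mean()], keys=["A", "C"], axis=1)
print(out)
```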

Another option is to compute the product before the groupby, which avoids the call to apply (this is a little like @WB's second approach, but with a single sum call).

u = df.assign(D=df['A'] * df['B'])[['D', 'B']].groupby(df.index.date).sum()
u['A'] = u.pop('D') / u.pop('B')

u['C'] = df.groupby(df.index.date)['C'].mean()

u
                   A    C
2017-09-18  1.000018  250
2017-09-19  1.000030  300
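
All of the above group by calendar date. If the goal is literally "every 2 rows" regardless of date, and the first timestamp of each pair should be kept as in the expected output, one possible sketch is to group by row position instead. This assumes the rows are already sorted and come in complete pairs; the names pair_id, AB, and ts are made up for the example.

```python
import numpy as np
import pandas as pd

# Sample data from the question, assuming a DatetimeIndex.
df = pd.DataFrame(
    {
        "A": [1.000010, 1.000029, 1.000025, 1.000037],
        "B": [18000, 13500, 18000, 13500],
        "C": [100, 400, 300, 300],
    },
    index=pd.to_datetime(
        ["2017-09-18 12:00:00", "2017-09-18 17:00:00",
         "2017-09-19 12:00:00", "2017-09-19 17:00:00"]
    ),
)

# 0, 0, 1, 1, ... : one label per consecutive pair of rows.
pair_id = np.arange(len(df)) // 2

# Precompute the product and expose the timestamps as a column.
tmp = df.assign(AB=df["A"] * df["B"], ts=df.index)

agg = tmp.groupby(pair_id).agg(
    ts=("ts", "first"),   # first timestamp of each pair
    AB=("AB", "sum"),
    B=("B", "sum"),
    C=("C", "mean"),
).set_index("ts")

out = pd.DataFrame({"A": agg["AB"] / agg["B"], "C": agg["C"]})
print(out)
```

With the sample data this gives the same numbers as grouping by date, since each day has exactly two rows.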

2 answers

solution1
4 2019-01-19 21:20:12

solution2
4 ACCEPTED 2019-01-19 21:21:22
