
How to perform weighted average every 2 rows in Pandas?

My data looks like this:

...
                     A         B      C
2017-09-18 12:00:00  1.000010  18000  100
2017-09-18 17:00:00  1.000029  13500  400
2017-09-19 12:00:00  1.000025  18000  300
2017-09-19 17:00:00  1.000037  13500  300

...

Measures A, B, and C are taken at 2 distinct times on each day.

I need to collapse the 2 measurements for each day into a single row (shown here for the first 2 rows), computing:

  • a weighted average of column A, using column B as the weights

    ((A1 * B1) + (A2 * B2)) / (B1 + B2)

  • an average of column C

    (C1 + C2) / 2

My difficulty is in using df.groupby on these adjacent rows: they have distinct timestamps, and columns A and B need a custom operation that is different from the one for C.

My expected output would be:

                     A            C
2017-09-18 12:00:00  1.000018143  250
2017-09-19 12:00:00  1.000030143  300
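For anyone reproducing this, here is a minimal sketch of the sample frame plus a hand-check of the first day's numbers. It assumes the index is a DatetimeIndex and that the four rows shown are the whole frame (the elided rows are left out); the variable names are only for illustration.

```python
import pandas as pd

# Sample rows copied from the question; a DatetimeIndex is assumed.
idx = pd.to_datetime([
    "2017-09-18 12:00:00",
    "2017-09-18 17:00:00",
    "2017-09-19 12:00:00",
    "2017-09-19 17:00:00",
])
df = pd.DataFrame(
    {
        "A": [1.000010, 1.000029, 1.000025, 1.000037],
        "B": [18000, 13500, 18000, 13500],
        "C": [100, 400, 300, 300],
    },
    index=idx,
)

# Hand-check of the first day using the two formulas above.
a1, a2 = df["A"].iloc[0], df["A"].iloc[1]
b1, b2 = df["B"].iloc[0], df["B"].iloc[1]
c1, c2 = df["C"].iloc[0], df["C"].iloc[1]

weighted_a = (a1 * b1 + a2 * b2) / (b1 + b2)  # A weighted by B
mean_c = (c1 + c2) / 2                        # plain average of C

print(weighted_a, mean_c)
```

The two printed values should agree with the first row of the expected output, up to rounding.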

Any pointers would be greatly appreciated.

You can group by the date part of the index and compute both columns inside apply:

df.groupby(df.index.date).apply(lambda x : pd.Series({'A':sum(x['A']*x['B'])/sum(x['B']),'C':(x['C']).mean()}))
                   A      C
2017-09-18  1.000018  250.0
2017-09-19  1.000030  300.0
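As a variation on the same idea, the weighted average can be delegated to numpy.average, which accepts a weights argument. This is only a sketch: the sample frame is rebuilt from the question's rows, and the helper name summarize is made up for the example.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (DatetimeIndex assumed).
df = pd.DataFrame(
    {
        "A": [1.000010, 1.000029, 1.000025, 1.000037],
        "B": [18000, 13500, 18000, 13500],
        "C": [100, 400, 300, 300],
    },
    index=pd.to_datetime([
        "2017-09-18 12:00:00",
        "2017-09-18 17:00:00",
        "2017-09-19 12:00:00",
        "2017-09-19 17:00:00",
    ]),
)

def summarize(day):
    """Hypothetical helper: collapse one day's rows into a single row."""
    return pd.Series({
        "A": np.average(day["A"], weights=day["B"]),  # A weighted by B
        "C": day["C"].mean(),                         # plain mean of C
    })

result = df.groupby(df.index.date).apply(summarize)
print(result)
```

Because the function returns a Series per group, apply assembles the results into a DataFrame with columns A and C and one row per date.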

Or, without using apply:

t1=df.eval('A*B').groupby(df.index.date).sum()/df.groupby(df.index.date).B.sum()
t2=df.groupby(df.index.date).C.mean()

pd.concat([t1, t2], axis=1)  # pass axis by keyword; the positional form was removed in pandas 2.0
                   0    C
2017-09-18  1.000018  250
2017-09-19  1.000030  300
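The first column comes out labelled 0 because the ratio Series has no name. A small sketch of one way to label it A, assuming the same sample frame as in the question (rebuilt here so the snippet runs on its own):

```python
import pandas as pd

# Sample frame rebuilt from the question (DatetimeIndex assumed).
df = pd.DataFrame(
    {
        "A": [1.000010, 1.000029, 1.000025, 1.000037],
        "B": [18000, 13500, 18000, 13500],
        "C": [100, 400, 300, 300],
    },
    index=pd.to_datetime([
        "2017-09-18 12:00:00",
        "2017-09-18 17:00:00",
        "2017-09-19 12:00:00",
        "2017-09-19 17:00:00",
    ]),
)

days = df.index.date

# Sum of A*B per day divided by sum of B per day.
t1 = (df["A"] * df["B"]).groupby(days).sum() / df["B"].groupby(days).sum()
t2 = df["C"].groupby(days).mean()

# Naming the unnamed Series gives the column a proper label.
out = pd.concat([t1.rename("A"), t2], axis=1)
print(out)
```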

You can do this with groupby, apply, and mean:

def AB_weighted(g):
   return (g['A'] * g['B']).sum() / g['B'].sum()

g = df.groupby(df.index.date)
pd.concat([g.apply(AB_weighted), g['C'].mean()], keys=['A', 'C'], axis=1)

                   A    C
2017-09-18  1.000018  250
2017-09-19  1.000030  300
  • We need apply for the weighted average, since that calculation uses two columns of each group, "A" and "B".
  • The mean of "C" only needs "C", so we can call mean() on that column directly.

Another option is to compute the product before the groupby, which avoids the call to apply. This is similar to @WB's second approach, but needs only one sum call.

u = df.assign(D=df['A'] * df['B'])[['D', 'B']].groupby(df.index.date).sum()
u['A'] = u.pop('D') / u.pop('B')

u['C'] = df.groupby(df.index.date)['C'].mean()

u
                   A    C
2017-09-18  1.000018  250
2017-09-19  1.000030  300
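The expected output in the question is indexed by the first timestamp of each day rather than by the bare date. If that matters, one possible sketch is to group on the normalized index and then relabel the rows. It assumes a DatetimeIndex sorted by time, and the names days, sums, and out are only illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the question (DatetimeIndex assumed).
df = pd.DataFrame(
    {
        "A": [1.000010, 1.000029, 1.000025, 1.000037],
        "B": [18000, 13500, 18000, 13500],
        "C": [100, 400, 300, 300],
    },
    index=pd.to_datetime([
        "2017-09-18 12:00:00",
        "2017-09-18 17:00:00",
        "2017-09-19 12:00:00",
        "2017-09-19 17:00:00",
    ]),
)

# Midnight of each row's day, used as the grouping key.
days = df.index.normalize()

# Per-day sums of A*B and B, then their ratio.
sums = df.assign(AB=df["A"] * df["B"]).groupby(days)[["AB", "B"]].sum()
out = pd.DataFrame({
    "A": sums["AB"] / sums["B"],
    "C": df.groupby(days)["C"].mean(),
})

# Relabel each row with the first timestamp seen on that day.
out.index = df.index.to_series().groupby(days).first().to_numpy()
print(out)
```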
