Conditional sum across rows in pandas groupby statement

I have a dataframe containing weekly sales for different products (a, b, c):

In[1]
import numpy as np
import pandas as pd

df = pd.DataFrame({'product': list('aaaabbbbcccc'),
                   'week': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                   'sales': np.power(2, range(12))})
Out[1]
   product  sales  week
0        a      1     1
1        a      2     2
2        a      4     3
3        a      8     4
4        b     16     1
5        b     32     2
6        b     64     3
7        b    128     4
8        c    256     1
9        c    512     2
10       c   1024     3
11       c   2048     4

I would like to create a new column containing the cumulative sales for the previous n weeks (excluding the current week), grouped by product. For example, for n=2 it should look like the last_2_weeks column below:

   product  sales  week  last_2_weeks
0        a      1     1             0
1        a      2     2             1
2        a      4     3             3
3        a      8     4             6
4        b     16     1             0
5        b     32     2            16
6        b     64     3            48
7        b    128     4            96
8        c    256     1             0
9        c    512     2           256
10       c   1024     3           768
11       c   2048     4          1536

How can I efficiently calculate such a cumulative, conditional sum in pandas? The solution should also work if there are more variables to group by, e.g. product and location.

I have tried writing a function and applying it with groupby and apply, but this only works if the rows are sorted. It is also slow and ugly.

def last_n_weeks(x):
    """ calculate sales of previous n weeks in aggregated data """
    n = 2
    cur_week = x['week'].iloc[0]
    cur_prod = x['product'].iloc[0]
    res = np.sum(df['sales'].loc[((df['product'] == cur_prod) &
                         (df['week'] >= cur_week-n) & (df['week'] < cur_week))])
    return res

df['last_2_weeks'] = df.groupby(['product', 'week']).apply(last_n_weeks).reset_index(drop=True)

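You could use pd.rolling_sum with window=2, then shift once and fill the NaNs with 0. Note that pd.rolling_sum belongs to the older pandas API; later releases removed it in favor of the .rolling(...).sum() method.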

In [114]: df['l2'] = (df.groupby('product')['sales']
                       .apply(lambda x: pd.rolling_sum(x, window=2, min_periods=0)
                       .shift()
                       .fillna(0)))
In [115]: df
Out[115]:
   product  sales  week    l2
0        a      1     1     0
1        a      2     2     1
2        a      4     3     3
3        a      8     4     6
4        b     16     1     0
5        b     32     2    16
6        b     64     3    48
7        b    128     4    96
8        c    256     1     0
9        c    512     2   256
10       c   1024     3   768
11       c   2048     4  1536
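
For readers on a recent pandas version, the same idea can be sketched with the .rolling() method. This is a minimal sketch rather than the original answer's code: it assumes exactly one row per product and week with no missing weeks, and the variable n and the explicit sort_values call are additions for illustration. It shifts first and then rolls, so each row only sees the n rows before it within its product.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'product': list('aaaabbbbcccc'),
                   'week': [1, 2, 3, 4] * 3,
                   'sales': np.power(2, range(12))})

n = 2  # number of previous weeks to sum (illustrative parameter)

# Sort so the window sees the weeks in order within each product.
df = df.sort_values(['product', 'week'])

df['last_2_weeks'] = (
    df.groupby('product')['sales']
      # shift() drops the current week from the window; rolling() then
      # sums up to n of the preceding rows within the same product.
      .transform(lambda s: s.shift().rolling(window=n, min_periods=1).sum())
      # The first week of each product has no history, so it becomes 0.
      .fillna(0)
      .astype(int)
)

print(df)
```

To group by more than one variable, passing a list such as ['product', 'location'] to groupby should work the same way (a location column is hypothetical here). Because the window counts rows rather than week values, a product with a missing week would need to be reindexed to a full week range first.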
