
Conditional sum across rows in pandas groupby statement

I have a dataframe containing weekly sales for different products (a, b, c):

In [1]:
import numpy as np
import pandas as pd

df = pd.DataFrame({'product': list('aaaabbbbcccc'),
                   'week': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                   'sales': np.power(2, range(12))})
Out[1]
   product  sales  week
0        a      1     1
1        a      2     2
2        a      4     3
3        a      8     4
4        b     16     1
5        b     32     2
6        b     64     3
7        b    128     4
8        c    256     1
9        c    512     2
10       c   1024     3
11       c   2048     4

I would like to create a new column containing, for each product, the cumulative sales of the previous n weeks (not including the current week). For example, for n=2 it should look like the column last_2_weeks:

   product  sales  week  last_2_weeks
0        a      1     1             0
1        a      2     2             1
2        a      4     3             3
3        a      8     4             6
4        b     16     1             0
5        b     32     2            16
6        b     64     3            48
7        b    128     4            96
8        c    256     1             0
9        c    512     2           256
10       c   1024     3           768
11       c   2048     4          1536
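Read off the example table (this is my formalization, not wording from the question): for product p and week w, the new column sums sales over the n weeks strictly before w, with a week that has no row counting as zero.

```latex
\text{last}_n(p, w) \;=\; \sum_{k=1}^{n} \text{sales}(p,\, w-k),
\qquad \text{sales}(p, w') = 0 \text{ if there is no row for } (p, w')
```

As a check, for product b in week 4 with n = 2 this gives sales(b, 3) + sales(b, 2) = 64 + 32 = 96, matching the table.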

How can I efficiently calculate such a cumulative, conditional sum in pandas? The solution should also work if there are more variables to group by, e.g. product and location.

I have tried creating a new function and using groupby and apply, but this works only if the rows are sorted. It is also slow and ugly:

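def last_n_weeks(x):
    """ calculate sales of previous n weeks in aggregated data """
    n = 2
    # x.name holds the (product, week) key of the current group
    cur_prod, cur_week = x.name
    # Filters the whole global df once per group, which is what makes this slow
    res = np.sum(df['sales'].loc[(df['product'] == cur_prod) &
                                 (df['week'] >= cur_week - n) &
                                 (df['week'] < cur_week)])
    return res

# reset_index(drop=True) discards the (product, week) labels, so the result is
# written back by position: correct only if df is sorted by product and week
df['last_2_weeks'] = df.groupby(['product', 'week']).apply(last_n_weeks).reset_index(drop=True)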
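The attempt needs sorted rows because of its last line: reset_index(drop=True) throws away the (product, week) labels, so values are written back by position. A minimal sketch of a position-independent variant, using the shift-then-rolling idea from the answer below but with groupby().transform() instead of apply: compute on a sorted copy and let pandas align the result back by index label. The shuffled frame and the keys list are illustrative assumptions; appending another column such as a location to keys would group by both. It still assumes each group has one row for every consecutive week.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'product': list('aaaabbbbcccc'),
                   'week': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                   'sales': np.power(2, range(12))})

# Shuffle the rows to mimic unsorted input (fixed seed for repeatability).
shuffled = df.sample(frac=1, random_state=0)

keys = ['product']  # illustrative; e.g. ['product', 'location'] if that column exists
n = 2

# Sort a copy so each group's rows are in week order; transform() returns
# a Series that keeps the original index labels.
ordered = shuffled.sort_values(keys + ['week'])
last_n = (ordered.groupby(keys)['sales']
                 .transform(lambda s: s.shift().rolling(n, min_periods=1).sum())
                 .fillna(0)
                 .astype(int))

# Assignment aligns on index labels, so the shuffled row order does not matter.
shuffled[f'last_{n}_weeks'] = last_n
print(shuffled.sort_index())
```

Shifting before the rolling sum drops the current week from each window, and min_periods=1 keeps the partial windows at the start of each group instead of turning them into NaN.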

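You could compute a rolling sum with window=2 within each product (pd.rolling_sum in older pandas; it has since been removed, and the current equivalent is Series.rolling(window=2).sum()), then shift once and fill the NaNs with 0. This works on row positions, so it assumes the rows are sorted by week within each product and that no weeks are missing: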

In [114]: df['l2'] = (df.groupby('product')['sales']
                       .transform(lambda x: x.rolling(window=2, min_periods=1).sum()
                                             .shift()
                                             .fillna(0))
                       .astype(int))
In [115]: df
Out[115]:
   product  sales  week    l2
0        a      1     1     0
1        a      2     2     1
2        a      4     3     3
3        a      8     4     6
4        b     16     1     0
5        b     32     2    16
6        b     64     3    48
7        b    128     4    96
8        c    256     1     0
9        c    512     2   256
10       c   1024     3   768
11       c   2048     4  1536
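Because the rolling window counts rows, it silently assumes every product has a row for every week. If weeks can be missing, one alternative (a different technique from the answer above: a self-merge on shifted week numbers) is to look up each of the previous n weeks by value. In the sample below, product a has no week-2 row; the rolling version would report 5 for week 4 (weeks 1 and 3 both fall inside the two-row window), whereas the week-based sum is 4. The helper name sum_previous_weeks and the sample data are my own; the sketch assumes each (keys, week) combination occurs at most once.

```python
import numpy as np
import pandas as pd

def sum_previous_weeks(df, n, keys):
    """Hypothetical helper: for every row, sum 'sales' over weeks w-n .. w-1
    of the same group, matching on week numbers instead of row positions.
    Assumes each (keys..., week) combination appears at most once."""
    base = df[keys + ['week']]
    total = np.zeros(len(df))
    for k in range(1, n + 1):
        # Relabel every row as if it happened k weeks later ...
        lagged = df[keys + ['week', 'sales']].assign(week=df['week'] + k)
        # ... and look it up; a left merge keeps base's row order.
        matched = base.merge(lagged, on=keys + ['week'], how='left')
        total += matched['sales'].fillna(0).to_numpy()
    return pd.Series(total, index=df.index)

# Product 'a' has no row for week 2, so only week 1 counts for week 3
# and only week 3 counts for week 4.
df = pd.DataFrame({'product': ['a', 'a', 'a', 'b', 'b'],
                   'week':    [1, 3, 4, 1, 2],
                   'sales':   [1, 4, 8, 16, 32]})
df['last_2_weeks'] = sum_previous_weeks(df, 2, ['product'])
print(df)
```

This does n merges, so it is slower than a rolling sum, but it does not depend on row order, and keys can hold several columns such as product and location.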

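Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM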