Conditional sum over rows in a pandas groupby statement

Question

I have a dataframe containing weekly sales for different products (a, b, c):

In[1]
import numpy as np
import pandas as pd
df = pd.DataFrame({'product': list('aaaabbbbcccc'),
                   'sales': np.power(2, range(12)),
                   'week': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]})
Out[1]
   product  sales  week
0        a      1     1
1        a      2     2
2        a      4     3
3        a      8     4
4        b     16     1
5        b     32     2
6        b     64     3
7        b    128     4
8        c    256     1
9        c    512     2
10       c   1024     3
11       c   2048     4

I would like to create a new column containing the total sales over the previous n weeks (not counting the current week), grouped by product. E.g., for n=2 it should look like last_2_weeks:

   product  sales  week  last_2_weeks
0        a      1     1             0
1        a      2     2             1
2        a      4     3             3
3        a      8     4             6
4        b     16     1             0
5        b     32     2            16
6        b     64     3            48
7        b    128     4            96
8        c    256     1             0
9        c    512     2           256
10       c   1024     3           768
11       c   2048     4          1536

How can I efficiently calculate such a cumulative, conditional sum in pandas? The solution should also work if there are more variables to group by, e.g. product and location.

I have tried creating a new function and using groupby and apply, but this works only if the rows are sorted. It is also slow and ugly.

def last_n_weeks(x):
    """ calculate sales of previous n weeks in aggregated data """
    n = 2
    cur_week = x['week'].iloc[0]
    cur_prod = x['product'].iloc[0]
    res = np.sum(df['sales'].loc[((df['product'] == cur_prod) &
                         (df['week'] >= cur_week-n) & (df['week'] < cur_week))])
    return res

df['last_2_weeks'] = df.groupby(['product', 'week']).apply(last_n_weeks).reset_index(drop=True)
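
The order dependence in the attempt above comes from reset_index(drop=True): the grouped result is ordered by the sorted (product, week) keys, and dropping that index makes the assignment positional. Below is a minimal sketch of an order-independent alternative that swaps the apply for a self-join and merges the sums back by key. It assumes each (product, week) pair occurs at most once; the function name last_n_weeks_by_key and the shuffled frame are illustrative and not part of the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'product': list('aaaabbbbcccc'),
                   'sales': np.power(2, range(12)),
                   'week': [1, 2, 3, 4] * 3})
# Shuffle the rows to show that the result does not depend on row order.
shuffled = df.sample(frac=1, random_state=0)


def last_n_weeks_by_key(frame, n=2):
    """Sum the sales of the previous n weeks per (product, week) via a self-join."""
    # Pair every row with all rows of the same product.
    pairs = frame.merge(frame, on='product', suffixes=('', '_prev'))
    # Keep only pairs whose other week lies in [week - n, week).
    in_window = ((pairs['week_prev'] >= pairs['week'] - n) &
                 (pairs['week_prev'] < pairs['week']))
    sums = (pairs[in_window]
            .groupby(['product', 'week'])['sales_prev']
            .sum()
            .rename('last_n_weeks')
            .reset_index())
    # Merge back by key, not by position; weeks without history get 0.
    out = frame.merge(sums, on=['product', 'week'], how='left')
    out['last_n_weeks'] = out['last_n_weeks'].fillna(0).astype(int)
    return out


result = last_n_weeks_by_key(shuffled, n=2)
```

Because the window is defined on week values, this variant also tolerates missing weeks. The price is a join whose size grows quadratically with the number of rows per product, so it is probably only reasonable for modest group sizes.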

Answer 1

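You could use a rolling sum with window=2 within each product, then shift once and fill the NaNs with 0. When this answer was written that was pd.rolling_sum; it was deprecated in pandas 0.18 and later removed, and the current equivalent is Series.rolling(window=2).sum(). Note that the window counts rows rather than week values, so this assumes the rows of each product are sorted by week and no weeks are missing.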

In [114]: df['l2'] = (df.groupby('product')['sales']
                        # rolling(...).sum() replaces the removed pd.rolling_sum
                        .transform(lambda x: x.rolling(window=2, min_periods=1)
                                              .sum().shift().fillna(0))
                        .astype(int))
In [115]: df
Out[115]:
   product  sales  week    l2
0        a      1     1     0
1        a      2     2     1
2        a      4     3     3
3        a      8     4     6
4        b     16     1     0
5        b     32     2    16
6        b     64     3    48
7        b    128     4    96
8        c    256     1     0
9        c    512     2   256
10       c   1024     3   768
11       c   2048     4  1536
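
The question also asks about several grouping columns and unsorted rows. The following is a sketch of how the rolling approach might be generalized, under the assumption that every group has exactly one row per week with no gaps (the window counts rows, not week numbers). The location column, the sample values, and the function name add_last_n_weeks are hypothetical and only serve the illustration.

```python
import pandas as pd


def add_last_n_weeks(frame, keys, n=2, col='sales'):
    """Add a column holding the sum of `col` over the previous n rows per group."""
    keys = list(keys)
    # Sort so that the rows within each group are in week order.
    ordered = frame.sort_values(keys + ['week'])
    prev_sum = (ordered.groupby(keys)[col]
                .transform(lambda s: s.rolling(window=n, min_periods=1)
                                      .sum()
                                      .shift()
                                      .fillna(0)))
    # Assignment aligns on the index, so the original row order is kept.
    out = frame.copy()
    out['last_%d_weeks' % n] = prev_sum.astype(int)
    return out


# Deliberately unsorted rows with two grouping columns.
df = pd.DataFrame({'product': list('aaaabbbb'),
                   'location': ['x', 'x', 'y', 'y'] * 2,
                   'week': [2, 1, 1, 2, 1, 2, 2, 1],
                   'sales': [2, 1, 4, 8, 16, 32, 128, 64]})
result = add_last_n_weeks(df, keys=['product', 'location'], n=2)
```

Sorting a copy and relying on index alignment keeps the caller's row order intact, which addresses the "only works if rows are sorted" concern without reordering the original frame.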

Answer 1
2 · accepted · 2015-10-26 15:02:41
