Using a groupby rolling max on multiple columns in Pandas

Question

I have a dataframe on which I want to compute the rolling maximum of the previous 3 months' values. Below is the dataframe:

VIN   Year_Month    Amount  
  V1    2012-01      196     
  V2    2012-01      113     
  V3    2012-01      177     
  V1    2012-02      154     
  V2    2012-02      129     
  V4    2012-02      156     
  V2    2012-03      100     
  V3    2012-03      174     
  V4    2012-03      127     
  V1    2012-04      139    
  V3    2012-04      194     
  V4    2012-04      178

Using the following piece of code, I am trying to compute the max value for each VIN over the last 3 months:

df['Max_3M_Value'] = df.groupby(['VIN','Year_Month'])['Amount'].rolling(3).max().fillna(0)

But the above code gives an error:

TypeError: incompatible index of inserted column with frame index

The resultant dataframe I am looking for is:

  VIN  Year_Month   Amount  Max_3M_Value
   V1   2012-01      196     196
   V2   2012-01      113     129
   V3   2012-01      177     194
   V1   2012-02      154     196
   V2   2012-02      129     129
   V4   2012-02      156     178
   V2   2012-03      100     129
   V3   2012-03      174     194
   V4   2012-03      127     178
   V1   2012-04      139     196
   V3   2012-04      194     194
   V4   2012-04      178     178

I have to compute the max value using rolling only. I know we can solve this using pd.pivot_table(), but that is not needed.

What am I missing here?

Answer 1

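You can avoid the error by assigning only the returned values, which drops the index, but this is only correct if the row order does not change during the operation. Note that groupby(...).rolling(...) returns its rows ordered by group, so the order is preserved only when df is already sorted by the group keys: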

df['Max_3M_Value'] = df.groupby(['VIN','Year_Month'])['Amount'].rolling(3).max().fillna(0).values
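
To see why the row order matters, here is a small sketch that rebuilds the question's dataframe and inspects the index of the grouped rolling result. The names rolled and original_labels are only for illustration. The result carries a MultiIndex (the group keys plus the original row label), which is why pandas refuses to insert it into a frame with a plain index, and its rows come back ordered by group rather than in the frame's order.

```python
import pandas as pd

# The dataframe from the question.
df = pd.DataFrame({
    "VIN": ["V1", "V2", "V3", "V1", "V2", "V4", "V2", "V3", "V4", "V1", "V3", "V4"],
    "Year_Month": ["2012-01"] * 3 + ["2012-02"] * 3 + ["2012-03"] * 3 + ["2012-04"] * 3,
    "Amount": [196, 113, 177, 154, 129, 156, 100, 174, 127, 139, 194, 178],
})

rolled = df.groupby(["VIN", "Year_Month"])["Amount"].rolling(3).max()

# Number of index levels: VIN, Year_Month, and the original row label.
print(rolled.index.nlevels)

# The innermost level holds the original row labels, in the order the result uses.
original_labels = rolled.index.get_level_values(-1).tolist()
print(original_labels)

# Compare that order with the frame's own row order.
print(original_labels == df.index.tolist())
```

Because the two orders differ for this frame, .values would attach each number to the wrong row unless the frame is sorted by the group keys first.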

In your given example, however, the groups are too small (1 element each) for a rolling window of 3 to be applied:

>>> df.groupby(['VIN','Year_Month'])['Amount'].count()

VIN  Year_Month
V1   2012-01       1
     2012-02       1
     2012-04       1
V2   2012-01       1
     2012-02       1
     2012-03       1
V3   2012-01       1
     2012-03       1
     2012-04       1
V4   2012-02       1
     2012-03       1
     2012-04       1

I believe what you really want is to group only by 'VIN':

df['Max_3M_Value'] = df.groupby('VIN')['Amount'].transform(lambda s: s.rolling(3, min_periods=1).max())

Output:

   VIN Year_Month  Amount  Max_3M_Value
0   V1    2012-01     196           196
1   V2    2012-01     113           113
2   V3    2012-01     177           177
3   V1    2012-02     154           196
4   V2    2012-02     129           129
5   V4    2012-02     156           156
6   V2    2012-03     100           129
7   V3    2012-03     174           177
8   V4    2012-03     127           156
9   V1    2012-04     139           196
10  V3    2012-04     194           194
11  V4    2012-04     178           178

Output for a rolling window of 2:

   VIN Year_Month  Amount  Max_3M_Value
0   V1    2012-01     196           196
1   V2    2012-01     113           113
2   V3    2012-01     177           177
3   V1    2012-02     154           196
4   V2    2012-02     129           129
5   V4    2012-02     156           156
6   V2    2012-03     100           129
7   V3    2012-03     174           177
8   V4    2012-03     127           156
9   V1    2012-04     139           154
10  V3    2012-04     194           194
11  V4    2012-04     178           178

If what you really want is a simple max per group, you can just do:

>>> df['Max_3M_Value'] = df.groupby('VIN')['Amount'].transform('max')
>>> df
   VIN Year_Month  Amount  Max_3M_Value
0   V1    2012-01     196           196
1   V2    2012-01     113           129
2   V3    2012-01     177           194
3   V1    2012-02     154           196
4   V2    2012-02     129           129
5   V4    2012-02     156           178
6   V2    2012-03     100           129
7   V3    2012-03     174           194
8   V4    2012-03     127           178
9   V1    2012-04     139           196
10  V3    2012-04     194           194
11  V4    2012-04     178           178

Answer 2

Try:

>>> df['Max_3M_Value'] = df.groupby('VIN')['Amount'] \
                           .rolling(3).max().bfill().astype(int) \
                           .reset_index(level=0, drop=True)

>>> df
   VIN Year_Month  Amount  Max_3M_Value
0   V1    2012-01     196           196
1   V2    2012-01     113           129
2   V3    2012-01     177           194
3   V1    2012-02     154           196
4   V2    2012-02     129           129
5   V4    2012-02     156           178
6   V2    2012-03     100           129
7   V3    2012-03     174           194
8   V4    2012-03     127           178
9   V1    2012-04     139           196
10  V3    2012-04     194           194
11  V4    2012-04     178           178
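
A note on why this works here: rolling(3) without min_periods leaves NaN in the first two rows of each VIN, and bfill() then fills them from the next valid value. Every VIN in the sample has exactly 3 rows, so that next value is the VIN's own. Since bfill() runs over the whole result rather than per group, a VIN with fewer than 3 rows would be filled from the following VIN instead. As a hedged variation that avoids the fill step, min_periods=1 can be combined with the same reset_index alignment (the name rolled is only for illustration):

```python
import pandas as pd

# The dataframe from the question.
df = pd.DataFrame({
    "VIN": ["V1", "V2", "V3", "V1", "V2", "V4", "V2", "V3", "V4", "V1", "V3", "V4"],
    "Year_Month": ["2012-01"] * 3 + ["2012-02"] * 3 + ["2012-03"] * 3 + ["2012-04"] * 3,
    "Amount": [196, 113, 177, 154, 129, 156, 100, 174, 127, 139, 194, 178],
})

rolled = (
    df.groupby("VIN")["Amount"]
      .rolling(3, min_periods=1)        # partial windows allowed, so nothing to fill
      .max()
      .reset_index(level=0, drop=True)  # drop the VIN level, keep original row labels
)

# Assignment aligns on the original row labels, so the group ordering is harmless.
df["Max_3M_Value"] = rolled.astype(int)
print(df)
```

This reproduces the rolling output from Answer 1 (a running max over up to 3 rows per VIN) rather than the per-VIN maximum printed above.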
