
Rolling max value with groupby on multiple columns in pandas

I have a dataframe on which I want to compute the rolling maximum over the previous 3 months' values. Below is the dataframe:

VIN   Year_Month    Amount  
  V1    2012-01      196     
  V2    2012-01      113     
  V3    2012-01      177     
  V1    2012-02      154     
  V2    2012-02      129     
  V4    2012-02      156     
  V2    2012-03      100     
  V3    2012-03      174     
  V4    2012-03      127     
  V1    2012-04      139    
  V3    2012-04      194     
  V4    2012-04      178     
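For anyone who wants to reproduce this, the table above can be rebuilt roughly as follows. This is a minimal sketch; it assumes `Year_Month` is stored as plain strings, which the question does not state.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: Year_Month holds plain strings such as "2012-01".
df = pd.DataFrame({
    "VIN": ["V1", "V2", "V3", "V1", "V2", "V4",
            "V2", "V3", "V4", "V1", "V3", "V4"],
    "Year_Month": ["2012-01", "2012-01", "2012-01",
                   "2012-02", "2012-02", "2012-02",
                   "2012-03", "2012-03", "2012-03",
                   "2012-04", "2012-04", "2012-04"],
    "Amount": [196, 113, 177, 154, 129, 156,
               100, 174, 127, 139, 194, 178],
})
print(df)
```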

Using the following piece of code, I am trying to compute the max value for each VIN over the last 3 months:

df['Max_3M_Value'] = df.groupby(['VIN','Year_Month'])['Amount'].rolling(3).max().fillna(0)

But the above code gives an error:

TypeError: incompatible index of inserted column with frame index

The resulting dataframe I am looking for is:

  VIN  Year_Month   Amount  Max_3M_Value
   V1   2012-01      196     196
   V2   2012-01      113     129
   V3   2012-01      177     194
   V1   2012-02      154     196
   V2   2012-02      129     129
   V4   2012-02      156     178
   V2   2012-03      100     129
   V3   2012-03      174     194
   V4   2012-03      127     178
   V1   2012-04      139     196
   V3   2012-04      194     194
   V4   2012-04      178     178

I have to compute the max value using rolling only. I know we can solve this using pd.pivot_table(), but that is not needed.

What am I missing here?

You can avoid the error by assigning only the returned values, but this is safe only if the result comes back in the same row order as the frame. groupby(...).rolling(...) orders its result by group key, so that holds only when the frame is already sorted by the grouping columns:

df['Max_3M_Value'] = df.groupby(['VIN','Year_Month'])['Amount'].rolling(3).max().fillna(0).values
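To see why the direct assignment fails and why `.values` depends on row order, it can help to inspect the index of the rolled result. The following is only a sketch with the sample frame rebuilt inline; the variable names `rolled` and `aligned` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "VIN": ["V1", "V2", "V3", "V1", "V2", "V4",
            "V2", "V3", "V4", "V1", "V3", "V4"],
    "Year_Month": ["2012-01", "2012-01", "2012-01",
                   "2012-02", "2012-02", "2012-02",
                   "2012-03", "2012-03", "2012-03",
                   "2012-04", "2012-04", "2012-04"],
    "Amount": [196, 113, 177, 154, 129, 156,
               100, 174, 127, 139, 194, 178],
})

rolled = df.groupby(["VIN", "Year_Month"])["Amount"].rolling(3).max()

# The result carries the group keys as extra index levels on top of the
# original row labels, so it cannot be inserted into df directly.
print(rolled.index.names)

# Dropping the group-key levels leaves the original row labels, but in
# group order rather than in the frame's order.
aligned = rolled.droplevel(["VIN", "Year_Month"])
print(list(aligned.index))

# Every group has a single row, so a window of 3 yields only NaN.
print(aligned.isna().all())
```

Because `aligned` keeps the original row labels, assigning it to a column aligns by label, whereas `.values` would be taken positionally in group order.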

In your given example, the groups are too small (1 element each) for a rolling window of 3 to produce anything:

>>> df.groupby(['VIN','Year_Month'])['Amount'].count()

VIN  Year_Month
V1   2012-01       1
     2012-02       1
     2012-04       1
V2   2012-01       1
     2012-02       1
     2012-03       1
V3   2012-01       1
     2012-03       1
     2012-04       1
V4   2012-02       1
     2012-03       1
     2012-04       1

I believe what you really want is to group only by 'VIN':

df['Max_3M_Value'] = df.groupby('VIN')['Amount'].transform(lambda s: s.rolling(3, min_periods=1).max())

Output:

   VIN Year_Month  Amount  Max_3M_Value
0   V1    2012-01     196           196
1   V2    2012-01     113           113
2   V3    2012-01     177           177
3   V1    2012-02     154           196
4   V2    2012-02     129           129
5   V4    2012-02     156           156
6   V2    2012-03     100           129
7   V3    2012-03     174           177
8   V4    2012-03     127           156
9   V1    2012-04     139           196
10  V3    2012-04     194           194
11  V4    2012-04     178           178

Output for a rolling window of 2:

   VIN Year_Month  Amount  Max_3M_Value
0   V1    2012-01     196           196
1   V2    2012-01     113           113
2   V3    2012-01     177           177
3   V1    2012-02     154           196
4   V2    2012-02     129           129
5   V4    2012-02     156           156
6   V2    2012-03     100           129
7   V3    2012-03     174           177
8   V4    2012-03     127           156
9   V1    2012-04     139           154
10  V3    2012-04     194           194
11  V4    2012-04     178           178
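Note that rolling(3) counts rows, not calendar months, and some VINs skip a month (V1 has no 2012-03 row). If the window should cover three calendar months, one possible approach is to fill in the missing months per VIN before rolling. The sketch below assumes `Year_Month` parses as a monthly period and that each VIN has at most one row per month; the `Period` and `Max_3M_Calendar` columns are names invented for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "VIN": ["V1", "V2", "V3", "V1", "V2", "V4",
            "V2", "V3", "V4", "V1", "V3", "V4"],
    "Year_Month": ["2012-01", "2012-01", "2012-01",
                   "2012-02", "2012-02", "2012-02",
                   "2012-03", "2012-03", "2012-03",
                   "2012-04", "2012-04", "2012-04"],
    "Amount": [196, 113, 177, 154, 129, 156,
               100, 174, 127, 139, 194, 178],
})

# Assumption: Year_Month strings parse as monthly periods.
df["Period"] = pd.PeriodIndex(df["Year_Month"], freq="M")

pieces = []
for _, group in df.groupby("VIN"):
    by_month = group.set_index("Period")["Amount"]
    # Insert NaN rows for months this VIN skipped, so that 3 rows == 3 months.
    full = pd.period_range(by_month.index.min(), by_month.index.max(), freq="M")
    rolled = by_month.reindex(full).rolling(3, min_periods=1).max()
    # Keep only the months that exist in the data and restore the row labels.
    kept = rolled.reindex(by_month.index)
    pieces.append(pd.Series(kept.to_numpy(), index=group.index))

df["Max_3M_Calendar"] = pd.concat(pieces).astype(int)

# Row-based version from the answer above, for comparison.
row_based = (
    df.groupby("VIN")["Amount"]
      .transform(lambda s: s.rolling(3, min_periods=1).max())
      .astype(int)
)
# Only V1's 2012-04 row differs (154 instead of 196).
print(df[df["Max_3M_Calendar"] != row_based])
```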

If what you really want is a simple max per group, you can just do:

>>> df['Max_3M_Value'] = df.groupby('VIN')['Amount'].transform('max')
>>> df
   VIN Year_Month  Amount  Max_3M_Value
0   V1    2012-01     196           196
1   V2    2012-01     113           129
2   V3    2012-01     177           194
3   V1    2012-02     154           196
4   V2    2012-02     129           129
5   V4    2012-02     156           178
6   V2    2012-03     100           129
7   V3    2012-03     174           194
8   V4    2012-03     127           178
9   V1    2012-04     139           196
10  V3    2012-04     194           194
11  V4    2012-04     178           178

Try dropping the group level from the result's index so that it aligns with the frame's index:

>>> df['Max_3M_Value'] = df.groupby('VIN')['Amount'] \
                           .rolling(3).max().bfill().astype(int) \
                           .reset_index(level=0, drop=True)
>>> df
   VIN Year_Month  Amount  Max_3M_Value
0   V1    2012-01     196           196
1   V2    2012-01     113           129
2   V3    2012-01     177           194
3   V1    2012-02     154           196
4   V2    2012-02     129           129
5   V4    2012-02     156           178
6   V2    2012-03     100           129
7   V3    2012-03     174           194
8   V4    2012-03     127           178
9   V1    2012-04     139           196
10  V3    2012-04     194           194
11  V4    2012-04     178           178
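One caveat with the snippet above: bfill() runs over the whole result, so it can pull a value from the next VIN whenever a group has fewer rows than the window. It works on the sample data because every VIN has exactly 3 rows. The sketch below uses a small hypothetical frame (not from the question) to show the effect, alongside a variant with min_periods=1 that stays within each group.

```python
import pandas as pd

# Hypothetical data: VIN "A" has only two rows, VIN "B" has three.
small = pd.DataFrame({
    "VIN": ["A", "A", "B", "B", "B"],
    "Amount": [900, 800, 1, 2, 3],
})
grouped = small.groupby("VIN")["Amount"]

# rolling(3) leaves A entirely NaN, and bfill() then fills A's rows
# with the first valid value found further down, which belongs to B.
leaky = grouped.rolling(3).max().bfill().droplevel(0)

# min_periods=1 lets short windows produce a value, so no fill is needed.
safe = grouped.rolling(3, min_periods=1).max().droplevel(0)

print(pd.DataFrame({"leaky": leaky, "safe": safe}))
```

The min_periods=1 variant returns the running maximum within each VIN rather than the overall group maximum, so which one is appropriate depends on the result you actually need.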
