Rolling max value with groupby on multiple columns in pandas
I have a dataframe on which I want to compute the maximum of the values over the previous 3 months, on a rolling basis. Below is the dataframe:
VIN Year_Month Amount
V1 2012-01 196
V2 2012-01 113
V3 2012-01 177
V1 2012-02 154
V2 2012-02 129
V4 2012-02 156
V2 2012-03 100
V3 2012-03 174
V4 2012-03 127
V1 2012-04 139
V3 2012-04 194
V4 2012-04 178
Using the following piece of code, I am trying to compute the max value for each VIN over the last 3 months:
df['Max_3M_Value'] = df.groupby(['VIN','Year_Month'])['Amount'].rolling(3).max().fillna(0)
But the above code raises an error:
TypeError: incompatible index of inserted column with frame index
The resulting dataframe I am looking for is:
VIN Year_Month Amount Max_3M_Value
V1 2012-01 196 196
V2 2012-01 113 129
V3 2012-01 177 194
V1 2012-02 154 196
V2 2012-02 129 129
V4 2012-02 156 178
V2 2012-03 100 129
V3 2012-03 174 194
V4 2012-03 127 178
V1 2012-04 139 196
V3 2012-04 194 194
V4 2012-04 178 178
I have to compute the max value using rolling only. I know this can be solved with pd.pivot_table(), but that is not what I need.
What am I missing here?
You can avoid the error by assigning only the returned values, but then you have to make sure that the order of the rows does not change during the operation:
df['Max_3M_Value'] = df.groupby(['VIN','Year_Month'])['Amount'].rolling(3).max().fillna(0).values
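To see where the error comes from, it may help to compare the index of the rolling result with the index of the frame. The sketch below rebuilds the sample data from the question and shows that the grouped rolling result carries the two group keys as extra index levels. Dropping those levels with reset_index is one possible label-based alternative to .values; it is offered here as an illustration, not as the only fix.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'VIN': ['V1', 'V2', 'V3', 'V1', 'V2', 'V4',
            'V2', 'V3', 'V4', 'V1', 'V3', 'V4'],
    'Year_Month': ['2012-01', '2012-01', '2012-01', '2012-02', '2012-02', '2012-02',
                   '2012-03', '2012-03', '2012-03', '2012-04', '2012-04', '2012-04'],
    'Amount': [196, 113, 177, 154, 129, 156, 100, 174, 127, 139, 194, 178],
})

rolled = df.groupby(['VIN', 'Year_Month'])['Amount'].rolling(3).max()

# The result is indexed by (VIN, Year_Month, original row label),
# while the frame has a plain single-level index.
print(rolled.index.nlevels, df.index.nlevels)

# Dropping the group levels leaves only the original row labels,
# so pandas can align the result with df on assignment.
aligned = rolled.reset_index(level=['VIN', 'Year_Month'], drop=True)
print(aligned.sort_index().head())
```

Because the remaining labels are the original row labels, assigning `aligned` to a new column does not depend on row order, unlike the positional .values approach.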
In your example, however, the groups are too small (1 element each) for a rolling window of 3 to be meaningful:
>>> df.groupby(['VIN','Year_Month'])['Amount'].count()
VIN Year_Month
V1 2012-01 1
2012-02 1
2012-04 1
V2 2012-01 1
2012-02 1
2012-03 1
V3 2012-01 1
2012-03 1
2012-04 1
V4 2012-02 1
2012-03 1
2012-04 1
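The effect of these one-row groups can be checked directly. The following sketch, using the same sample data, shows that the default rolling(3) yields only NaN for such groups, and that relaxing min_periods merely returns each Amount unchanged, so grouping by both columns cannot produce a 3-month maximum either way.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'VIN': ['V1', 'V2', 'V3', 'V1', 'V2', 'V4',
            'V2', 'V3', 'V4', 'V1', 'V3', 'V4'],
    'Year_Month': ['2012-01', '2012-01', '2012-01', '2012-02', '2012-02', '2012-02',
                   '2012-03', '2012-03', '2012-03', '2012-04', '2012-04', '2012-04'],
    'Amount': [196, 113, 177, 154, 129, 156, 100, 174, 127, 139, 194, 178],
})

grouped = df.groupby(['VIN', 'Year_Month'])['Amount']

# Every (VIN, Year_Month) group holds exactly one row.
sizes = grouped.size()

# Default min_periods equals the window size, so a one-row group
# never has enough observations and every result is NaN.
default = grouped.rolling(3).max()

# With min_periods=1 each window sees only its own single row,
# so the "rolling max" is just the Amount itself.
relaxed = (grouped.rolling(3, min_periods=1).max()
           .reset_index(level=['VIN', 'Year_Month'], drop=True)
           .sort_index())

print(sizes.unique(), default.isna().all())
print((relaxed == df['Amount']).all())
```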
I believe what you really want is to group by 'VIN' only:
df['Max_3M_Value'] = df.groupby('VIN')['Amount'].transform(lambda s: s.rolling(3, min_periods=1).max())
Output:
VIN Year_Month Amount Max_3M_Value
0 V1 2012-01 196 196
1 V2 2012-01 113 113
2 V3 2012-01 177 177
3 V1 2012-02 154 196
4 V2 2012-02 129 129
5 V4 2012-02 156 156
6 V2 2012-03 100 129
7 V3 2012-03 174 177
8 V4 2012-03 127 156
9 V1 2012-04 139 196
10 V3 2012-04 194 194
11 V4 2012-04 178 178
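For experimenting with different window sizes, the one-liner can be wrapped in a small helper. The function name add_rolling_max and its parameters are hypothetical, introduced only for this sketch. It assumes the rows of each VIN are already in chronological order, as they are in the sample data.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'VIN': ['V1', 'V2', 'V3', 'V1', 'V2', 'V4',
            'V2', 'V3', 'V4', 'V1', 'V3', 'V4'],
    'Year_Month': ['2012-01', '2012-01', '2012-01', '2012-02', '2012-02', '2012-02',
                   '2012-03', '2012-03', '2012-03', '2012-04', '2012-04', '2012-04'],
    'Amount': [196, 113, 177, 154, 129, 156, 100, 174, 127, 139, 194, 178],
})


def add_rolling_max(frame, window, group_col='VIN', value_col='Amount',
                    out_col='Max_3M_Value'):
    """Return a copy of frame with a per-group rolling max over `window` rows.

    Hypothetical helper: assumes rows are already sorted chronologically
    within each group. min_periods=1 lets the first rows of a group use
    whatever history is available instead of producing NaN.
    """
    out = frame.copy()
    out[out_col] = out.groupby(group_col)[value_col].transform(
        lambda s: s.rolling(window, min_periods=1).max()
    )
    return out


res3 = add_rolling_max(df, window=3)
res2 = add_rolling_max(df, window=2)
print(res3)
print(res2)
```

Since transform returns a result aligned with the original index, no reset_index or .values step is needed before the assignment.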
Output for a rolling window of 2:
VIN Year_Month Amount Max_3M_Value
0 V1 2012-01 196 196
1 V2 2012-01 113 113
2 V3 2012-01 177 177
3 V1 2012-02 154 196
4 V2 2012-02 129 129
5 V4 2012-02 156 156
6 V2 2012-03 100 129
7 V3 2012-03 174 177
8 V4 2012-03 127 156
9 V1 2012-04 139 154
10 V3 2012-04 194 194
11 V4 2012-04 178 178
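Note that rolling(3) counts rows, not calendar months, and some VINs in the sample skip a month (V1 has no 2012-03 row). If the window should cover three calendar months instead, one possible approach is to fill in the missing months per VIN before rolling. This is only a sketch under assumptions not stated in the question: the helper calendar_rolling_max and the columns period and Max_3M_Calendar are hypothetical names, and each VIN is assumed to have at most one row per month.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'VIN': ['V1', 'V2', 'V3', 'V1', 'V2', 'V4',
            'V2', 'V3', 'V4', 'V1', 'V3', 'V4'],
    'Year_Month': ['2012-01', '2012-01', '2012-01', '2012-02', '2012-02', '2012-02',
                   '2012-03', '2012-03', '2012-03', '2012-04', '2012-04', '2012-04'],
    'Amount': [196, 113, 177, 154, 129, 156, 100, 174, 127, 139, 194, 178],
})

# Monthly periods make it possible to enumerate the missing months.
df['period'] = pd.to_datetime(df['Year_Month'], format='%Y-%m').dt.to_period('M')


def calendar_rolling_max(group, window=3):
    """Rolling max over `window` calendar months for one VIN (hypothetical helper)."""
    s = group.set_index('period')['Amount']
    # Insert NaN rows for months with no data so each row is one month.
    full = pd.period_range(s.index.min(), s.index.max(), freq='M')
    rolled = s.reindex(full).rolling(window, min_periods=1).max()
    # Keep only the months that exist in the original data, in their order.
    return rolled.reindex(s.index).to_numpy()


df['Max_3M_Calendar'] = float('nan')
for _, group in df.groupby('VIN'):
    df.loc[group.index, 'Max_3M_Calendar'] = calendar_rolling_max(group)

print(df[['VIN', 'Year_Month', 'Amount', 'Max_3M_Calendar']])
```

With this variant, the V1 row for 2012-04 only looks back to 2012-02, so its value differs from the row-based window of 3.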
If what you really want is a simple max per group, you can just do:
>>> df['Max_3M_Value'] = df.groupby('VIN')['Amount'].transform('max')
>>> df
VIN Year_Month Amount Max_3M_Value
0 V1 2012-01 196 196
1 V2 2012-01 113 129
2 V3 2012-01 177 194
3 V1 2012-02 154 196
4 V2 2012-02 129 129
5 V4 2012-02 156 178
6 V2 2012-03 100 129
7 V3 2012-03 174 194
8 V4 2012-03 127 178
9 V1 2012-04 139 196
10 V3 2012-04 194 194
11 V4 2012-04 178 178
Try:
>>> df['Max_3M_Value'] = df.groupby('VIN')['Amount'] \
.rolling(3).max().bfill().astype(int) \
.reset_index(level=0, drop=True)
>>> df
VIN Year_Month Amount Max_3M_Value
0 V1 2012-01 196 196
1 V2 2012-01 113 129
2 V3 2012-01 177 194
3 V1 2012-02 154 196
4 V2 2012-02 129 129
5 V4 2012-02 156 178
6 V2 2012-03 100 129
7 V3 2012-03 174 194
8 V4 2012-03 127 178
9 V1 2012-04 139 196
10 V3 2012-04 194 194
11 V4 2012-04 178 178
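One caveat with this approach: bfill() is applied to the whole concatenated result, so it works here because every VIN has exactly 3 rows. With a group shorter than the window, the fill would pull a value from the next group. The sketch below uses a small made-up frame (not the data from the question) to show the leak, and one possible guard that back-fills within each group.

```python
import pandas as pd

# Made-up data for illustration: VIN 'A' has fewer rows than the window.
df = pd.DataFrame({
    'VIN': ['A', 'A', 'B', 'B', 'B'],
    'Amount': [500, 400, 10, 30, 20],
})

rolled = df.groupby('VIN')['Amount'].rolling(3).max()

# Global back-fill: A's NaN rows are filled with B's maximum.
leaky = rolled.bfill().reset_index(level=0, drop=True)

# Per-group back-fill: A's rows stay NaN because A never has 3 rows.
safe = (rolled.groupby(level=0)
        .transform(lambda s: s.bfill())
        .reset_index(level=0, drop=True))

print(pd.DataFrame({'VIN': df['VIN'], 'leaky': leaky, 'safe': safe}))
```

In the guarded version the remaining NaN values would make astype(int) fail, so they need to be handled explicitly (or avoided with min_periods=1) before converting.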