Grouped 3-monthly aggregation and shifting periods in pandas python

The problem

I have a dataframe with many regions and, for each of them, the units sold, visits performed and average visit time on a monthly basis. Not all the regions have the same starting date.

So my table looks something like this:

Region    Month       Visits  Average_minutes  Units_sold
Region_1  2018.01.01  12      2.22             120
Region_1  2018.02.01  10      2.02             108
Region_2  2017.04.01  4       1.8              60
Region_2  2017.05.01  4       1.6              56
Region_2  2017.06.01  3       1.5              58
Region_1  2018.03.01  11      2.1              103
Region_3  2018.04.01  3       2.22             20
Region_3  2018.05.01  2       2                22
Region_2  2017.07.01  6       1.7              61
Region_1  2018.04.01  14      2.1              125
Region_3  2018.06.01  3       2.3              21
Region_3  2018.07.01  3       2.4              19
Region_1  2018.05.01  10      2.12             116
Region_2  2017.08.01  3       2.1              55

What I would like is to aggregate the monthly data for each region into 3-month periods, and then repeat the aggregation with the starting month shifted forward by one month.

So if we take Region_1 for example, the end result I would like to get is something like this:

Region    Date        Visits  Average_minutes  Units_sold  3M_shift
Region_1  2018.01.01  33      2.11             331         0
Region_1  2018.04.01  24      2.11             241         0
Region_1  2018.02.01  35      2.07             336         1
Region_1  2018.05.01  10      2.12             116         1
Region_1  2018.02.01  35      2.07             336         2
Region_1  2018.05.01  10      2.12             116         2

As you can see, Date now contains the starting date of each 3-month period, and the 3M_shift column shows how many months the start was shifted relative to the first available month.

Of course, the table above shows Region_1 only, but I would like to get this result for all the groups.

More background

So I would like the data per group aggregated not only by business-year quarters, but in 3-month periods whose start shifts forward by one month on every iteration, until I reach the last month.

My code looks like this, but it groups the months from the starting date of each region, and I don't really know how to shift the starting month by one and iterate until the last month:

grp = joined.groupby(['Region', pd.Grouper(key="Date", freq='3M')]).agg({"Visits":"sum", "Average_minutes":"mean", "Units_sold":"sum"})

So for Region_1, for example, I get this result:

Region    Date        Visits  Average_minutes  Units_sold
Region_1  2018.01.01  33      2.11             331
Region_1  2018.04.01  24      2.11             241

Edit: Added a better visualisation of what I would like to get.

In the picture below you can see what I mean. The green part is what I have so far. I would like to write a loop for the pink part, but I do not know how to do it.

[image: enter image description here]

Could you please help me to get the desired outcome?

Thank you very much in advance!

I'm not 100% sure what you are looking for, but the way I interpret it, maybe this will help?

First, sort by Region and Month.

df = df.sort_values(['Region', 'Month'])
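A side note on the sort: the Month values in the question look like zero-padded strings, so sorting them as text happens to give chronological order. Parsing them as dates first is an optional, more robust variant. The sketch below assumes the `YYYY.MM.DD` layout shown in the question's table and uses a small made-up subset of its rows.

```python
import pandas as pd

# Three rows taken from the question's table, deliberately out of order.
df = pd.DataFrame({
    "Region": ["Region_2", "Region_1", "Region_1"],
    "Month": ["2017.04.01", "2018.02.01", "2018.01.01"],
    "Visits": [4, 10, 12],
})

# Assumption: Month is stored as text in the form YYYY.MM.DD.
df["Month"] = pd.to_datetime(df["Month"], format="%Y.%m.%d")

# Sort by region first, then chronologically within each region.
df = df.sort_values(["Region", "Month"]).reset_index(drop=True)
print(df)
```

With parsed dates the Month values print as `2018-01-01` rather than `2018.01.01`; the remaining steps work the same either way.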

Then set a multi-index.

df = df.set_index(['Region', 'Month'])

Then group by Region, aggregate over a rolling window of 3 rows, and shift the result back two periods, so that each row holds the aggregate of that month and the two months after it.

df = df.groupby(level='Region', group_keys=False).apply(lambda x: x.rolling(window=3).agg({"Visits":"sum", "Average_minutes":"mean", "Units_sold":"sum"}).shift(-2))
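To see why the shift is needed, here is a minimal sketch on a single column, using the Region_1 visit counts from the question. `rolling(window=3)` looks backwards, so each value covers the current row and the two before it; `shift(-2)` moves every value up two rows so that it is labelled with the first month of its window instead of the last. The month labels are only illustrative.

```python
import pandas as pd

# Region_1 visits from the question, January to May 2018.
visits = pd.Series([12, 10, 11, 14, 10],
                   index=["Jan", "Feb", "Mar", "Apr", "May"])

# Trailing window: the value at Mar is Jan + Feb + Mar.
trailing = visits.rolling(window=3).sum()

# Shift up two rows: the value at Jan is now Jan + Feb + Mar.
forward = trailing.shift(-2)

print(pd.DataFrame({"visits": visits,
                    "trailing": trailing,
                    "forward": forward}))
```

The last two rows of `forward` are NaN because fewer than three months remain from those starting points; `forward.dropna()` would remove them if incomplete windows are not wanted.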

Applied to the example data, the result is:

                     Visits  Average_minutes  Units_sold
Region   Month                                          
Region_1 2018.01.01    33.0         2.113333       331.0
         2018.02.01    35.0         2.073333       336.0
         2018.03.01    35.0         2.106667       344.0
         2018.04.01     NaN              NaN         NaN
         2018.05.01     NaN              NaN         NaN
Region_2 2017.04.01    11.0         1.633333       174.0
         2017.05.01    13.0         1.600000       175.0
         2017.06.01    12.0         1.766667       174.0
         2017.07.01     NaN              NaN         NaN
         2017.08.01     NaN              NaN         NaN
Region_3 2018.04.01     8.0         2.173333        63.0
         2018.05.01     8.0         2.233333        62.0
         2018.06.01     NaN              NaN         NaN
         2018.07.01     NaN              NaN         NaN
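If the goal is instead the exact layout from the question, with non-overlapping 3-month blocks repeated for each starting offset and a 3M_shift column, one possible sketch follows. It is not part of the approach above: the function name `shifted_blocks` and its parameters are hypothetical, and it assumes Month has been parsed as a date. Each block is labelled with the earliest month present in it, so a region with missing months could get a label that is not the nominal block start.

```python
import pandas as pd

def shifted_blocks(data, n_shifts=3, width=3):
    """Aggregate each region into `width`-month blocks, once per starting shift.

    Hypothetical helper: blocks are counted from each region's first month.
    """
    data = data.sort_values(["Region", "Month"]).copy()
    # Months elapsed since the region's first available month.
    ordinal = data["Month"].dt.year * 12 + data["Month"].dt.month
    data["offset"] = ordinal - ordinal.groupby(data["Region"]).transform("min")

    pieces = []
    for shift in range(n_shifts):
        # Drop the months before the shifted start, then number the blocks.
        part = data[data["offset"] >= shift].copy()
        part["block"] = (part["offset"] - shift) // width
        agg = part.groupby(["Region", "block"], as_index=False).agg(
            Date=("Month", "min"),
            Visits=("Visits", "sum"),
            Average_minutes=("Average_minutes", "mean"),
            Units_sold=("Units_sold", "sum"),
        )
        agg["3M_shift"] = shift
        pieces.append(agg.drop(columns="block"))
    return pd.concat(pieces, ignore_index=True)

# Region_1 rows from the question.
df = pd.DataFrame(
    [
        ("Region_1", "2018.01.01", 12, 2.22, 120),
        ("Region_1", "2018.02.01", 10, 2.02, 108),
        ("Region_1", "2018.03.01", 11, 2.10, 103),
        ("Region_1", "2018.04.01", 14, 2.10, 125),
        ("Region_1", "2018.05.01", 10, 2.12, 116),
    ],
    columns=["Region", "Month", "Visits", "Average_minutes", "Units_sold"],
)
df["Month"] = pd.to_datetime(df["Month"], format="%Y.%m.%d")

out = shifted_blocks(df)
print(out)
```

For Region_1 this gives two blocks for shift 0 (starting in January and April), two for shift 1 (February and May) and one for shift 2 (March). Trailing blocks with fewer than three months are kept, as in the question's desired table.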
