Aggregate data by quarter
I have a pivoted pandas data frame (sales by region) that was created from another pandas data frame (sales by store) using the pivot_table method.
As an example:
import numpy as np
import pandas as pd

df = pd.DataFrame(
{'store':['A','B','C','D','E']*7,
'region':['NW','NW','SW','NE','NE']*7,
'date':['2017-03-30']*5+['2017-04-05']*5+['2017-04-07']*5+['2017-04-12']*5+['2017-04-13']*5+['2017-04-17']*5+['2017-04-20']*5,
'sales':[30,1,133,9,1,30,3,135,9,11,30,1,140,15,15,25,10,137,9,3,29,10,137,9,11,30,19,145,20,10,30,8,141,25,25]
})
df['date'] = pd.to_datetime(df['date'])
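# values='sales' keeps the non-numeric 'store' column out of the aggregation
df_sales = df.pivot_table(index = ['region'], columns = ['date'], values = 'sales', aggfunc = [np.sum], margins = True)
# Drop the last column (the 'All' margin); .iloc replaces the removed .ix indexer
df_sales = df_sales.iloc[:, :-1]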
My goal is to do the following to the sales data frame, df_sales.
Create a new dataframe that summarizes sales by quarter. I could use the original dataframe df, or df_sales.
The data here cover only two quarters (US fiscal calendar year), so the quarterly aggregated data frame would look like:
region  2017Q1  2017Q2
NE      10      27
NW      31      37.5
SW      133     139.17
I take the average over all days in Q1, and the same for Q2. For example, for the Northeast region, 'NE', Q1 is the average of only one day, 2017-03-30, i.e. 10, and Q2 is the average across 2017-04-05 to 2017-04-20, i.e.
(20+30+12+20+30+50)/6=27
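To make the intended definition concrete, here is a minimal sketch that reproduces the 'NE' numbers. The daily totals are typed in by hand from the example data (stores D and E summed per date), and the names ne_daily and ne_quarterly are only illustrative.

```python
import pandas as pd

# Daily NE totals taken from the example data (store D + store E per date).
ne_daily = pd.Series(
    [10, 20, 30, 12, 20, 30, 50],
    index=pd.to_datetime([
        "2017-03-30", "2017-04-05", "2017-04-07", "2017-04-12",
        "2017-04-13", "2017-04-17", "2017-04-20",
    ]),
)

# Map every date to its calendar quarter, then average the daily totals per quarter.
ne_quarterly = ne_daily.groupby(ne_daily.index.to_period("Q")).mean()
print(ne_quarterly)
```

Q1 contains a single day, so its average is just that day's total, while Q2 averages the six April dates.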
Any suggestions?
ADDITIONAL NOTE: I would ideally do the quarterly aggregation on the pivoted df_sales table, since it is a much smaller dataframe to keep in memory. The current solution does it on the original df, but I am still looking for a way to do it on the df_sales dataframe.
UPDATE:
Setup:
df.date = pd.to_datetime(df.date)
df_sales = df.pivot_table(index='region', columns='date', values='sales', aggfunc='sum')
In [318]: df_sales
Out[318]:
date 2017-03-30 2017-04-05 2017-04-07 2017-04-12 2017-04-13 2017-04-17 2017-04-20
region
NE 10 20 30 12 20 30 50
NW 31 33 31 35 39 49 38
SW 133 135 140 137 137 145 141
Solution:
In [319]: (df_sales.groupby(pd.PeriodIndex(df_sales.columns, freq='Q'), axis=1)
...: .apply(lambda x: x.sum(axis=1)/x.shape[1])
...: )
Out[319]:
date 2017Q1 2017Q2
region
NE 10.0 27.000000
NW 31.0 37.500000
SW 133.0 139.166667
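Newer pandas releases deprecate groupby(..., axis=1), so the call above may emit a warning or stop working there. One possible equivalent, offered as a sketch rather than as part of the original answer, is to transpose, group the rows by quarter, and transpose back. The frame below is rebuilt by hand from the values shown in Out[318], and df_quarterly is an illustrative name. Note that mean() skips missing values, so it matches the sum-divided-by-column-count expression only when the pivot has no NaN cells, as is the case here.

```python
import pandas as pd

# Rebuild the pivoted frame from the values printed in Out[318].
dates = pd.DatetimeIndex(
    ["2017-03-30", "2017-04-05", "2017-04-07", "2017-04-12",
     "2017-04-13", "2017-04-17", "2017-04-20"],
    name="date",
)
df_sales = pd.DataFrame(
    [[10, 20, 30, 12, 20, 30, 50],
     [31, 33, 31, 35, 39, 49, 38],
     [133, 135, 140, 137, 137, 145, 141]],
    index=pd.Index(["NE", "NW", "SW"], name="region"),
    columns=dates,
)

# One quarter label per date column.
quarters = df_sales.columns.to_period("Q")

# Transpose so dates become rows, average per quarter, then transpose back.
df_quarterly = df_sales.T.groupby(quarters).mean().T
print(df_quarterly)
```

This works directly on the small pivoted table, which is what the additional note in the question asks for.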
Solution based on the original DF:
In [253]: (df.groupby(['region', pd.PeriodIndex(df.date, freq='Q-DEC')])
...: .apply(lambda x: x['sales'].sum()/x['date'].nunique())
...: .to_frame('avg').unstack('date')
...: )
...:
Out[253]:
avg
date 2017Q1 2017Q2
region
NE 10.0 27.000000
NW 31.0 37.500000
SW 133.0 139.166667
NOTE: df is the original DF (before "pivoting").
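The same result on the original df can also be reached in two explicit steps, which some readers may find easier to follow than the lambda: first sum sales per region and date, then average those daily totals per quarter. This is a sketch under the assumption that every date a region appears on should count as one day; the names daily and avg are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "store": ["A", "B", "C", "D", "E"] * 7,
    "region": ["NW", "NW", "SW", "NE", "NE"] * 7,
    "date": ["2017-03-30"] * 5 + ["2017-04-05"] * 5 + ["2017-04-07"] * 5
            + ["2017-04-12"] * 5 + ["2017-04-13"] * 5 + ["2017-04-17"] * 5
            + ["2017-04-20"] * 5,
    "sales": [30, 1, 133, 9, 1, 30, 3, 135, 9, 11, 30, 1, 140, 15, 15,
              25, 10, 137, 9, 3, 29, 10, 137, 9, 11, 30, 19, 145, 20, 10,
              30, 8, 141, 25, 25],
})
df["date"] = pd.to_datetime(df["date"])

# Step 1: total sales per region and date (stores collapsed).
daily = df.groupby(["region", "date"], as_index=False)["sales"].sum()

# Step 2: label each date with its quarter and average the daily totals.
daily["quarter"] = daily["date"].dt.to_period("Q")
avg = daily.groupby(["region", "quarter"])["sales"].mean().unstack("quarter")
print(avg)
```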