Pandas: how to resample monthly using data from more than a year
I have a dataframe that is a time series of 7 years of data. The index is a timestamp, and there is a column (let's call it sales) for every store. Each store has its own time series of sales.
I am trying to resample and sum all the data into a monthly view, like so:
df = df.groupby('store').resample('M').sum()
This does group the data by month, but it also takes the year into account, i.e. it treats 'December 2010' as a different month from 'December 2011'. I ended up with 7 * 12 rows instead of only 12 rows.
I'd like to sum each calendar month across all 7 years, so that I end up with 12 months of sales.
Minimal reproducible example
import pandas as pd
index = pd.date_range('1/1/2000', periods=730, freq='D') # 2 years of daily data
series = pd.Series(range(730), index=index) # just dummy data
series # returns a series with 730 values (2 years)
series.resample('M').sum() # returns 24 values, one per month of each year, which doesn't work for me
Thanks
You'll probably want to use a df and add a column that holds just the month, derived from the date. You can probably also do this by applying that function inside the groupby, but I'm not sure how that would work, and the approach below will get you the results you want.
import pandas as pd
dates = pd.date_range('1/1/2000', periods=730, freq='D') #2 years of daily data
series = pd.Series(range(730)) #just dummy data
# make temp df
d = {'date': dates, 'temp': series}
df = pd.DataFrame(d)
# add col just for month
df['month_num'] = df['date'].dt.strftime('%m') # zero-padded month as a string, e.g. '01'
print(df)
# get sum for each month
print(df.groupby('month_num')['temp'].sum())
df generated:
          date  temp month_num
0   2000-01-01     0        01
1   2000-01-02     1        01
2   2000-01-03     2        01
3   2000-01-04     3        01
4   2000-01-05     4        01
..         ...   ...       ...
725 2001-12-26   725        12
726 2001-12-27   726        12
727 2001-12-28   727        12
728 2001-12-29   728        12
729 2001-12-30   729        12
[730 rows x 3 columns]
Output of the groupby month_num sum():
month_num
01 12276
02 12799
03 15965
04 17280
05 19747
06 20940
07 23529
08 25451
09 26460
10 29233
11 30120
12 32285
Name: temp, dtype: int64
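If integer month numbers are preferred over zero-padded strings, the helper column can probably be skipped altogether by grouping on the `.dt.month` accessor. This is only a minimal sketch that rebuilds the same dummy data and assumes the `date` column has a datetime dtype:

```python
import pandas as pd

# Same dummy data as above: 2 years of daily values
dates = pd.date_range('1/1/2000', periods=730, freq='D')
df = pd.DataFrame({'date': dates, 'temp': range(730)})

# .dt.month gives the calendar month as an integer (1-12), ignoring the year
monthly = df.groupby(df['date'].dt.month)['temp'].sum()
print(monthly)
```

The result should have 12 rows indexed by month number, with the same sums as the string-keyed version.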
Try this, using the month attribute of pd.DatetimeIndex:
series.groupby(series.index.month).sum()
Output:
1 12276
2 12799
3 15965
4 17280
5 19747
6 20940
7 23529
8 25451
9 26460
10 29233
11 30120
12 32285
dtype: int64
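For the per-store dataframe described in the question, the same idea should extend to grouping by both the store and the month of the index. The sketch below is an assumption about the layout: it invents two hypothetical stores ('A' and 'B') with a `store` column and a `sales` column on a DatetimeIndex, since the real data was not posted.

```python
import pandas as pd

# Hypothetical data: two stores, each with 2 years of daily sales
index = pd.date_range('1/1/2000', periods=730, freq='D')
df = pd.concat([
    pd.DataFrame({'store': 'A', 'sales': range(730)}, index=index),
    pd.DataFrame({'store': 'B', 'sales': 1}, index=index),
])

# Group by store and by calendar month of the index (the year is ignored)
monthly = df.groupby(['store', df.index.month])['sales'].sum()
monthly = monthly.rename_axis(['store', 'month'])
print(monthly)
```

This gives 12 rows per store; `monthly.unstack('store')` would turn it into a 12-row table with one column per store, if that shape is easier to plot.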