Pandas: how to resample monthly using data from more than a year

I have a dataframe holding a time series of 7 years of data. The index is a timestamp, and there is a column (let's call it sales) for every store. Each store has its own time series of sales.

I am trying to resample and sum all the data into a monthly view, like so:

df = df.groupby('store').resample('M').sum()

This does group the data by month, but it also takes the year into account. That is, it treats 'December 2010' as a different month from 'December 2011'. I ended up with 7 * 12 rows instead of only 12 rows.

I'd like to sum each calendar month across all 7 years, so that I end up with 12 months of sales.

Minimal reproducible example

import pandas as pd

index = pd.date_range('1/1/2000', periods=730, freq='D')  # about 2 years of daily data
series = pd.Series(range(730), index=index)  # just dummy data
series  # a series with 730 values (about 2 years)

series.resample('M').sum()  # returns 24 values, one per month of each year, which doesn't work for me

Thanks

You'll probably want to use a DataFrame and add a column that holds just the month, derived by applying a function to the date. You can probably also do this by applying that function inside the groupby, but I'm not sure how that would work, and this approach will get you the results you want.

import pandas as pd

dates = pd.date_range('1/1/2000', periods=730, freq='D') #2 years of daily data
series = pd.Series(range(730)) #just dummy data

# make temp df
d = {'date': dates, 'temp': series}
df = pd.DataFrame(d)

# add col just for month
df['month_num'] = df.apply(lambda row: str(row['date']).split('-')[1], axis=1)

print(df)

# get sum for each month
print(df.groupby('month_num')['temp'].sum())

df generated:

          date  temp month_num
0   2000-01-01     0        01
1   2000-01-02     1        01
2   2000-01-03     2        01
3   2000-01-04     3        01
4   2000-01-05     4        01
..         ...   ...       ...
725 2001-12-26   725        12
726 2001-12-27   726        12
727 2001-12-28   727        12
728 2001-12-29   728        12
729 2001-12-30   729        12
[730 rows x 3 columns]

Output of the groupby month_num sum():

month_num
01    12276
02    12799
03    15965
04    17280
05    19747
06    20940
07    23529
08    25451
09    26460
10    29233
11    30120
12    32285
Name: temp, dtype: int64
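
The month column above is built by splitting the date's string form row by row. As a possible alternative, the sketch below uses the `.dt.month` accessor on the same dummy data. It assumes the `date` column has a datetime dtype, and it yields integer month numbers (1 to 12) rather than the zero-padded strings shown above.

```python
import pandas as pd

# Same dummy data as above: 730 daily values starting 2000-01-01
dates = pd.date_range("2000-01-01", periods=730, freq="D")
df = pd.DataFrame({"date": dates, "temp": range(730)})

# Extract the calendar month as an integer, without string splitting
df["month_num"] = df["date"].dt.month

# Sum per calendar month, across both years
monthly_totals = df.groupby("month_num")["temp"].sum()
print(monthly_totals)
```

Because the keys are integers, the result sorts in calendar order without relying on zero padding.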

Try this, using the month attribute of pd.DatetimeIndex:

series.groupby(series.index.month).sum()

Output:

1     12276
2     12799
3     15965
4     17280
5     19747
6     20940
7     23529
8     25451
9     26460
10    29233
11    30120
12    32285
dtype: int64
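
The question also groups by store, so the same idea can be extended to a DataFrame. This is only a sketch: the `store` and `sales` column names are taken from the question, while the two stores "A" and "B" and their values are made-up dummy data. It assumes the DataFrame has a DatetimeIndex.

```python
import pandas as pd

# Hypothetical data: two stores sharing the same daily DatetimeIndex
dates = pd.date_range("2000-01-01", periods=730, freq="D")
df = pd.concat(
    [pd.DataFrame({"store": name, "sales": range(730)}, index=dates)
     for name in ("A", "B")]
)

# Group by store and by calendar month of the index (the year is ignored)
month = df.index.month.rename("month")
monthly = df.groupby(["store", month])["sales"].sum()

# Optional: one row per month, one column per store
table = monthly.unstack("store")
print(table)
```

With this layout each store ends up with 12 monthly totals, regardless of how many years the data spans.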
