Pandas resample monthly data into custom frequency (seasonal) data
I have a monthly dataset and want to resample it to seasonal data by summing the monthly values.
Here, seasonal refers to these month groups:
(Dec, Jan, Feb), (Mar, Apr, May), (Jun, Jul, Aug, Sep), (Oct, Nov)
import numpy as np
import pandas as pd

dti = pd.date_range("2015-12-31", periods=11, freq="M")  # month-end; pandas >= 2.2 prefers "ME"
df = pd.DataFrame({'time': dti,
                   'data': np.random.rand(len(dti))})
Output:
         time      data
0  2015-12-31  0.466245
1  2016-01-31  0.959309
2  2016-02-29  0.445139
3  2016-03-31  0.575556
4  2016-04-30  0.303020
5  2016-05-31  0.591516
6  2016-06-30  0.001410
7  2016-07-31  0.338360
8  2016-08-31  0.540705
9  2016-09-30  0.115278
10 2016-10-31  0.950359
I was able to resample every season except Dec, Jan, Feb (DJF). Here is what I did for the other seasons:
MAM = df.loc[df['time'].dt.month.between(3,5)].resample('Y',on='time').sum()
For DJF I couldn't use between, because the months wrap around the year end, so I used a boolean mask instead:
mask = (df['time'].dt.month>11) | (df['time'].dt.month<=2)
DJF = df.loc[mask].resample('3M',origin='start',on='time').sum()
This resampling leaves my first data point, '2015-12-31', as it is and starts from 2016, even though I used origin='start'.
So, my questions are basically how to resample DJF correctly, and whether there is something like df['time'].dt.month.between that works on the index. I tried df.index.month.between, but df.index.month returns an integer Index, which has no between method. I found repeatedly calling df.set_index and df.reset_index quite tiresome.

Try mapping each month value to a season value, then group by season and resample each group:
df['season'] = df['time'].dt.month.map({
12: 0, 1: 0, 2: 0,
3: 1, 4: 1, 5: 1,
6: 2, 7: 2, 8: 2, 9: 2,
10: 3, 11: 3
})
df = df.groupby('season').resample('Y', on='time')['data'].sum().reset_index()
df:
   season       time      data
0       0 2015-12-31  0.221993
1       0 2016-12-31  1.077451
2       1 2016-12-31  2.018766
3       2 2016-12-31  1.768848
4       3 2016-12-31  0.080741
To treat the previous December as part of the next year, start again from the sample data and add MonthBegin (from pandas.tseries.offsets) to the dates. This moves every month-end date to the first day of the following month, so December 2015 becomes January 2016. Then shift all the season map keys forward one month to match:
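from pandas.tseries.offsets import MonthBegin

# Shift every month-end date to the first day of the next month,
# so December 2015 becomes January 2016.
df['time'] = df['time'] + MonthBegin(1)
# Season map moved forward one month to match the shifted dates.
df['season'] = df['time'].dt.month.map({
    1: 0, 2: 0, 3: 0,
    4: 1, 5: 1, 6: 1,
    7: 2, 8: 2, 9: 2, 10: 2,
    11: 3, 12: 3
})
df = df.groupby('season').resample('Y', on='time')['data'].sum().reset_index()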
df:
   season       time      data
0       0 2016-12-31  1.299445
1       1 2016-12-31  2.018766
2       2 2016-12-31  1.768848
3       3 2016-12-31  0.080741
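As an alternative to shifting the timestamps, the December rollover can be carried in a separate column. The following is a minimal sketch, not part of the original answer: the names month_to_season, season_year and seasonal are made up for illustration, and the season labels (DJF, MAM, JJAS, ON) are just shorthand for the month groups in the question. It uses a plain groupby instead of resample, and passes an offset object to date_range to avoid version-specific frequency aliases.

```python
import numpy as np
import pandas as pd

# Same sample data as in this answer.
np.random.seed(5)
dti = pd.date_range("2015-12-31", periods=11, freq=pd.offsets.MonthEnd())
df = pd.DataFrame({"time": dti, "data": np.random.rand(len(dti))})

# Hypothetical mapping: month number -> season label.
month_to_season = {12: "DJF", 1: "DJF", 2: "DJF",
                   3: "MAM", 4: "MAM", 5: "MAM",
                   6: "JJAS", 7: "JJAS", 8: "JJAS", 9: "JJAS",
                   10: "ON", 11: "ON"}

month = df["time"].dt.month
df["season"] = month.map(month_to_season)
# December counts towards the DJF season of the following year.
df["season_year"] = df["time"].dt.year + (month == 12).astype(int)

# One row per (season_year, season), in order of first appearance.
seasonal = (df.groupby(["season_year", "season"], sort=False)["data"]
              .sum()
              .reset_index())
print(seasonal)
```

Because the year is stored per row, this should also keep DJF seasons apart when the data spans several years, and the original time column stays untouched.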
Sample Data Used:
import numpy as np
import pandas as pd

np.random.seed(5)  # fixed seed so the values below are reproducible
dti = pd.date_range("2015-12-31", periods=11, freq="M")
df = pd.DataFrame({'time': dti,
                   'data': np.random.rand(len(dti))})
df:
         time      data
0  2015-12-31  0.221993
1  2016-01-31  0.870732
2  2016-02-29  0.206719
3  2016-03-31  0.918611
4  2016-04-30  0.488411
5  2016-05-31  0.611744
6  2016-06-30  0.765908
7  2016-07-31  0.518418
8  2016-08-31  0.296801
9  2016-09-30  0.187721
10 2016-10-31  0.080741
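On the side question about a between-like filter on the index: one option, not covered in the answer above, is isin, which is available on the integer Index returned by a DatetimeIndex's month attribute and also handles wrap-around seasons such as DJF. A minimal sketch, assuming the dates are already the index (the names indexed, djf and mam are illustrative):

```python
import numpy as np
import pandas as pd

np.random.seed(5)
dti = pd.date_range("2015-12-31", periods=11, freq=pd.offsets.MonthEnd())
indexed = pd.DataFrame({"data": np.random.rand(len(dti))}, index=dti)

# isin works for wrap-around month sets where a single between() cannot.
djf = indexed[indexed.index.month.isin([12, 1, 2])]

# A contiguous range such as MAM can be written with plain comparisons.
months = indexed.index.month
mam = indexed[(months >= 3) & (months <= 5)]

print(djf)
print(mam)
```

This avoids the set_index / reset_index round trip, since the mask is built directly from the index.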