
Pandas resample monthly data into custom frequency (seasonal) data

Background

I have a monthly dataset and want to resample it to seasonal data by summing the monthly values.

Seasonal refers to:
(Dec, Jan, Feb), (Mar, Apr, May), (Jun, Jul, Aug, Sep), (Oct, Nov)

The Data

import numpy as np
import pandas as pd

dti = pd.date_range("2015-12-31", periods=11, freq="M")
df = pd.DataFrame({'time': dti,
                   'data': np.random.rand(len(dti))})

Output:
        time    data
0   2015-12-31  0.466245
1   2016-01-31  0.959309
2   2016-02-29  0.445139
3   2016-03-31  0.575556
4   2016-04-30  0.303020
5   2016-05-31  0.591516
6   2016-06-30  0.001410
7   2016-07-31  0.338360
8   2016-08-31  0.540705
9   2016-09-30  0.115278
10  2016-10-31  0.950359

Code

I was able to resample every season except Dec, Jan, Feb (DJF). Here is what I did for the other seasons:

MAM = df.loc[df['time'].dt.month.between(3,5)].resample('Y',on='time').sum()

Since I couldn't use between for DJF, I used a conditional statement instead:

mask = (df['time'].dt.month>11) | (df['time'].dt.month<=2)
DJF = df.loc[mask].resample('3M',origin='start',on='time').sum()

The Issue

This resampling leaves my first data point, '2015-12-31', on its own and starts the bins from 2016, even though I used origin='start'. So my questions are basically:

  1. How do I solve my resampling issue?
  2. I feel like there must be a more straightforward way to do this than conditional statements. Also, is there anything similar to df['time'].dt.month.between that works on the index? I tried df.index.month.between, but between is not available on the integer index that df.index.month returns. I found repeatedly calling df.set_index and df.reset_index quite tiresome.

Try mapping each month to a season value, then groupby and resample within each season:

df['season'] = df['time'].dt.month.map({
    12: 0, 1: 0, 2: 0,
    3: 1, 4: 1, 5: 1,
    6: 2, 7: 2, 8: 2, 9: 2,
    10: 3, 11: 3
})

df = df.groupby('season').resample('Y', on='time')['data'].sum().reset_index()

df:

   season       time      data
0       0 2015-12-31  0.221993
1       0 2016-12-31  1.077451
2       1 2016-12-31  2.018766
3       2 2016-12-31  1.768848
4       3 2016-12-31  0.080741
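The integer season codes above can be hard to read back later. As a minimal sketch, the same month-to-season lookup can be generated from a dictionary of season names. The labels DJF, MAM, JJAS and ON and the names SEASONS and MONTH_TO_SEASON are assumptions made for illustration; only the month groupings come from the question. The sketch groups by calendar year directly instead of using resample, and it builds the month-end dates from freq="MS" plus MonthEnd(0), because newer pandas releases prefer "ME"/"YE" over the "M"/"Y" aliases.

```python
import numpy as np
import pandas as pd

# Hypothetical season labels; the month groupings are the ones from the question.
SEASONS = {
    "DJF": [12, 1, 2],
    "MAM": [3, 4, 5],
    "JJAS": [6, 7, 8, 9],
    "ON": [10, 11],
}
# Invert the dictionary into a month -> season label lookup.
MONTH_TO_SEASON = {m: name for name, months in SEASONS.items() for m in months}

np.random.seed(5)
# Month-start dates rolled forward to month end (same dates as the sample data).
dti = pd.date_range("2015-12-01", periods=11, freq="MS") + pd.offsets.MonthEnd(0)
df = pd.DataFrame({"time": dti, "data": np.random.rand(len(dti))})

df["season"] = df["time"].dt.month.map(MONTH_TO_SEASON)

# Sum per (calendar year, season); December 2015 stays in its own 2015 group here.
year = df["time"].dt.year.rename("year")
totals = df.groupby([year, "season"])["data"].sum()
print(totals)
```

As in the first result above, December 2015 ends up in a separate 2015 DJF group, because the grouping follows the calendar year.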

To count the previous December as part of the next year, add MonthBegin (from pandas.tseries.offsets) to shift December 2015 to January 2016, then move every month in the season mapping forward by one:

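from pandas.tseries.offsets import MonthBegin

# Roll every month-end timestamp forward to the first day of the next month
df['time'] = df['time'] + MonthBegin(1)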
df['season'] = df['time'].dt.month.map({
    1: 0, 2: 0, 3: 0,
    4: 1, 5: 1, 6: 1,
    7: 2, 8: 2, 9: 2, 10: 2,
    11: 3, 12: 3
})

df = df.groupby('season').resample('Y', on='time')['data'].sum().reset_index()

df:

   season       time      data
0       0 2016-12-31  1.299445
1       1 2016-12-31  2.018766
2       2 2016-12-31  1.768848
3       3 2016-12-31  0.080741
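If shifting the timestamps themselves is undesirable, one possible alternative is to leave the time column untouched and derive a separate year key in which December counts toward the following year. This is only a sketch: the column name season_year is an assumption, and it uses a plain groupby in place of resample. The dates are again built from freq="MS" plus MonthEnd(0) to avoid version-dependent frequency aliases.

```python
import numpy as np
import pandas as pd

np.random.seed(5)
dti = pd.date_range("2015-12-01", periods=11, freq="MS") + pd.offsets.MonthEnd(0)
df = pd.DataFrame({"time": dti, "data": np.random.rand(len(dti))})

month = df["time"].dt.month
df["season"] = month.map({
    12: 0, 1: 0, 2: 0,
    3: 1, 4: 1, 5: 1,
    6: 2, 7: 2, 8: 2, 9: 2,
    10: 3, 11: 3,
})

# Hypothetical helper column: December belongs to the next year's DJF season.
df["season_year"] = df["time"].dt.year + (month == 12).astype(int)

seasonal = df.groupby(["season_year", "season"], as_index=False)["data"].sum()
print(seasonal)
```

With the sample data this should give one row per season, all under 2016, with December 2015 folded into season 0. The original month-end timestamps stay available for anything else that needs them.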

Sample Data Used:

import numpy as np
import pandas as pd

np.random.seed(5)
dti = pd.date_range("2015-12-31", periods=11, freq="M")
df = pd.DataFrame({'time': dti,
                   'data': np.random.rand(len(dti))})

df:

         time      data
0  2015-12-31  0.221993
1  2016-01-31  0.870732
2  2016-02-29  0.206719
3  2016-03-31  0.918611
4  2016-04-30  0.488411
5  2016-05-31  0.611744
6  2016-06-30  0.765908
7  2016-07-31  0.518418
8  2016-08-31  0.296801
9  2016-09-30  0.187721
10 2016-10-31  0.080741
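On the question's side point about filtering by month when the dates are in the index: df.index.month returns an integer Index, which has no between method but does have isin. A small sketch, assuming the same sample values placed on a DatetimeIndex (the names indexed, djf and mam are illustrative):

```python
import numpy as np
import pandas as pd

np.random.seed(5)
dti = pd.date_range("2015-12-01", periods=11, freq="MS") + pd.offsets.MonthEnd(0)
# Same sample values, but with the dates as the index instead of a column.
indexed = pd.DataFrame({"data": np.random.rand(len(dti))}, index=dti)

# isin() handles the wrap-around DJF case and the contiguous cases alike.
djf = indexed[indexed.index.month.isin([12, 1, 2])]
mam = indexed[indexed.index.month.isin([3, 4, 5])]
print(djf)
```

Because isin takes an explicit list of months, the same expression covers every season without needing set_index or reset_index round trips.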
