How to resample a time series at given irregular dates

Question

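import pandas as pd

# Month-end index; on pandas >= 2.2 the alias is "ME" ("M" is deprecated there)
date_index = pd.date_range("2010-01-31", "2010-12-31", freq="M")
df = pd.Series(range(12), index=date_index)

# Every second month end: the boundaries to sum up to
dates = date_index[1::2]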

The Series df has monthly frequency. We want to resample it by adding up the values between consecutive dates given by the dates variable. Each result row, labelled with a date from dates, should hold the sum of the values after the previous date, up to and including that date.

df is:

2010-01-31     0
2010-02-28     1
2010-03-31     2
2010-04-30     3
2010-05-31     4
2010-06-30     5
2010-07-31     6
2010-08-31     7
2010-09-30     8
2010-10-31     9
2010-11-30    10
2010-12-31    11
Freq: M, dtype: int64

dates is:

DatetimeIndex(['2010-02-28', '2010-04-30', '2010-06-30', '2010-08-31',
               '2010-10-31', '2010-12-31'],
              dtype='datetime64[ns]', freq='2M')

The expected result is:

2010-02-28     1
2010-04-30     5
2010-06-30     9
2010-08-31    13
2010-10-31    17
2010-12-31    21
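The rule behind the expected result can be made explicit. Each output row sums the values in the half-open interval (previous date, current date], and the first bucket takes everything up to dates[0]. The sketch below implements that rule with pd.cut and explicit bin edges. Two assumptions are introduced here:

- The extra lower edge, one day before the first index entry, makes 2010-01-31 fall into the first bucket.
- freq=pd.offsets.MonthEnd() replaces the "M" alias, so the snippet runs on pandas versions both before and after the "ME" rename.

```python
import pandas as pd

# Rebuild the question's data; MonthEnd() is the offset behind the "M"/"ME" alias.
date_index = pd.date_range("2010-01-31", "2010-12-31", freq=pd.offsets.MonthEnd())
s = pd.Series(range(12), index=date_index)
dates = date_index[1::2]

# Bin edges: one (assumed) edge below the first timestamp, then every boundary date.
# right=True makes each bin (previous edge, current edge], labelled by its right edge.
edges = [s.index[0] - pd.Timedelta(days=1)] + list(dates)
buckets = pd.cut(s.index, bins=edges, right=True, labels=dates)

# The grouping key is categorical; observed=True keeps only the buckets that occur.
out = s.groupby(buckets, observed=True).sum()
print(out)
```

Because the edges are passed explicitly, nothing here requires the boundaries to be evenly spaced.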

Answer 1

This is not a general resampling solution, but for your concrete question of adding up the values between the dates, you could use:

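csum = df.cumsum().loc[dates]  # running total at each boundary date
res = csum.diff()              # sum over (previous date, current date]
res.iloc[0] = csum.iloc[0]     # first bucket: everything up to dates[0]
res = res.astype(df.dtype)     # diff() returned floats; restore the int dtype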

Result:

2010-02-28     1
2010-04-30     5
2010-06-30     9
2010-08-31    13
2010-10-31    17
2010-12-31    21
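The cumulative-sum idea also generalises to a first value that is not zero and to boundary dates that are not in the index. Below is a minimal sketch of that generalisation. The helper name sum_between_dates is hypothetical, and it assumes s.index is sorted, because reindex(..., method="ffill") requires a monotonic index.

```python
import pandas as pd


def sum_between_dates(s: pd.Series, dates: pd.DatetimeIndex) -> pd.Series:
    """Hypothetical helper: sum s over (dates[i-1], dates[i]] for every boundary.

    Assumes s.index is sorted. Boundaries need not appear in s.index.
    """
    # Running total read off at each boundary; for a boundary missing from
    # s.index, method="ffill" takes the total at the last entry before it.
    # Boundaries before the first entry get NaN, i.e. a total of 0.
    csum = s.cumsum().reindex(dates, method="ffill").fillna(0)
    # Each bucket is the difference of consecutive running totals; the first
    # bucket has no predecessor, so it is the running total itself.
    return csum.diff().fillna(csum)


date_index = pd.date_range("2010-01-31", "2010-12-31", freq=pd.offsets.MonthEnd())
s = pd.Series(range(1, 13), index=date_index)  # first value deliberately non-zero
print(sum_between_dates(s, date_index[1::2]))

# Boundaries that are not in the index: 2010-03-15 and 2010-12-15
print(sum_between_dates(s, date_index[[1, 10]] + pd.Timedelta(days=15)))
```

The result is float because diff() introduces a NaN before it is filled. Add .astype(s.dtype) if integers are needed.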

Answer 2

The idea is to replace the index values that are not in dates with missing values using Series.where, back-fill those missing values with bfill, and then group by the result and aggregate with sum:

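import pandas as pd

date_index = pd.date_range("2010-01-31", "2010-12-31", freq="M")
s = pd.Series(range(12), index=date_index)

dates = date_index[1::2]

# Keep an index entry only where it is one of the dates (NaT elsewhere),
# then back-fill so each row is labelled with the next boundary date
a = s.index.to_series().where(s.index.isin(dates)).bfill()
out = s.groupby(a).sum()
print(out)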
2010-02-28     1
2010-04-30     5
2010-06-30     9
2010-08-31    13
2010-10-31    17
2010-12-31    21
dtype: int64
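The where/bfill step requires every boundary in dates to be present in the index. If the boundaries can fall between index entries, the same labelling can be done with DatetimeIndex.searchsorted. The sketch below shows this swapped-in technique. The helper name bucket_sum is hypothetical, and dates is assumed to be sorted.

```python
import pandas as pd


def bucket_sum(s: pd.Series, dates: pd.DatetimeIndex) -> pd.Series:
    """Hypothetical helper: sum s over (dates[i-1], dates[i]]; dates must be sorted."""
    # For every timestamp, the position of the first boundary >= it
    # (side="left"), which is exactly the bucket the timestamp belongs to.
    pos = dates.searchsorted(s.index, side="left")
    # Timestamps after the last boundary get pos == len(dates). Drop them,
    # as in the answer above, where bfill leaves them NaN and groupby skips them.
    keep = pos < len(dates)
    return s[keep].groupby(dates[pos[keep]]).sum()


date_index = pd.date_range("2010-01-31", "2010-12-31", freq=pd.offsets.MonthEnd())
s = pd.Series(range(12), index=date_index)
print(bucket_sum(s, date_index[1::2]))

# Boundaries between index entries: 2010-03-15 and 2010-06-15
print(bucket_sum(s, date_index[[1, 4]] + pd.Timedelta(days=15)))
```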

Answer 3

For your specific example, where df[0] == 0, it is a simple resample with sum() aggregation, skipping df[0]:

df_resampled = df.iloc[1:].resample('2M').sum()  # use '2ME' on pandas >= 2.2

print(df_resampled)
2010-02-28     1
2010-04-30     5
2010-06-30     9
2010-08-31    13
2010-10-31    17
2010-12-31    21
Freq: 2M, dtype: int64

If df[0] != 0, you can still work around this easily by adding df[0] to the first element of df_resampled:

df_resampled.iloc[0] += df.iloc[0]
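The workaround can be checked on data whose first value is not zero. The sketch below assumes pd.offsets.MonthEnd(2) as a drop-in for the '2M' string. That is the offset the alias maps to, and using it sidesteps the '2ME' rename in pandas 2.2. The result should equal the sums of consecutive month pairs.

```python
import pandas as pd

date_index = pd.date_range("2010-01-31", "2010-12-31", freq=pd.offsets.MonthEnd())
s = pd.Series(range(1, 13), index=date_index)  # s.iloc[0] != 0 on purpose

# Resample everything except the first element into two-month buckets ending
# at 2010-02-28, 2010-04-30, ..., then fold the first element into bucket one.
out = s.iloc[1:].resample(pd.offsets.MonthEnd(2)).sum()
out.iloc[0] += s.iloc[0]
print(out)
```

Skipping the first element matters because resampling the full series starting at 2010-01-31 would place the two-month bins on different month pairs.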

If you want general resampling with a period of two months, you can try the loffset parameter of resample and provide a function returning pd.Timedelta objects such that each label "floors" to the last day of its month. Note that loffset was deprecated in pandas 1.1.0 and removed in pandas 2.0; on current versions the equivalent is to adjust result.index after resampling. (See here for how to get monthly periods for pd.Timedelta.)

3 answers

Answer 1: 2 votes, 2020-06-19 11:02:12

Answer 2: 1 vote, accepted, 2020-06-19 11:16:25

Answer 3: 1 vote, 2020-06-19 11:42:16