How to resample a Time Series on given irregular dates
import pandas as pd
date_index = pd.date_range("2010-01-31", "2010-12-31", freq="M")
df = pd.Series(range(12), index=date_index)
dates = date_index[1::2]
The Series df has monthly frequency, and we want to resample it by summing the values that fall between consecutive dates in the dates variable.
df is:
2010-01-31 0
2010-02-28 1
2010-03-31 2
2010-04-30 3
2010-05-31 4
2010-06-30 5
2010-07-31 6
2010-08-31 7
2010-09-30 8
2010-10-31 9
2010-11-30 10
2010-12-31 11
Freq: M, dtype: int64
dates is:
DatetimeIndex(['2010-02-28', '2010-04-30', '2010-06-30', '2010-08-31',
'2010-10-31', '2010-12-31'],
dtype='datetime64[ns]', freq='2M')
The expected result should be:
2010-02-28 1
2010-04-30 5
2010-06-30 9
2010-08-31 13
2010-10-31 17
2010-12-31 21
This is not a general resampling solution, but for your concrete problem of summing the values between the given dates you could use:
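res = df.cumsum()[dates].diff()
# diff() leaves NaN in the first slot; fill it with the running total up to the
# first date (plain df[dates[0]] would drop any earlier non-zero values)
res.iloc[0] = df.cumsum()[dates[0]]
res = res.astype(df.dtype)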
Result:
2010-02-28 1
2010-04-30 5
2010-06-30 9
2010-08-31 13
2010-10-31 17
2010-12-31 21
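The cumulative-sum trick can also be wrapped in a small helper. The following is only a sketch: the function name sum_between is made up for illustration, and it assumes that dates is sorted and that every date in it is present in the index of the Series. The sample data deliberately starts at 10 instead of 0 to show that the first bucket collects everything up to the first date.

```python
import pandas as pd

def sum_between(s, dates):
    """Sum s over the intervals (previous date, date] for each date in dates.

    Hypothetical helper: assumes dates is sorted and contained in s.index.
    """
    picked = s.cumsum().loc[dates]   # running totals at the cut dates
    res = picked.diff()              # differences between consecutive totals
    res.iloc[0] = picked.iloc[0]     # first bucket: everything up to the first date
    return res.astype(s.dtype)       # diff() produced floats; restore the dtype

# Month-end index built with an offset object instead of a frequency string
date_index = pd.date_range("2010-01-31", periods=12, freq=pd.offsets.MonthEnd())
s = pd.Series(range(10, 22), index=date_index)
dates = date_index[1::2]

result = sum_between(s, dates)
print(result)
```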
The idea is to replace the index values that do not match dates with missing values using Series.where, back-fill those missing values with bfill, and then aggregate with sum:
date_index = pd.date_range("2010-01-31", "2010-12-31", freq="M")
s = pd.Series(range(12), index=date_index)
dates = date_index[1::2]
a = s.index.to_series().where(s.index.isin(dates)).bfill()
out = s.groupby(a).sum()
print(out)
2010-02-28 1
2010-04-30 5
2010-06-30 9
2010-08-31 13
2010-10-31 17
2010-12-31 21
dtype: int64
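To see why this works, it can help to look at the intermediate grouping key. The sketch below uses a shortened six-month series and the illustrative variable names masked and key, which are not part of the answer above.

```python
import pandas as pd

date_index = pd.date_range("2010-01-31", periods=6, freq=pd.offsets.MonthEnd())
s = pd.Series(range(6), index=date_index)
dates = date_index[1::2]          # Feb 28, Apr 30, Jun 30

idx = s.index.to_series()
masked = idx.where(s.index.isin(dates))  # NaT wherever the row is not a cut date
key = masked.bfill()                     # each row now carries the next cut date

out = s.groupby(key).sum()
print(key)
print(out)
```

Every row ends up labelled with the next cut date on or after it, so groupby collects exactly the rows between two consecutive cut dates. Rows after the last cut date would keep a missing key and be dropped by groupby's default handling of missing keys.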
For your specific example, where df[0] = 0, a plain resample with sum() aggregation is enough, provided you skip df[0]:
df_resampled = df[1::].resample('2M').sum()
print(df_resampled)
2010-02-28 1
2010-04-30 5
2010-06-30 9
2010-08-31 13
2010-10-31 17
2010-12-31 21
Freq: 2M, dtype: int64
If df[0] != 0, there is still an easy workaround: add df[0] to the first element of df_resampled:
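# Use .iloc for positional access; integer keys on a datetime-indexed Series are deprecated
df_resampled.iloc[0] = df_resampled.iloc[0] + df.iloc[0]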
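Put together, the resample route with a non-zero first value might look like the sketch below. It assumes the same month-end index as in the question, starts the values at 10 so the correction is visible, and passes pd.offsets.MonthEnd(2) instead of the '2M' string so that it does not depend on a particular frequency alias.

```python
import pandas as pd

date_index = pd.date_range("2010-01-31", periods=12, freq=pd.offsets.MonthEnd())
s = pd.Series(range(10, 22), index=date_index)  # s.iloc[0] is 10, not 0

# Skip the first row so the two-month bins end on Feb, Apr, Jun, ...
resampled = s.iloc[1:].resample(pd.offsets.MonthEnd(2)).sum()

# Put the skipped first value back into the first bin
resampled.iloc[0] += s.iloc[0]
print(resampled)
```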
If you want general resampling with a period of two months, you can try the loffset parameter of resample and pass a function returning pd.Timedelta objects such that each label is "floored" to the last day of its month. (See here for how to get monthly periods for pd.Timedelta.) Note that loffset was deprecated in pandas 1.1 and removed in pandas 2.0, so this only applies to older versions.
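For cut dates that do not follow any regular frequency, one option that avoids resample altogether is to assign each timestamp to the next cut date with searchsorted and group on that. This is a different technique from the ones above and only a sketch: the helper name sum_up_to_dates and the daily sample data are made up for illustration, and the cut dates do not need to appear in the index.

```python
import pandas as pd

def sum_up_to_dates(s, dates):
    """Sum s over (previous cut date, cut date] for arbitrary cut dates.

    Hypothetical helper; rows later than the last cut date are ignored.
    """
    dates = pd.DatetimeIndex(dates).sort_values()
    # side="left": a timestamp equal to a cut date falls into that date's bucket
    pos = dates.searchsorted(s.index, side="left")
    inside = pos < len(dates)                 # False for rows after the last cut date
    sums = s[inside].groupby(pos[inside]).sum()
    # Buckets that received no rows get an explicit 0
    out = sums.reindex(range(len(dates)), fill_value=0)
    out.index = dates
    return out

daily = pd.Series(range(1, 11), index=pd.date_range("2010-01-01", periods=10))
cuts = ["2010-01-03", "2010-01-04", "2010-01-08"]
irregular = sum_up_to_dates(daily, cuts)
print(irregular)
```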