How to handle end of time series in pandas resample when upsampling?

I want to resample from hours to half-hours. I use .ffill() in the example, but I've tested .asfreq() as an intermediate step too.

The goal is to get half-hour intervals where each hourly value is spread across the upsampled intervals, and I'm trying to find a general solution for any range with the same problem.

import pandas as pd

index = pd.date_range('2018-10-10 00:00', '2018-10-10 02:00', freq='H')
hourly = pd.Series(range(10, len(index)+10), index=index)
half_hourly = hourly.resample('30min').ffill() / 2

The hourly series looks like:

2018-10-10 00:00:00    10
2018-10-10 01:00:00    11
2018-10-10 02:00:00    12
Freq: H, dtype: int64

And the half_hourly series:

2018-10-10 00:00:00    5.0
2018-10-10 00:30:00    5.0
2018-10-10 01:00:00    5.5
2018-10-10 01:30:00    5.5
2018-10-10 02:00:00    6.0
Freq: 30T, dtype: float64

The problem with the last one is that there is no row representing 02:30:00.

I want to achieve the following:

2018-10-10 00:00:00    5.0
2018-10-10 00:30:00    5.0
2018-10-10 01:00:00    5.5
2018-10-10 01:30:00    5.5
2018-10-10 02:00:00    6.0
2018-10-10 02:30:00    6.0
Freq: 30T, dtype: float64

I understand that the hourly series ends at 02:00, so there is no reason to expect pandas to insert the last half hour by default. However, after reading a lot of deprecated/old posts, some newer ones, the documentation, and the cookbook, I still wasn't able to find a straightforward solution.

Lastly, I've also tested the use of .mean(), but that didn't fill the NaNs. And interpolate() didn't average by hour as I wanted it to.

My .ffill() / 2 almost works as a way to spread each hour over its half hours in this case, but it seems like a hack for a problem that I expect pandas already provides a better solution to.
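To make the gap concrete, here is a minimal sketch of why the missing 02:30 row matters. The name `ratio` is introduced here for illustration only; it is just the number of half-hour steps per hour, which is where the hard-coded 2 comes from. If the hourly values are meant to be spread without loss, the two series should have the same total, and they do not.

```python
import pandas as pd

# Same data as the example above, built with a Timedelta step
index = pd.date_range("2018-10-10 00:00", periods=3, freq=pd.Timedelta("1h"))
hourly = pd.Series(range(10, 13), index=index)

# Number of fine steps per coarse step (illustrative name); 2 for hourly -> half-hourly
ratio = pd.Timedelta("1h") // pd.Timedelta("30min")

half_hourly = hourly.resample("30min").ffill() / ratio

# The 02:30 row is missing, so half of the last hourly value is lost
print(hourly.sum())       # 33
print(half_hourly.sum())  # 27.0
```

The difference of 6.0 is exactly the half-hour share of the last hourly value.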

Thanks in advance.

Your precise issue can be solved like this:

>>> import pandas as pd
>>> index = pd.date_range('2018-10-10 00:00', '2018-10-10 02:00', freq='H')
>>> hourly = pd.Series(range(10, len(index)+10), index=index)
>>> hourly.reindex(index.union(index.shift(freq='30min'))).ffill() / 2
2018-10-10 00:00:00    5.0
2018-10-10 00:30:00    5.0
2018-10-10 01:00:00    5.5
2018-10-10 01:30:00    5.5
2018-10-10 02:00:00    6.0
2018-10-10 02:30:00    6.0
Freq: 30T, dtype: float64
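The same idea can be wrapped in a small helper for other step sizes. This is only a sketch under assumptions: `upsample_ffill`, `source_step` and `target_step` are hypothetical names, not pandas API, and the index is assumed to be regular and sorted. Instead of shifting the index, it builds the fine-grained index directly, ending one fine step before the end of the last source interval, and forward-fills onto it with `reindex`.

```python
import pandas as pd

def upsample_ffill(series, source_step, target_step):
    """Forward-fill `series` onto a finer grid that also covers the last source interval.

    Hypothetical helper: assumes a sorted, regularly spaced DatetimeIndex.
    """
    source_step = pd.Timedelta(source_step)
    target_step = pd.Timedelta(target_step)
    # Last fine timestamp: end of the final source interval minus one fine step
    end = series.index[-1] + source_step - target_step
    fine_index = pd.date_range(series.index[0], end, freq=target_step)
    # method="ffill" propagates each source value to the fine timestamps that follow it
    return series.reindex(fine_index, method="ffill")

index = pd.date_range("2018-10-10 00:00", periods=3, freq=pd.Timedelta("1h"))
hourly = pd.Series(range(10, 13), index=index)

half_hourly = upsample_ffill(hourly, "1h", "30min") / 2
print(half_hourly)
```

With the example data this should give six rows, the last one at 02:30. The division by 2 is kept outside the helper so that it can also be used when the values should be repeated rather than split.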

I suspect that this is a minimal example, so I will try to solve it generically as well. Let's say you have multiple points to fill in each day:

>>> import pandas as pd
>>> x = pd.Series([1.5, 2.5], pd.DatetimeIndex(['2018-09-21', '2018-09-22']))
>>> x.resample('6h').ffill()
2018-09-21 00:00:00    1.5
2018-09-21 06:00:00    1.5
2018-09-21 12:00:00    1.5
2018-09-21 18:00:00    1.5
2018-09-22 00:00:00    2.5
Freq: 6H, dtype: float64

Employ a similar trick to include 6am, 12pm, and 6pm on 2018-09-22 as well.

Reindex with a shift equal to the period you want to have as an inclusive endpoint. In this case the shift is one extra day:

>>> import pandas as pd
>>> x = pd.Series([1.5, 2.5], pd.DatetimeIndex(['2018-09-21', '2018-09-22']))
>>> res = x.reindex(x.index.union(x.index.shift(freq='1D'))).resample('6h').ffill()
>>> res[:res.last_valid_index()]  # drop the start of next day
2018-09-21 00:00:00    1.5
2018-09-21 06:00:00    1.5
2018-09-21 12:00:00    1.5
2018-09-21 18:00:00    1.5
2018-09-22 00:00:00    2.5
2018-09-22 06:00:00    2.5
2018-09-22 12:00:00    2.5
2018-09-22 18:00:00    2.5
Freq: 6H, dtype: float64
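For readers wondering why `last_valid_index()` drops exactly the right row, the following sketch splits the one-liner into its steps. The intermediate names `extended` and `trimmed` are introduced here for illustration. The shifted index adds a row for 2018-09-23 that has no value, and, as the output above implies, the forward fill during resampling leaves that row as NaN, so it can be trimmed afterwards.

```python
import pandas as pd

x = pd.Series([1.5, 2.5], pd.DatetimeIndex(["2018-09-21", "2018-09-22"]))

# Step 1: add a sentinel row one day past the end; it has no value, so it is NaN
extended = x.reindex(x.index.union(x.index.shift(freq="1D")))
print(extended)

# Step 2: upsample; the sentinel makes the 6-hour grid run through 2018-09-23 00:00
res = extended.resample("6h").ffill()
print(res.tail(2))

# Step 3: cut everything after the last non-NaN row, which removes the sentinel
trimmed = res[:res.last_valid_index()]
print(trimmed)
```

An alternative with the same effect, assuming the sentinel is always the last row, would be `res.iloc[:-1]`.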
