

Resampling at irregular intervals

I have a regularly spaced time series stored in a pandas DataFrame:

1998-01-01 00:00:00    5.71
1998-01-01 12:00:00    5.73
1998-01-02 00:00:00    5.68
1998-01-02 12:00:00    5.69
...

I also have a list of dates that are irregularly spaced:

1998-01-01
1998-07-05
1998-09-21
....

I would like to calculate the average of the time series over each interval between consecutive dates in the list. Is this possible using pandas.DataFrame.resample? If not, what is the easiest way to do it?

Edited: For example, calculate the mean of 'series' between consecutive dates in 'dates', both created by the following code:

import pandas as pd
import numpy as np
import datetime

rng = pd.date_range('1998-01-01', periods=365, freq='D')
series = pd.DataFrame(np.random.randn(len(rng)), index=rng)

dates = [pd.Timestamp('1998-01-01'), pd.Timestamp('1998-07-05'), pd.Timestamp('1998-09-21')]
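resample groups rows by a fixed frequency (such as 'D' or 'MS'), so it cannot take an arbitrary list of bin edges. One alternative, not taken from the answers below, is to label each timestamp with pd.cut and then group by those labels. A minimal sketch on the question's data; the fixed random seed and the names intervals and means are additions for illustration:

```python
import numpy as np
import pandas as pd

# Same data as in the question; the seed is an addition for reproducibility.
np.random.seed(0)
rng = pd.date_range('1998-01-01', periods=365, freq='D')
series = pd.DataFrame(np.random.randn(len(rng)), index=rng)
dates = [pd.Timestamp('1998-01-01'), pd.Timestamp('1998-07-05'), pd.Timestamp('1998-09-21')]

# pd.cut puts each timestamp into the right-closed interval (dates[i], dates[i+1]].
# Rows outside every interval (here 1998-01-01 itself and anything after
# 1998-09-21) are labelled NaN and dropped by groupby.
intervals = pd.cut(series.index, bins=pd.DatetimeIndex(dates))

# One row per interval, one column per column of `series`.
means = series.groupby(intervals, observed=False).mean()
print(means)
```

Because pd.cut is right-closed by default, these intervals match the start < t <= end selection used in the answers below.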

You can loop through the dates and select only the rows falling between each pair of consecutive dates, like this:

import pandas as pd
import numpy as np
import datetime

rng = pd.date_range('1998-01-01', periods=365, freq='D')
series = pd.DataFrame(np.random.randn(len(rng)), index=rng)

dates = [pd.Timestamp('1998-01-01'), pd.Timestamp('1998-07-05'), pd.Timestamp('1998-09-21')]

for i in range(len(dates)-1):

    start = dates[i]
    end = dates[i+1]

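    # Keep rows with start < timestamp <= end (the start date itself is excluded)
    sample = series.loc[(series.index > start) & (series.index <= end)]

    # sample.mean() gives one mean per column; .iloc[0] picks the only column
    print(f'Mean value between {start} and {end} : {sample.mean().iloc[0]}')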

# Output (the numbers change on every run because the data are random)
Mean value between 1998-01-01 00:00:00 and 1998-07-05 00:00:00 : -0.024342221543215112
Mean value between 1998-07-05 00:00:00 and 1998-09-21 00:00:00 : 0.13945008064765074
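To keep the means as data instead of printing them, the same loop can collect one value per interval and label each value with its interval. A sketch under the same assumptions as the answer (a single column named 0); the seed and the name interval_means are illustrative additions:

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # added for reproducibility; not in the original answer
rng = pd.date_range('1998-01-01', periods=365, freq='D')
series = pd.DataFrame(np.random.randn(len(rng)), index=rng)
dates = [pd.Timestamp('1998-01-01'), pd.Timestamp('1998-07-05'), pd.Timestamp('1998-09-21')]

means = []
# zip pairs each date with the next one: (dates[0], dates[1]), (dates[1], dates[2]), ...
for start, end in zip(dates[:-1], dates[1:]):
    # Same selection as the loop above: start < timestamp <= end.
    mask = (series.index > start) & (series.index <= end)
    means.append(series.loc[mask, 0].mean())

# Label each mean with its right-closed interval (start, end].
interval_means = pd.Series(
    means, index=pd.IntervalIndex.from_breaks(pd.DatetimeIndex(dates), closed='right')
)
print(interval_means)
```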

Instead of a loop, you can also use a list comprehension, like this:

means = [series.loc[(series.index > dates[i]) & (series.index <= dates[i+1])].mean().iloc[0]
         for i in range(len(dates) - 1)]
print(means)  # [-0.024342221543215112, 0.13945008064765074]
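The loop and the list comprehension both rescan the full index once per interval. With many dates, a vectorized variant (a swapped-in technique, not part of this answer) computes every row's interval number at once with numpy.searchsorted and groups by it. A sketch on the question's data with an added seed; edges, k and inside are illustrative names:

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # added for reproducibility; not in the question
rng = pd.date_range('1998-01-01', periods=365, freq='D')
series = pd.DataFrame(np.random.randn(len(rng)), index=rng)
dates = [pd.Timestamp('1998-01-01'), pd.Timestamp('1998-07-05'), pd.Timestamp('1998-09-21')]

edges = pd.DatetimeIndex(dates).values  # sorted datetime64 array of bin edges
# side='left' returns k with edges[k-1] < t <= edges[k] for each timestamp t,
# i.e. t falls in the right-closed interval (edges[k-1], edges[k]].
k = np.searchsorted(edges, series.index.values, side='left')

# k == 0 means t <= first date; k == len(edges) means t > last date.
inside = (k > 0) & (k < len(edges))

# Interval numbers 0, 1, ... label the intervals between consecutive dates.
means = series[inside].groupby(k[inside] - 1).mean()
print(means)
```

The result is indexed by interval number rather than by date: row 0 is the interval between the first and second dates, row 1 the next one.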

You could iterate over the dates like this:

for ti in range(1, len(dates)):
    start_date, end_date = dates[ti - 1], dates[ti]
    # Keep rows with start_date < timestamp <= end_date
    mask = (series.index > start_date) & (series.index <= end_date)
    print(series[mask].mean())
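Both answers use start < t <= end, so a reading stamped exactly on the first date (1998-01-01 00:00) is never counted. If left-closed intervals are wanted instead (an assumption about the desired convention), one option is pd.cut with right=False. A sketch on 12-hourly data like the question's first listing, with random values; summary is an illustrative name:

```python
import numpy as np
import pandas as pd

# 12-hourly readings like the first listing in the question (random values).
rng = pd.date_range('1998-01-01', periods=730, freq='12h')
series = pd.Series(np.random.default_rng(0).standard_normal(len(rng)), index=rng)
dates = pd.DatetimeIndex(['1998-01-01', '1998-07-05', '1998-09-21'])

# right=False makes the intervals left-closed: [dates[i], dates[i+1]).
# The reading at exactly 1998-01-01 00:00 is now included, while the one
# at 1998-07-05 00:00 moves to the second interval.
intervals = pd.cut(series.index, bins=dates, right=False)

# Number of readings and their mean per interval.
summary = series.groupby(intervals, observed=False).agg(['count', 'mean'])
print(summary)
```

The count column shows which readings each interval picked up: 185 days (370 readings) before 1998-07-05 and 78 days (156 readings) before 1998-09-21.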

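Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.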

 
© 2020-2024 STACKOOM.COM