How to get daily difference in time series values when time delta index is irregular in pandas?

I have a dataframe containing a time series indexed by time, but with irregular time deltas, as below:

df
time                  x
2018-08-18 17:45:08   1.4562
2018-08-18 17:46:55   1.4901
2018-08-18 17:51:21   1.8012
...
2020-03-21 04:17:19   0.7623
2020-03-21 05:01:02   0.8231
2020-03-21 05:02:34   0.8038

What I want to do is get the daily difference between the two chronologically closest values, i.e. between each sample and the sample closest to the same time on the next day. For example, if we have a sample at 2018-08-18 17:45:08, and the next day has no sample at exactly the same time but the closest one is at, say, 2018-08-19 17:44:29, then I want the difference in x between these two times. How is that possible in pandas?

  • There will always be a sample for every single day between the first day and the last day in the time series.
  • The difference should be taken as (current x) - (past x), e.g. x_day2 - x_day1.
  • The output's first n rows will be NaN given how the difference is taken, where n is the number of samples in the first day.
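To make these requirements concrete, here is a slow reference sketch on made-up data. The function name daily_diff_reference and the sample values are hypothetical and not part of the real dataset; the function simply loops over the samples and, for each one, looks up the sample closest to exactly one day earlier.

```python
import numpy as np
import pandas as pd


def daily_diff_reference(data):
    """Brute-force (current x) - (past x); hypothetical helper for illustration."""
    one_day = pd.Timedelta(days=1)
    first_day = data.index[0].normalize()
    out = pd.Series(np.nan, index=data.index, dtype=float)
    for i, t in enumerate(data.index):
        if t.normalize() == first_day:
            continue  # first-day samples have no previous day: stay NaN
        # Absolute distance of every sample to the target time (t minus one day)
        gaps = np.abs((data.index - (t - one_day)).to_numpy())
        j = int(gaps.argmin())
        out.iloc[i] = data.iloc[i] - data.iloc[j]
    return out


# Made-up sample: two irregularly spaced readings per day over three days
sample = pd.Series(
    [1.0, 2.0, 1.5, 2.5, 3.0, 2.0],
    index=pd.to_datetime([
        '2018-08-18 09:00:00', '2018-08-18 17:45:08',
        '2018-08-19 09:10:00', '2018-08-19 17:44:29',
        '2018-08-20 08:50:00', '2018-08-20 18:00:00',
    ]),
)
expected = daily_diff_reference(sample)  # values: NaN, NaN, 0.5, 0.5, 1.5, -0.5
```

The loop is quadratic in the number of samples, so it is only meant as a check on small inputs, not as a solution for the full series.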

EDIT: The code below works if the time deltas are regular:

import pandas as pd

def get_daily_diff(data):
    """
    Calculate daily difference in time series

    Args:
        data (pandas.Series): a pandas series of time series values indexed by pandas.Timestamp

    Returns:
        pandas.Series: daily difference in values
    """
    df0 = data.index.searchsorted(data.index - pd.Timedelta(days=1))
    df0 = df0[df0 > 0]
    df0 = pd.Series(data.index[df0 - 1], index=data.index[data.shape[0] - df0.shape[0]:])
    out = data.loc[df0.index] - data.loc[df0.values]
    return out

However, with irregular time deltas, a ValueError is thrown when defining the variable out, because of a length mismatch between data.loc[df0.index] and data.loc[df0.values]. So the issue is to extend this function so that it works when the time deltas are irregular.
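One possible way to extend the function, shown here only as a sketch, is to let Series.reindex look up the sample nearest to each timestamp minus one day. The function name get_daily_diff_nearest, the 12-hour default tolerance, and the sample values are assumptions for illustration; the index must be sorted and free of duplicates for method='nearest' to work.

```python
import pandas as pd


def get_daily_diff_nearest(data, tolerance=pd.Timedelta(hours=12)):
    """(current x) - (x nearest to one day earlier); hypothetical helper."""
    # Look up, for every timestamp t, the value whose label is nearest to t - 1 day.
    # Lookups farther away than `tolerance` come back as NaN.
    past = data.reindex(data.index - pd.Timedelta(days=1),
                        method='nearest', tolerance=tolerance)
    # Subtract positionally so the result keeps the original timestamps.
    return pd.Series(data.to_numpy() - past.to_numpy(), index=data.index)


# Made-up sample: two irregularly spaced readings per day over three days
series = pd.Series(
    [1.0, 2.0, 1.5, 2.5, 3.0, 2.0],
    index=pd.to_datetime([
        '2018-08-18 09:00:00', '2018-08-18 17:45:08',
        '2018-08-19 09:10:00', '2018-08-19 17:44:29',
        '2018-08-20 08:50:00', '2018-08-20 18:00:00',
    ]),
)
diff = get_daily_diff_nearest(series)  # values: NaN, NaN, 0.5, 0.5, 1.5, -0.5
```

The subtraction uses the underlying arrays on purpose: subtracting the two Series directly would align them by label, and the shifted labels do not match the original ones.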

I would use pd.merge_asof with direction='nearest':

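# Assumes `time` is a regular column; if it is the index, call df = df.reset_index() first.
df['time_1d'] = df['time'] + pd.Timedelta(days=1)
# For each row, match the row whose time shifted forward by one day is nearest,
# i.e. the sample closest to exactly 24 hours earlier.
tmp = pd.merge_asof(df, df, left_on='time', right_on='time_1d',
                    direction='nearest', tolerance=pd.Timedelta(hours=12),
                    suffixes=('', '_y'))
# x_y is the previous day's value, so (current x) - (past x) is x - x_y.
tmp['delta'] = tmp['x'] - tmp['x_y']
tmp = tmp[['time', 'x', 'delta']]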

Here I have used a tolerance of 12 hours to make sure the first day's rows are NaN, but you could use a more appropriate value.
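Note that a tolerance alone may not blank out the whole first day when that day's samples are spread widely: a first-day sample at 23:00 lies only 1.5 hours from a first-day sample at 00:30 shifted forward by one day. A self-contained sketch that masks the first day explicitly could look like the following; daily_diff_asof and the sample values are hypothetical, and time is assumed to be a sorted datetime column.

```python
import numpy as np
import pandas as pd


def daily_diff_asof(df, tolerance=pd.Timedelta(hours=12)):
    """Nearest-sample daily difference via merge_asof; hypothetical helper."""
    # Copy of the data with every timestamp pushed forward by one day
    past = pd.DataFrame({'time': df['time'] + pd.Timedelta(days=1),
                         'x_past': df['x']})
    merged = pd.merge_asof(df[['time', 'x']], past, on='time',
                           direction='nearest', tolerance=tolerance)
    merged['delta'] = merged['x'] - merged['x_past']  # current - past
    # Force NaN for the first day regardless of what the tolerance let through
    first_day = merged['time'].iloc[0].normalize()
    merged.loc[merged['time'].dt.normalize() == first_day, 'delta'] = np.nan
    return merged[['time', 'x', 'delta']]


# Made-up sample: two irregularly spaced readings per day over three days
sample_df = pd.DataFrame({
    'time': pd.to_datetime([
        '2018-08-18 09:00:00', '2018-08-18 17:45:08',
        '2018-08-19 09:10:00', '2018-08-19 17:44:29',
        '2018-08-20 08:50:00', '2018-08-20 18:00:00',
    ]),
    'x': [1.0, 2.0, 1.5, 2.5, 3.0, 2.0],
})
result = daily_diff_asof(sample_df)  # delta: NaN, NaN, 0.5, 0.5, 1.5, -0.5
```

Building a separate shifted frame and merging on a single time key avoids the suffixed duplicate columns, at the cost of one extra temporary DataFrame.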
