How do I get the daily difference of time series values in pandas when the time deltas of the index are irregular?

Question

I have a dataframe containing a time series indexed by time, but with irregular time deltas, as shown below:

df
time                  x
2018-08-18 17:45:08   1.4562
2018-08-18 17:46:55   1.4901
2018-08-18 17:51:21   1.8012
...
2020-03-21 04:17:19   0.7623
2020-03-21 05:01:02   0.8231
2020-03-21 05:02:34   0.8038

What I want to do is get the daily difference between the two chronologically closest values, i.e. between each sample and the sample closest to the same time on the next day. For example, if we have a sample at 2018-08-18 17:45:08 and the next day there is no sample at exactly the same time, but the closest sample is at, say, 2018-08-19 17:44:29, then I want the difference in x between these two times. How can this be done in pandas?

There will always be a sample for every single day between the first day and the last day in the time series.
The difference should be taken as (current x) - (past x), e.g. x_day2 - x_day1.
The first n rows of the output will be NaN given how the difference is taken, where n is the number of samples on the first day.

EDIT: The code below works if the time deltas are regular:

import pandas as pd

def get_daily_diff(data):
    """
    Calculate daily difference in time series

    Args:
        data (pandas.Series): a pandas series of time series values indexed by pandas.Timestamp

    Returns:
        pandas.Series: daily difference in values
    """
    df0 = data.index.searchsorted(data.index - pd.Timedelta(days=1))
    df0 = df0[df0 > 0]
    df0 = pd.Series(data.index[df0 - 1], index=data.index[data.shape[0] - df0.shape[0]:])
    out = data.loc[df0.index] - data.loc[df0.values]
    return out

However, with irregular time deltas, a ValueError is thrown when defining the variable out, because there is a length mismatch between data.loc[df0.index] and data.loc[df0.values]. So the issue is how to extend this function to work when the time deltas are irregular.

Answer 1

I would use pd.merge_asof with direction='nearest':

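import pandas as pd

# Assumes time is a regular column; if it is the index, call df.reset_index() first.
df['time_1d'] = df['time'] + pd.Timedelta(days=1)
# Match each row with the row whose shifted time is nearest, i.e. the sample closest to 24 hours earlier.
tmp = pd.merge_asof(df, df, left_on='time', right_on='time_1d',
                    direction='nearest', tolerance=pd.Timedelta(hours=12), suffixes=('', '_y'))
# x is the current value and x_y the value one day earlier, so this is (current x) - (past x).
tmp['delta'] = tmp['x'] - tmp['x_y']
tmp = tmp[['time', 'x', 'delta']]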

Here I have used a tolerance of 12 hours to make sure the rows of the first day get NaN, but you could use a more appropriate value.
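As a rough illustration, the merge_asof approach above could be wrapped in a function that takes a Series indexed by time, as in the question. This is only a sketch under a few assumptions: the index is sorted, the function name daily_diff_nearest, the helper column names time_past and x_past, and the sample values are all made up for this example, and the 12-hour tolerance is just the default carried over from the answer.

```python
import pandas as pd


def daily_diff_nearest(data, tolerance=pd.Timedelta(hours=12)):
    """(current x) - (x of the sample nearest to 24 hours earlier).

    Hypothetical helper: `data` is assumed to be a Series with a sorted
    DatetimeIndex. Rows without a sample within `tolerance` of the
    one-day-earlier timestamp get NaN.
    """
    current = data.rename("x").rename_axis("time").reset_index()

    # Copy of the samples with their timestamps moved forward by one day,
    # so each past sample lines up with "the same time today".
    past = current.rename(columns={"time": "time_past", "x": "x_past"})
    past["time"] = past["time_past"] + pd.Timedelta(days=1)

    # merge_asof keeps every left row, in order, and attaches the nearest
    # shifted past sample if it lies within the tolerance.
    merged = pd.merge_asof(
        current, past, on="time", direction="nearest", tolerance=tolerance
    )

    delta = merged["x"] - merged["x_past"]
    delta.index = data.index
    return delta.rename("delta")


# Made-up sample data with irregular time deltas.
index = pd.to_datetime(
    [
        "2018-08-18 17:45:08",
        "2018-08-18 17:51:21",
        "2018-08-19 17:44:29",
        "2018-08-19 18:10:00",
        "2018-08-20 17:50:00",
    ]
)
data = pd.Series([1.0, 2.0, 4.0, 8.0, 16.0], index=index)

result = daily_diff_nearest(data)
print(result)
```

With this data the two samples on the first day have no counterpart within 12 hours and come out as NaN, while 2018-08-19 17:44:29 is paired with 2018-08-18 17:45:08, 2018-08-19 18:10:00 with 2018-08-18 17:51:21, and 2018-08-20 17:50:00 with 2018-08-19 17:44:29. A tighter tolerance would turn more rows into NaN when a day has no sample close to the same time of day.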

Answer 1
1 accepted 2020-05-26 09:52:13