How to get daily difference in time series values when time delta index is irregular in pandas?
I have a dataframe containing a time series indexed by time, but with irregular time deltas, as shown below:
df
time x
2018-08-18 17:45:08 1.4562
2018-08-18 17:46:55 1.4901
2018-08-18 17:51:21 1.8012
...
2020-03-21 04:17:19 0.7623
2020-03-21 05:01:02 0.8231
2020-03-21 05:02:34 0.8038
What I want to do is get the daily difference between the two chronologically closest values, i.e. between a sample and the sample closest to the same time on the next day. For example, if we have a sample at 2018-08-18 17:45:08 and the next day there is no sample at exactly the same time, but the closest sample is at, say, 2018-08-19 17:44:29, then I want the difference in x between these two times. How can this be done in pandas?
Given how the difference is taken, n rows will be NaN, where n is the number of samples in the first day.

EDIT: The code below works if the time deltas are regular:
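import pandas as pd

def get_daily_diff(data):
    """
    Calculate daily difference in time series

    Args:
        data (pandas.Series): a pandas series of time series values indexed by pandas.Timestamp
    Returns:
        pandas.Series: daily difference in values
    """
    # Position at which (timestamp - 1 day) would be inserted into the index
    df0 = data.index.searchsorted(data.index - pd.Timedelta(days=1))
    # Keep only positions that have at least one sample before them
    df0 = df0[df0 > 0]
    # Pair the last len(df0) timestamps with the sample just before each position
    df0 = pd.Series(data.index[df0 - 1], index=data.index[data.shape[0] - df0.shape[0]:])
    out = data.loc[df0.index] - data.loc[df0.values]
    return out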
However, with irregular time deltas, a ValueError is thrown when defining the variable out, because there is a length mismatch between data.loc[df0.index] and data.loc[df0.values]. So the issue is to extend this function so that it works when the time deltas are irregular.
I would use pd.merge_asof with direction='nearest':
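# merge_asof needs 'time' as a sorted column; call df.reset_index() first if it is the index
df['time_1d'] = df['time'] + pd.Timedelta('1D')
# For each row, find the row whose time + 1 day is nearest, i.e. the sample closest to one day earlier
tmp = pd.merge_asof(df, df, left_on='time', right_on='time_1d',
                    direction='nearest', tolerance=pd.Timedelta('12h'), suffixes=('', '_y'))
# x_y is the value from about one day earlier, so delta is (earlier value) - (current value)
tmp['delta'] = tmp['x_y'] - tmp['x']
tmp = tmp[['time', 'x', 'delta']]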
Here I have used a tolerance of 12 hours to make sure the rows of the first day are NaN, but you could use a more appropriate value.
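To tie this back to the question, here is a minimal sketch of the same merge_asof idea wrapped in a function that takes a Series indexed by time, as get_daily_diff does. The function name daily_diff_nearest, the column names "time", "x" and "x_prev", and the sample values are made up for illustration. It also assumes the wanted sign is the current value minus the value from about one day earlier, which is an interpretation of the question rather than something it states outright.

```python
import pandas as pd


def daily_diff_nearest(data, tolerance=pd.Timedelta(hours=12)):
    """Current value minus the value sampled nearest to one day earlier.

    data: pandas.Series with a sorted DatetimeIndex (assumed).
    Rows with no sample within `tolerance` of (t - 1 day) come out as NaN.
    """
    # Turn the series into a two-column frame: "time" and "x".
    left = data.rename("x").rename_axis("time").reset_index()
    # Shift a copy forward by one day, so a sample taken at t sits at t + 1 day.
    right = left.rename(columns={"x": "x_prev"})
    right["time"] = right["time"] + pd.Timedelta(days=1)
    # For each current timestamp, pick the shifted sample that is nearest in time.
    merged = pd.merge_asof(left, right, on="time",
                           direction="nearest", tolerance=tolerance)
    out = merged["x"] - merged["x_prev"]
    out.index = data.index  # restore the original time index
    return out


# Hypothetical sample data with irregular spacing.
idx = pd.to_datetime([
    "2018-08-18 17:45:08",
    "2018-08-18 17:46:55",
    "2018-08-19 17:44:29",
    "2018-08-19 17:50:00",
])
sample = pd.Series([1.0, 1.5, 2.0, 3.5], index=idx)
result = daily_diff_nearest(sample)
print(result)
```

With this sample, the two rows of the first day have no counterpart within 12 hours of one day earlier and stay NaN. The 17:44:29 sample is paired with 17:45:08 from the previous day, and the 17:50:00 sample with 17:46:55.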