How to find the time difference between two events in groups
I have a data frame with the following columns: a detector id, a channel id (each detector has several channels), a timestamp (an integer, for simplicity), and the number of counts that occurred in a given (detector_id, channel_id) pair.
How can I calculate the number of days that have passed since the last nonzero event in a given (detector_id, channel_id) pair?
Here is an example:
import pandas as pd

# Example data; the values match the rows shown in the result tables
df = pd.DataFrame({
    "time": [1, 2, 3, 4, 5, 6, 7],
    "detector_id": [0, 0, 0, 0, 0, 0, 1],
    "channel_id": [1, 1, 1, 0, 1, 1, 1],
    "counts": [0, 1, 0, 1, 0, 1, 0],
})
I tried to solve this in the following way:
df["diff"] = df["time"] - df.groupby(["detector_id", "channel_id"])['time'].diff()
It produces the following result (the diff column is what I get, the expected column is what I want):
   time  detector_id  channel_id  counts  diff  expected
0     1            0           1       0   NaN       NaN
1     2            0           1       1   1.0       NaN
2     3            0           1       0   2.0       1.0
3     4            0           0       1   NaN       NaN
4     5            0           1       0   3.0       3.0
5     6            0           1       1   5.0       4.0
6     7            1           1       0   NaN       NaN
As you can see, this solution doesn't take the counts column into account. The difference should be reset to zero once we see counts > 0 and carried forward otherwise.
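To pin down what the expected column means, here is a minimal reference sketch in plain Python. It assumes the rows are already sorted by time and copies them from the table above; the function name time_since_last_event is made up for illustration and is not part of pandas.

```python
import math

# (time, detector_id, channel_id, counts), copied from the table above
rows = [
    (1, 0, 1, 0),
    (2, 0, 1, 1),
    (3, 0, 1, 0),
    (4, 0, 0, 1),
    (5, 0, 1, 0),
    (6, 0, 1, 1),
    (7, 1, 1, 0),
]

def time_since_last_event(rows):
    """For each row, return the time elapsed since the previous row with
    counts > 0 in the same (detector_id, channel_id) pair, or NaN if
    there is no such earlier row. Assumes rows are sorted by time."""
    last_event = {}  # (detector_id, channel_id) -> time of last nonzero event
    result = []
    for time, detector_id, channel_id, counts in rows:
        key = (detector_id, channel_id)
        # Look only at earlier events, so check before recording this row
        result.append(time - last_event[key] if key in last_event else math.nan)
        if counts > 0:
            last_event[key] = time
    return result

print(time_since_last_event(rows))  # [nan, nan, 1, nan, 3, 4, nan]
```

Note that a row with counts > 0 is measured against the previous event, not against itself, which is why row 5 gets 4 rather than 0.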
This should be close, but needs testing on your full data:
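def f(subdf):
    # Keep the time of rows with counts > 0, restore the group's full index
    # (NaN elsewhere), forward-fill the last event time, then shift by one
    # row so that a row never sees its own event.
    ffilled = (subdf.loc[subdf['counts'] > 0, 'time']
                    .reindex_like(subdf)
                    .ffill()
                    .shift())
    return subdf['time'] - ffilled

# Apply per (detector_id, channel_id) group, restore the original row order
# via the innermost index level, and assign positionally. The positional
# assignment assumes df's index is sorted (e.g. the default RangeIndex).
df['diff'] = (df.groupby(['detector_id', 'channel_id'])
                .apply(f)
                .sort_index(level=-1)
                .values)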
   time  detector_id  channel_id  counts  diff  expected
0     1            0           1       0   NaN       NaN
1     2            0           1       1   NaN       NaN
2     3            0           1       0   1.0       1.0
3     4            0           0       1   NaN       NaN
4     5            0           1       0   3.0       3.0
5     6            0           1       1   4.0       4.0
6     7            1           1       0   NaN       NaN
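For comparison, the same idea can probably be written without apply, using only grouped shift and ffill. This is an alternative sketch, not the method from the answer above; it assumes the rows are sorted by time within each group and rebuilds the example data from the table so that it runs on its own.

```python
import pandas as pd

# Example data, taken from the rows of the table above
df = pd.DataFrame({
    "time": [1, 2, 3, 4, 5, 6, 7],
    "detector_id": [0, 0, 0, 0, 0, 0, 1],
    "channel_id": [1, 1, 1, 0, 1, 1, 1],
    "counts": [0, 1, 0, 1, 0, 1, 0],
})

keys = [df["detector_id"], df["channel_id"]]

# Time of each nonzero event, NaN on rows with counts == 0
event_time = df["time"].where(df["counts"] > 0)

# Shift within each group so a row does not see its own event,
# then forward-fill within each group to carry the last event time
previous_event = event_time.groupby(keys).shift()
last_event = previous_event.groupby(keys).ffill()

df["diff"] = df["time"] - last_event
print(df)
```

Because grouped shift and ffill return results aligned to the original index, no sort_index or positional assignment is needed here.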