How to find the time difference between two events in groups

I have a data frame that contains a detector id, a channel id (each detector has several channels), a timestamp (an integer, for simplicity), and the number of counts recorded for a given (detector_id, channel_id) pair.

How can I calculate, for each row, the number of days that have passed since the last nonzero event in the same (detector_id, channel_id) pair?

Here is an example:

import pandas as pd

# Values match the rows shown in the result tables
df = pd.DataFrame({
    "time": [1, 2, 3, 4, 5, 6, 7],
    "detector_id": [0, 0, 0, 0, 0, 0, 1],
    "channel_id": [1, 1, 1, 0, 1, 1, 1],
    "counts": [0, 1, 0, 1, 0, 1, 0],
})

I tried to solve this in the following way:

df["diff"] = df["time"] - df.groupby(["detector_id", "channel_id"])['time'].diff()

It produces the following result (the diff column), shown next to the values I expect:

   time  detector_id  channel_id  counts  diff  expected
0     1            0           1       0   NaN       NaN
1     2            0           1       1   1.0       NaN
2     3            0           1       0   2.0       1.0
3     4            0           0       1   NaN       NaN
4     5            0           1       0   3.0       3.0
5     6            0           1       1   5.0       4.0
6     7            1           1       0   NaN       NaN

As you can see, this solution doesn't take the counts column into account. The reference time should be reset whenever a row has counts > 0 and carried forward otherwise.

This should be close, but it needs testing on your full data:

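def f(subdf):
    # Keep 'time' only on rows with counts > 0, realign to the group's full
    # index (NaN elsewhere), carry the last event time forward, then shift by
    # one row so that a row never counts its own event.
    ffilled = (subdf.loc[subdf['counts'] > 0, 'time']
               .reindex_like(subdf)
               .ffill()
               .shift())
    return subdf['time'] - ffilled

# Here apply returns a Series indexed by (detector_id, channel_id, original index);
# sorting on the last level restores the original row order before assignment.
df['diff'] = (df.groupby(['detector_id', 'channel_id'])
                .apply(f)
                .sort_index(level=-1)
                .values)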

   time  detector_id  channel_id  counts  diff  expected
0     1            0           1       0   NaN       NaN
1     2            0           1       1   NaN       NaN
2     3            0           1       0   1.0       1.0
3     4            0           0       1   NaN       NaN
4     5            0           1       0   3.0       3.0
5     6            0           1       1   4.0       4.0
6     7            1           1       0   NaN       NaN
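
The same idea can probably be written without groupby.apply, which avoids the index juggling at the end. The following is only a sketch, using the data from the tables above and assuming rows are already sorted by time within each (detector_id, channel_id) pair. The names keys, event_time and last_event are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "time": [1, 2, 3, 4, 5, 6, 7],
    "detector_id": [0, 0, 0, 0, 0, 0, 1],
    "channel_id": [1, 1, 1, 0, 1, 1, 1],
    "counts": [0, 1, 0, 1, 0, 1, 0],
})

# Grouping keys as Series, so they can be used to group another Series
keys = [df["detector_id"], df["channel_id"]]

# Keep the timestamp only on rows with a nonzero count; NaN elsewhere
event_time = df["time"].where(df["counts"] > 0)

# Within each group, carry the most recent event time forward...
last_event = event_time.groupby(keys).ffill()
# ...then shift by one row so a row never sees its own event
last_event = last_event.groupby(keys).shift()

df["diff"] = df["time"] - last_event
print(df)
```

Both groupby calls return a Series aligned with the original index, so the result can be assigned directly without sorting. As with the apply version, this should be checked against the full data, in particular if timestamps are not sorted within each group.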
