How to find the time difference between two events in a group

Question

I have a data frame that contains a detector id, a channel id (each detector has several channels), a timestamp (an integer, for simplicity), and the number of counts recorded for a given (detector_id, channel_id) pair.

How can I calculate the number of days passed since the last nonzero event in a given (detector_id, channel_id) pair?

Here is an example:

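import pandas as pd

# "time" and "channel_id" are set to the values shown in the result table below.
df = pd.DataFrame({
    "time": [1, 2, 3, 4, 5, 6, 7],
    "detector_id": [0, 0, 0, 0, 0, 0, 1],
    "channel_id": [1, 1, 1, 0, 1, 1, 1],
    "counts": [0, 1, 0, 1, 0, 1, 0],
})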

I tried to solve this in the following way:

df["diff"] = df["time"] - df.groupby(["detector_id", "channel_id"])['time'].diff()

It produces the following result (the expected column shows what I actually want):

   time  detector_id  channel_id  counts  diff  expected
0     1            0           1       0   NaN       NaN
1     2            0           1       1   1.0       NaN
2     3            0           1       0   2.0       1.0
3     4            0           0       1   NaN       NaN
4     5            0           1       0   3.0       3.0
5     6            0           1       1   5.0       4.0
6     7            1           1       0   NaN       NaN

As you can see, this solution doesn't take the counts column into account. The difference should be reset to zero once we see counts > 0 and propagated otherwise.
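To make the problem with the attempt concrete, here is a minimal sketch, assuming the data matches the rows of the result table above. Since diff() within a group is the current time minus the previous time, subtracting it from time simply gives back the previous row's timestamp in that group, whatever its counts value. The names keys, attempt and prev_time are only illustrative.

```python
import pandas as pd

# Data as shown in the result table above.
df = pd.DataFrame({
    "time": [1, 2, 3, 4, 5, 6, 7],
    "detector_id": [0, 0, 0, 0, 0, 0, 1],
    "channel_id": [1, 1, 1, 0, 1, 1, 1],
    "counts": [0, 1, 0, 1, 0, 1, 0],
})

keys = ["detector_id", "channel_id"]
grouped_time = df.groupby(keys)["time"]

# The attempt from the question: time minus the within-group difference.
attempt = df["time"] - grouped_time.diff()

# The previous timestamp within each group, ignoring counts entirely.
prev_time = grouped_time.shift()

# Both series hold the same values.
print(attempt.equals(prev_time))
```

So the diff column in the table is the previous timestamp per group, not an elapsed time, and counts never enters the calculation.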

Answer 1

This should be close, but needs testing on your full data:

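def f(subdf):
    # Keep the timestamps of rows with nonzero counts, realign them to the
    # group's index, forward-fill, then shift by one row so that each row
    # sees the last nonzero event strictly before it.
    ffilled = (subdf.loc[subdf['counts'] > 0, 'time']
               .reindex_like(subdf)
               .ffill()
               .shift())
    return subdf['time'] - ffilled

# Apply per (detector_id, channel_id) group, then restore the original row order.
df['diff'] = (df.groupby(['detector_id', 'channel_id'])
                .apply(f)
                .sort_index(level=-1)
                .values)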

   time  detector_id  channel_id  counts  diff  expected
0     1            0           1       0   NaN       NaN
1     2            0           1       1   NaN       NaN
2     3            0           1       0   1.0       1.0
3     4            0           0       1   NaN       NaN
4     5            0           1       0   3.0       3.0
5     6            0           1       1   4.0       4.0
6     7            1           1       0   NaN       NaN
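The same idea can probably also be written without groupby.apply. The following is only a sketch under the assumption that the data matches the rows of the table above: mask the timestamps of zero-count rows, then forward-fill and shift within each group via transform. The column name event_time and the variable last_event are made up for illustration.

```python
import pandas as pd

# Data as shown in the result table above.
df = pd.DataFrame({
    "time": [1, 2, 3, 4, 5, 6, 7],
    "detector_id": [0, 0, 0, 0, 0, 0, 1],
    "channel_id": [1, 1, 1, 0, 1, 1, 1],
    "counts": [0, 1, 0, 1, 0, 1, 0],
})

keys = ["detector_id", "channel_id"]

# Timestamp of each nonzero event, NaN for rows with counts == 0.
events = df.assign(event_time=df["time"].where(df["counts"] > 0))

# Per group: carry the last event time forward, then shift by one row so a
# row never uses its own event as the reference point.
last_event = (events.groupby(keys)["event_time"]
                    .transform(lambda s: s.ffill().shift()))

df["diff"] = df["time"] - last_event
print(df)
```

Because transform returns a result aligned with the original index, no sort_index or .values step is needed here. As with the answer above, this should be checked against the full data before relying on it.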

Solution 1
2 Accepted 2019-06-16 14:06:36