How to group rows based on the difference with the previous row?

Question

I have the following dataframe:

    | start_time          | end_time            | id  |
    |---------------------|---------------------|-----|
    | 2017-03-30 01:00:00 | 2017-03-30 01:15:30 |1    |
    | 2017-03-30 02:02:00 | 2017-03-30 03:30:00 |4    |
    | 2017-03-30 03:37:00 | 2017-03-30 03:39:00 |7    |
    | 2017-03-30 03:41:30 | 2017-03-30 04:50:00 |8    |
    | 2017-03-30 07:10:00 | 2017-03-30 07:10:30 |10   |
    | 2017-03-30 07:11:00 | 2017-03-30 07:20:00 |13   |
    | 2017-03-30 07:22:00 | 2017-03-30 08:00:00 |15   |
    | 2017-03-30 10:00:00 | 2017-03-30 10:03:00 |20   |

I would like to give consecutive rows the same id when the end_time of row "i-1" is at most 900 seconds before the start_time of row "i".
Basically, the output for the example above would be:

    | start_time          | end_time            | id  |
    |---------------------|---------------------|-----|
    | 2017-03-30 01:00:00 | 2017-03-30 01:15:30 |1    |
    | 2017-03-30 02:02:00 | 2017-03-30 03:30:00 |4    |
    | 2017-03-30 03:37:00 | 2017-03-30 03:39:00 |4    |
    | 2017-03-30 03:41:30 | 2017-03-30 04:50:00 |4    |
    | 2017-03-30 07:10:00 | 2017-03-30 07:10:30 |10   |
    | 2017-03-30 07:11:00 | 2017-03-30 07:20:00 |10   |
    | 2017-03-30 07:22:00 | 2017-03-30 08:00:00 |10   |
    | 2017-03-30 10:00:00 | 2017-03-30 10:03:00 |20   |

I achieved it with the following code, but I'm sure there's a more elegant (and efficient) way to do so:

    df['endTime_delayed'] = df.end_time.shift(1)
    df['id_delayed'] = df['id'].shift(1)
    for (i,row) in df.iterrows():
        if (row.start_time-row.endTime_delayed).seconds <= 900 :
            df.id.iloc[i] = df.id_delayed.iloc[i]
            try :
                df.id_delayed.iloc[i+1] = df.id.iloc[i]
            except :
                break

Answer 1

`mask` and `ffill`

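    # Gap between each row's start and the previous row's end
    diff = df.start_time.sub(df.end_time.shift())
    # "At most 900 seconds" is inclusive, hence <=
    mask = diff <= pd.Timedelta(900, unit='s')
    # Blank out ids of rows that continue a group, then forward-fill
    df.id.mask(mask).ffill().astype(df.id.dtype)

    0     1
    1     4
    2     4
    3     4
    4    10
    5    10
    6    10
    7    20
    Name: id, dtype: int64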
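
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and applies the same `mask`/`ffill` idea. The names `gap`, `same_group` and `new_id` are mine, not the original poster's, and the comparison uses `<=` on the assumption that "at most 900 seconds" is meant inclusively.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2017-03-30 01:00:00", "2017-03-30 02:02:00",
        "2017-03-30 03:37:00", "2017-03-30 03:41:30",
        "2017-03-30 07:10:00", "2017-03-30 07:11:00",
        "2017-03-30 07:22:00", "2017-03-30 10:00:00",
    ]),
    "end_time": pd.to_datetime([
        "2017-03-30 01:15:30", "2017-03-30 03:30:00",
        "2017-03-30 03:39:00", "2017-03-30 04:50:00",
        "2017-03-30 07:10:30", "2017-03-30 07:20:00",
        "2017-03-30 08:00:00", "2017-03-30 10:03:00",
    ]),
    "id": [1, 4, 7, 8, 10, 13, 15, 20],
})

# Gap between each row's start and the previous row's end.
# The first row has no predecessor, so its gap is NaT.
gap = df["start_time"] - df["end_time"].shift()

# True where a row continues the previous group.
# Comparisons against NaT evaluate to False, so row 0 starts a group.
same_group = gap <= pd.Timedelta(seconds=900)

# Blank out the ids of continuing rows, forward-fill the id of the
# row that opened the group, then restore the original integer dtype.
new_id = df["id"].mask(same_group).ffill().astype(df["id"].dtype)
df = df.assign(id=new_id)

print(df["id"].tolist())  # [1, 4, 4, 4, 10, 10, 10, 20]
```

Note that the comparison is made on the full `Timedelta` rather than on its `.seconds` attribute. `.seconds` only holds the seconds component and ignores whole days, so a gap of one day plus ten seconds would report `10` there.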

Answer 1: 4, accepted, 2019-07-26 20:22:52