Set column value to 0 in each first matching row

Question

I'm attempting to detect sequences of time where the difference between timestamps is below some threshold. To do this, I convert a list of timestamps into seconds and measure the difference between consecutive timestamps. I've written code for this, but the problem is that when the difference spans two days, the time difference for the first row of each day should be zero. So the value 86390 in the dataframe below should be 0. This is just a contrived example. With multiple groups, how do I set the first entry in each group to 0?

Code:

import pandas as pd

df = pd.DataFrame(
    {'date': ['2019-01-01 00:02:48.714000', '2019-01-01 00:02:58.714000', '2019-01-02 00:02:48.714000', '2019-01-02 00:04:48.714000'],
     'id': [1, 2, 3, 4],
    })
df['date'] = pd.to_datetime(df['date'])

# sort_values returns a new frame, so assign the result back
df = df.sort_values(by=['date']).reset_index(drop=True)

# seconds since the epoch for each timestamp
arr = []
for d in df['date']:
    arr.append(d.timestamp())

df['TIME_IN_SEC'] = arr
df['TIME_IN_SEC_SHIFT'] = df.TIME_IN_SEC.shift(1)
df['TIME_DIFF'] = df["TIME_IN_SEC"] - df["TIME_IN_SEC_SHIFT"]

# mean TIME_DIFF per calendar day
list_values = []
for g in df.groupby(pd.Grouper(key='date', freq='D')):
    list_values.append(sum(g[1]['TIME_DIFF']) / len(g[1]))

df

renders:

Answer 1

To set the first row of each day to zero, group by the date column, using only the calendar date of each timestamp, and aggregate with first() to get the first row of each day. Take the 'id' column of the result as a Series. (I'm assuming the 'id' values are unique.)

id_filt = df.groupby(df.date.dt.date).first()['id']
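To make it concrete what this filter holds, here is a minimal sketch that rebuilds only the 'date' and 'id' columns from the question's sample data and prints the resulting 'id' values. The trimmed-down frame is an assumption made for brevity; the groupby expression is the same idea as above.

```python
import pandas as pd

# Sample data from the question, reduced to the two columns the filter needs
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-01-01 00:02:48.714000",
        "2019-01-01 00:02:58.714000",
        "2019-01-02 00:02:48.714000",
        "2019-01-02 00:04:48.714000",
    ]),
    "id": [1, 2, 3, 4],
})

# Group by calendar date and keep the first 'id' seen on each day
id_filt = df.groupby(df["date"].dt.date).first()["id"]
print(id_filt.tolist())  # [1, 3]
```

The two ids are the first rows of 2019-01-01 and 2019-01-02, which are exactly the rows that should be zeroed.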

Then use loc with isin to select only the rows whose 'id' is in that Series, and set the two columns to zero.

df.loc[df["id"].isin(id_filt.values), ["TIME_IN_SEC_SHIFT", "TIME_DIFF"]] = 0

                     date  id   TIME_IN_SEC  TIME_IN_SEC_SHIFT  TIME_DIFF
0 2019-01-01 00:02:48.714   1  1.546301e+09       0.000000e+00        0.0
1 2019-01-01 00:02:58.714   2  1.546301e+09       1.546301e+09       10.0
2 2019-01-02 00:02:48.714   3  1.546387e+09       0.000000e+00        0.0
3 2019-01-02 00:04:48.714   4  1.546387e+09       1.546387e+09      120.0
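The approach above relies on 'id' being unique. If that does not hold, one possible variation is to build a boolean mask that marks the first row of each calendar day directly. This is a sketch, not part of the original answer: the name first_of_day is made up for illustration, the 'id' values are deliberately repeated, and TIME_DIFF is computed with diff() on the datetime column rather than via the TIME_IN_SEC columns.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-01-01 00:02:48.714000",
        "2019-01-01 00:02:58.714000",
        "2019-01-02 00:02:48.714000",
        "2019-01-02 00:04:48.714000",
    ]),
    "id": [7, 7, 7, 7],  # hypothetical non-unique ids
})

# Difference to the previous row, in seconds (NaN for the very first row)
df["TIME_DIFF"] = df["date"].diff().dt.total_seconds()

# normalize() truncates each timestamp to midnight; duplicated() is False
# only for the first occurrence of each day, so negating it marks that row
first_of_day = ~df["date"].dt.normalize().duplicated()

df.loc[first_of_day, "TIME_DIFF"] = 0
print(df["TIME_DIFF"].tolist())  # [0.0, 10.0, 0.0, 120.0]
```

The mask depends only on row order within the frame, so the frame should be sorted by 'date' first if the rows may arrive out of order.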

Of course, you could combine these into a single statement:

df.loc[
    df["id"].isin(df.groupby(df.date.dt.date).first()["id"].values),
    ["TIME_IN_SEC_SHIFT", "TIME_DIFF"],
] = 0
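An alternative worth considering, which is not what the answer above does, is to compute the difference within each day from the start so that no value ever crosses a day boundary. The sketch below assumes only the question's sample data; the grouping key name day is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-01-01 00:02:48.714000",
        "2019-01-01 00:02:58.714000",
        "2019-01-02 00:02:48.714000",
        "2019-01-02 00:04:48.714000",
    ]),
    "id": [1, 2, 3, 4],
})

# Sort so that diff() compares each row with its predecessor in time
df = df.sort_values("date").reset_index(drop=True)

# Midnight of each row's day, renamed so it does not clash with the 'date' column
day = df["date"].dt.normalize().rename("day")

# diff() within each day yields NaT for the first row of the day;
# convert to seconds and replace the resulting NaN with 0
df["TIME_DIFF"] = (
    df.groupby(day)["date"].diff().dt.total_seconds().fillna(0)
)
print(df["TIME_DIFF"].tolist())  # [0.0, 10.0, 0.0, 120.0]
```

This avoids the intermediate TIME_IN_SEC and TIME_IN_SEC_SHIFT columns, at the cost of not producing them if they are needed elsewhere.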

Answer 1
1 accepted 2020-01-30 01:39:51
