Set column value in each first matched row to 0

I'm trying to detect sequences of time where the difference between consecutive timestamps is below some threshold. To do this, I convert a list of timestamps to seconds and compute the difference between each timestamp and the one before it. I've written code to do this, but there is a problem when the difference spans two days: the time difference for the first row of each day should be zero. So the value 86390 in the DataFrame below should be 0. This is just a contrived example. With multiple groups, how can I set the first entry in each group to 0?

Code:

import pandas as pd

df = pd.DataFrame(
    {'date': ['2019-01-01 00:02:48.714000', '2019-01-01 00:02:58.714000', '2019-01-02 00:02:48.714000', '2019-01-02 00:04:48.714000'],
     'id': [1, 2, 3, 4],
    })
df['date'] = pd.to_datetime(df['date'])

# sort_values returns a new DataFrame, so assign the result,
# and sort before the seconds are computed so the rows line up
df = df.sort_values(by=['date']).reset_index(drop=True)

# Convert each timestamp to seconds since the Unix epoch
df['TIME_IN_SEC'] = [d.timestamp() for d in df['date']]
df['TIME_IN_SEC_SHIFT'] = df['TIME_IN_SEC'].shift(1)
df['TIME_DIFF'] = df['TIME_IN_SEC'] - df['TIME_IN_SEC_SHIFT']

# Average time difference per calendar day
list_values = []
for _, group in df.groupby(pd.Grouper(key='date', freq='D')):
    list_values.append(sum(group['TIME_DIFF']) / len(group))

df

renders:

[Screenshot of the resulting DataFrame; the first row of 2019-01-02 shows a TIME_DIFF of 86390.]
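For reference, here is a minimal vectorized sketch of the same computation on the four sample rows (not the asker's code). It shows where the unwanted value comes from: a plain diff() compares the first row of 2019-01-02 with the last row of 2019-01-01, because it knows nothing about day boundaries.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-01-01 00:02:48.714", "2019-01-01 00:02:58.714",
        "2019-01-02 00:02:48.714", "2019-01-02 00:04:48.714",
    ]),
    "id": [1, 2, 3, 4],
}).sort_values("date", ignore_index=True)

# Seconds since the Unix epoch, as in the question's loop
df["TIME_IN_SEC"] = df["date"].map(pd.Timestamp.timestamp)

# diff() is the same as subtracting shift(1); it ignores day boundaries,
# so row 2 gets the gap back to the previous day (about 86390 s)
df["TIME_DIFF"] = df["TIME_IN_SEC"].diff()

print(df[["id", "TIME_DIFF"]])
```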

To set the first row of each day to zero, group by the calendar date extracted from the date column (df.date.dt.date) and aggregate with 'first' to get the first row of each day. From that result, take the 'id' column as a Series. (I'm assuming the id values are unique.)

id_filt = df.groupby(df.date.dt.date).first()['id']
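To see what this step produces, here is a small sketch on the sample data (only the date and id columns are needed). Grouping by the calendar date collapses each day to one row, so id_filt holds the id of the first row of each day. Note that first() returns the first non-null value per column, which matches the first row here because the sample has no missing ids.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-01-01 00:02:48.714", "2019-01-01 00:02:58.714",
        "2019-01-02 00:02:48.714", "2019-01-02 00:04:48.714",
    ]),
    "id": [1, 2, 3, 4],
})

# One row per calendar day, taking the first value of each column
id_filt = df.groupby(df.date.dt.date).first()["id"]
print(id_filt.tolist())  # [1, 3]
```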

Then use loc to select only the rows whose 'id' is in that Series, and set the two columns to zero:

df.loc[df["id"].isin(id_filt.values), ["TIME_IN_SEC_SHIFT", "TIME_DIFF"]] = 0

                     date  id   TIME_IN_SEC  TIME_IN_SEC_SHIFT  TIME_DIFF
0 2019-01-01 00:02:48.714   1  1.546301e+09       0.000000e+00        0.0
1 2019-01-01 00:02:58.714   2  1.546301e+09       1.546301e+09       10.0
2 2019-01-02 00:02:48.714   3  1.546387e+09       0.000000e+00        0.0
3 2019-01-02 00:04:48.714   4  1.546387e+09       1.546387e+09      120.0
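An alternative sketch (not part of the original answer) avoids the id lookup entirely: compute the difference within each day with groupby(...).diff(). The diff restarts in every group, so the first row of each day comes out as NaN, which fillna(0) turns into the desired 0. This does not rely on unique ids; it also skips the TIME_IN_SEC_SHIFT helper column, since diff() does the shifting internally.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-01-01 00:02:48.714", "2019-01-01 00:02:58.714",
        "2019-01-02 00:02:48.714", "2019-01-02 00:04:48.714",
    ]),
    "id": [1, 2, 3, 4],
}).sort_values("date", ignore_index=True)
df["TIME_IN_SEC"] = df["date"].map(pd.Timestamp.timestamp)

# Group by calendar day; diff() is computed separately inside each group,
# leaving NaN in each day's first row, which is then replaced with 0
day = df["date"].dt.date
df["TIME_DIFF"] = df.groupby(day)["TIME_IN_SEC"].diff().fillna(0)

print(df[["id", "TIME_DIFF"]])
```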

Of course, you can combine these steps into a single statement:

df.loc[
    df["id"].isin(df.groupby(df.date.dt.date).first()["id"].values),
    ["TIME_IN_SEC_SHIFT", "TIME_DIFF"],
] = 0
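If the id values might not be unique, the isin() lookup can match too many rows. A variant, again only a sketch, builds the mask from the dates themselves: Series.duplicated() marks every repeat of a calendar day, so its negation is True only for the first row of each day (assuming the frame is sorted by date). The duplicate ids below are hypothetical, chosen to show a case where isin() would zero every row.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-01-01 00:02:48.714", "2019-01-01 00:02:58.714",
        "2019-01-02 00:02:48.714", "2019-01-02 00:04:48.714",
    ]),
    "id": [1, 1, 2, 2],  # hypothetical non-unique ids
}).sort_values("date", ignore_index=True)
df["TIME_IN_SEC"] = df["date"].map(pd.Timestamp.timestamp)
df["TIME_IN_SEC_SHIFT"] = df["TIME_IN_SEC"].shift(1)
df["TIME_DIFF"] = df["TIME_IN_SEC"] - df["TIME_IN_SEC_SHIFT"]

# True only for the first occurrence of each calendar day
first_of_day = ~df["date"].dt.date.duplicated()
df.loc[first_of_day, ["TIME_IN_SEC_SHIFT", "TIME_DIFF"]] = 0

print(df)
```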
