
Dropping rows in a certain time interval

So we have a pandas DataFrame with certain values at certain times.

For example:

     @ts                  @value  Glucose  Diff   smooth_diff  new    P      N      C1   C2
135  2021-10-29 11:16:00  167     167.0    -3.0   15.45        15.45  17.95  17.45  NaN  0.0
155  2021-10-29 12:56:00  162     162.0    -15.0  15.35        15.35  17.95  16.00  NaN  0.0
243  2021-10-29 20:16:00  133     133.0    0.0    15.25        15.25  19.85  15.75  NaN  0.0
245  2021-10-29 20:26:00  134     134.0    0.0    15.50        15.50  15.75  15.60  NaN  0.0
113  2021-10-29 09:26:00  130     130.0    1.0    16.75        16.75  0.00   21.70  NaN  NaN

Now we want to drop rows whose @ts values are within 1 hour of each other. In this example, we want to drop the row at 2021-10-29 20:26:00, because it is within 1 hour of the previous row (2021-10-29 20:16:00). However, we can't seem to figure out a way to do this.

Any help?

Something like this might work:

import pandas as pd

# create dataframe (only creating 2 cols for ease)
df = pd.DataFrame({
    '@ts': ['2021-10-29 11:16:00', '2021-10-29 12:56:00', '2021-10-29 20:16:00', 
            '2021-10-29 20:26:00'],
    '@value': [167, 162, 133, 134]
})

# split @ts column into separate columns - date(d) and time(t)
df[["d", "t"]] = df["@ts"].str.split(" ", expand=True)

# split time column into separate parts, hours, mins and secs
df[["h", "m", "s"]] = df["t"].str.split(":", expand=True)
# drop duplicates based on date and hour, keep the first row
df = df.drop_duplicates(subset=["d", "h"], keep="first")
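This approach buckets rows by calendar date and clock hour (20:00 to 20:59, and so on) rather than by a rolling one-hour window, so hypothetical rows at 12:56 and 13:10 would both be kept even though they are only 14 minutes apart. A minimal sketch of the same clock-hour idea without splitting strings, assuming `@ts` parses with `pd.to_datetime`, floors each timestamp to the start of its hour with `Series.dt.floor` and keeps the first row per bucket:

```python
import pandas as pd

# same two-column sample as above
df = pd.DataFrame({
    '@ts': ['2021-10-29 11:16:00', '2021-10-29 12:56:00', '2021-10-29 20:16:00',
            '2021-10-29 20:26:00'],
    '@value': [167, 162, 133, 134]
})

# floor each timestamp to the start of its clock hour (e.g. 20:26 -> 20:00)
hour_bucket = pd.to_datetime(df['@ts']).dt.floor('h')

# keep the first row in each clock hour; this is NOT a rolling 1-hour window
result = df[~hour_bucket.duplicated(keep='first')]
print(result)
```

Working on a datetime column avoids the temporary `d`, `t`, `h`, `m`, `s` string columns, but the behaviour is the same clock-hour grouping.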

Convert the column to datetime. Subtract the previous row's timestamp from each row's timestamp and take the total seconds. Take the absolute value and check whether it is less than 3600 (one hour) to create a boolean mask. Then use the negated mask to keep only the rows that are at least one hour away from the previous row.

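import numpy as np
import pandas as pd
df['@ts'] = pd.to_datetime(df['@ts'])
# seconds since the previous row; the first row's NaN becomes inf so it is kept
gap = (df['@ts'] - df['@ts'].shift()).dt.total_seconds().fillna(np.inf).abs()
df = df[~gap.lt(3600)]  # drop rows within one hour of the row above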
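The mask above compares each row with the row directly above it in the DataFrame's current order, so the result depends on row order (in the question's sample, row 113 at 09:26 comes last), and a row can be dropped because of a neighbour that is itself dropped. If the intent is instead to keep a row only when it is at least one hour after the last row that was kept, one option is a different technique: sort by `@ts` and scan greedily. A minimal sketch, assuming the rows may be reordered by time; `drop_within_hour` is a hypothetical helper name, not a pandas function, and the timestamps are hypothetical values chosen to show the difference:

```python
import pandas as pd


def drop_within_hour(df, col='@ts', window=pd.Timedelta(hours=1)):
    """Keep a row only if it is at least `window` after the last kept row.

    Hypothetical helper: parses `col` as datetimes, sorts by it, then scans
    the rows once, greedily.
    """
    out = df.assign(**{col: pd.to_datetime(df[col])}).sort_values(col)
    keep = []
    last_kept = None
    for ts in out[col]:
        # keep the first row, and any row far enough from the last kept one
        if last_kept is None or ts - last_kept >= window:
            keep.append(True)
            last_kept = ts
        else:
            keep.append(False)
    return out.loc[keep]


# hypothetical timestamps chosen to show where the two approaches differ
df = pd.DataFrame({
    '@ts': ['2021-10-29 10:00:00', '2021-10-29 10:40:00', '2021-10-29 11:20:00',
            '2021-10-29 20:16:00', '2021-10-29 20:26:00'],
})
print(drop_within_hour(df))
```

On these timestamps, the shift-based mask would drop both 10:40 and 11:20 (each is 40 minutes after the row before it), whereas this version keeps 11:20 because it is 80 minutes after the last kept row at 10:00. A plain loop is used because each decision depends on the previous kept row, which is awkward to express as a single vectorized comparison.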

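Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.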

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM