
Round up half of the hour in pandas

The round() function in pandas rounds the time 07:30 down to 07:00, but I want any time at or past the 30-minute mark (inclusive) to be rounded up.

E.g.:

07:15 to 07:00
05:25 to 05:00
22:30 to 23:00
18:45 to 19:00

How can I achieve this for a column of a DataFrame using pandas?

timestamps

You need to use dt.round. This is a bit tricky, however, because whether a time at exactly xx:30 goes to the previous or the next hour depends on the hour itself: pandas rounds exact halves to the even hour (round half to even). You can force the behavior by adding or subtracting a small amount of time (here 1ns):

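import pandas as pd

# ISO strings keep the dates unambiguous (Jan 2, Apr 25, Apr 25, Dec 4);
# the original mix of m/d and d/m strings does not parse reliably
s = pd.to_datetime(pd.Series(['2021-01-02 03:45', '2021-04-25 12:30',
                              '2021-04-25 13:30', '2022-12-04 23:45']))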

# xx:30 -> rounding depending on the hour parity (default)
s.dt.round(freq='1h')

0   2021-01-02 04:00:00
1   2021-04-25 12:00:00    <- -30min
2   2021-04-25 14:00:00    <- +30min
3   2022-12-05 00:00:00
dtype: datetime64[ns]
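To see the parity rule on its own, the sketch below (an illustration, not part of the original answer) rounds every hh:30 of the first hours of one day. Under round half to even, each exact half should land on an even hour.

```python
import pandas as pd

# Six consecutive times, each exactly at hh:30 (00:30, 01:30, ..., 05:30)
s = pd.Series(pd.date_range("2021-04-25 00:30", periods=6, freq="h"))

rounded = s.dt.round("h")

# Exact halves go to the even hour: 00:30 -> 00:00, 01:30 -> 02:00, ...
print(pd.DataFrame({"time": s, "rounded": rounded}))
```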


# xx:30 -> xx:00 (force down)
s.sub(pd.Timedelta('1ns')).dt.round(freq='1h')

0   2021-01-02 04:00:00
1   2021-04-25 12:00:00
2   2021-04-25 13:00:00
3   2022-12-05 00:00:00
dtype: datetime64[ns]


# xx:30 -> next hour (force up)
s.add(pd.Timedelta('1ns')).dt.round(freq='1h')

0   2021-01-02 04:00:00
1   2021-04-25 13:00:00
2   2021-04-25 14:00:00
3   2022-12-05 00:00:00
dtype: datetime64[ns]
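If the 1ns nudge feels too implicit, a different technique is to floor to the hour and add one hour wherever at least 30 minutes have passed. This is a sketch assuming a datetime64 Series; round_half_up_hour is a hypothetical name, not a pandas function.

```python
import pandas as pd


def round_half_up_hour(s: pd.Series) -> pd.Series:
    """Round a datetime Series to the hour, sending hh:30 and later up.

    Hypothetical helper: floor to the hour, then add one hour wherever
    at least 30 minutes have elapsed since that floor.
    """
    floored = s.dt.floor("h")
    past_half = (s - floored) >= pd.Timedelta(minutes=30)
    # True/False -> 1/0 hours to add back
    return floored + pd.to_timedelta(past_half.astype(int), unit="h")


s = pd.Series(pd.to_datetime(["2021-04-25 12:30", "2021-04-25 13:30",
                              "2021-04-25 07:15", "2022-12-04 23:45"]))
result = round_half_up_hour(s)
print(result)
```

Because it compares the full remainder rather than nudging by 1ns, a time such as 12:29:59.999999999 stays at 12:00.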

floats

IIUC, you can use divmod (or numpy.modf) to get the integer and decimal parts, then perform simple boolean arithmetic:

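import pandas as pd

s = pd.Series([7.15, 5.25, 22.30, 18.45])

s2, r = s.divmod(1)  # or: r, s2 = np.modf(s)  (np.modf returns the fractional part first)

s2[r.ge(0.3)] += 1  # add one hour where the minutes are >= 30

s2 = s2.astype(int)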

Alternative, using mod and the fact that booleans add as 0/1 integers:

s2 = s.astype(int) + s.mod(1).ge(0.3)

output:

0     7
1     5
2    23
3    19
dtype: int64
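For completeness, here is a sketch of the np.modf route. Note that np.modf returns the fractional part first, the reverse of divmod's (quotient, remainder) order. This version works on the underlying NumPy array and rounds the fraction to 2 digits before comparing, to absorb float noise.

```python
import numpy as np
import pandas as pd

s = pd.Series([7.15, 5.25, 22.30, 18.45])

# np.modf returns (fractional part, integral part)
frac, whole = np.modf(s.to_numpy())

# Round the fraction to 2 digits, then add 1 where the minutes are >= 30
hours = pd.Series(whole.astype(int) + (np.round(frac, 2) >= 0.3), index=s.index)
print(hours)
```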

Note on precision: comparing floats is not always reliable because of floating-point arithmetic. For instance, 22.30 % 1 evaluates to 0.3000000000000007, so gt(0.3) would wrongly treat 22.30 as past the half hour, while 7.30 % 1 evaluates to 0.2999999999999998, so ge(0.3) would miss 07:30. To be safe, round to 2 digits first:

s.mod(1).round(2).ge(0.3)

or use integers:

s.mod(1).mul(100).round().astype(int).ge(30)  # round before astype(int), which truncates 29.999... to 29
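The integer idea can be wrapped in a small helper. This is a sketch assuming the floats encode clock times as hh.mm (so 7.30 means 07:30, not 7.3 hours); round_hhmm is a hypothetical name. Scaling the whole value to an integer hhmm and rounding before the cast sidesteps both the float comparison and the truncation done by astype(int).

```python
import pandas as pd


def round_hhmm(s: pd.Series) -> pd.Series:
    """Hypothetical helper: round hh.mm floats (7.30 means 07:30) to the hour.

    Minutes of 30 or more round up; everything else rounds down.
    """
    # Scale to an integer hhmm; round() first because astype(int) truncates
    hhmm = s.mul(100).round().astype(int)
    hours, minutes = hhmm // 100, hhmm % 100
    return hours + (minutes >= 30).astype(int)


print(round_hhmm(pd.Series([7.15, 5.25, 22.30, 18.45, 7.30])).tolist())
```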

Here is a version that works with timestamps:

import numpy as np
import pandas as pd

# dummy data: 10 random timestamps
df = pd.DataFrame({'time': pd.to_datetime([np.random.randint(0, 10**8) for _ in range(10)], unit='s')})


def custom_round(row, col, out):
    # row-wise: ceil to the next hour from minute 30 on, otherwise floor
    if row[col].minute >= 30:
        row[out] = row[col].ceil('h')
    else:
        row[out] = row[col].floor('h')
    return row


df = df.apply(lambda x: custom_round(x, 'time', 'new_time'), axis=1)

#edit:

using numpy:

def custom_round(df, col, out):
    # vectorized: ceil where the minute is >= 30, floor everywhere else
    df[out] = np.where(
        df[col].dt.minute >= 30,
        df[col].dt.ceil('h'),
        df[col].dt.floor('h'),
    )
    return df


df = custom_round(df, 'time', 'new_time')
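The same logic can also stay entirely in pandas with Series.where, which keeps values where the condition holds and takes them from the other Series elsewhere. This is a sketch; custom_round_where is a hypothetical name, and its result should match the np.where version.

```python
import pandas as pd


def custom_round_where(df: pd.DataFrame, col: str, out: str) -> pd.DataFrame:
    """Hypothetical pandas-only variant of custom_round."""
    t = df[col]
    # keep the ceiling where minute >= 30, otherwise fall back to the floor
    df[out] = t.dt.ceil("h").where(t.dt.minute >= 30, t.dt.floor("h"))
    return df


df = pd.DataFrame({"time": pd.to_datetime(["2021-04-25 07:15", "2021-04-25 07:30",
                                           "2021-04-25 22:30", "2021-04-25 18:45"])})
df = custom_round_where(df, "time", "new_time")
print(df)
```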

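Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.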

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM