
Pandas change time values based on condition

I have a dataframe:

data = {'time':['08:45:00', '09:30:00', '18:00:00', '15:00:00']}
df = pd.DataFrame(data)

I would like to convert the time based on conditions: if the hour is less than 9, I want to set it to 9 and if the hour is more than 17, I need to set it to 17.

I tried this approach:

df['time'] = np.where(((df['time'].dt.hour < 9) & (df['time'].dt.hour != 0)), dt.time(9, 00))

I am getting an error: Can only use .dt accessor with datetimelike values.

Can anyone please help me with this? Thanks.

Here's a way to do what your question asks:

df.time = pd.to_datetime(df.time)
df.loc[df.time.dt.hour < 9, 'time'] = (df.time.astype('int64') + (9 - df.time.dt.hour)*3600*1000000000).astype('datetime64[ns]')
df.loc[df.time.dt.hour > 17, 'time'] = (df.time.astype('int64') + (17 - df.time.dt.hour)*3600*1000000000).astype('datetime64[ns]')

Input:

                 time
0 2022-06-06 08:45:00
1 2022-06-06 09:30:00
2 2022-06-06 18:00:00
3 2022-06-06 15:00:00

Output:

                 time
0 2022-06-06 09:45:00
1 2022-06-06 09:30:00
2 2022-06-06 17:00:00
3 2022-06-06 15:00:00
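The error in the question comes from calling .dt on a column that still holds strings; the pd.to_datetime() call on the first line of the code above is what makes the accessor available. A minimal check, assuming the same sample data as the question (the variable names are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'time': ['08:45:00', '09:30:00', '18:00:00', '15:00:00']})

# The raw column holds strings, not datetimes, so .dt is unavailable on it.
is_datetime_before = pd.api.types.is_datetime64_any_dtype(df['time'])
try:
    df['time'].dt.hour
    accessor_failed = False
except AttributeError:
    accessor_failed = True

# After parsing, the same accessor works.
parsed = pd.to_datetime(df['time'], format='%H:%M:%S')
hours = parsed.dt.hour.tolist()

print(is_datetime_before, accessor_failed, hours)
```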

UPDATE:

Here's alternative code to try to address OP's error as described in the comments:

import pandas as pd
import datetime

data = {'time':['08:45:00', '09:30:00', '18:00:00', '15:00:00']}
df = pd.DataFrame(data)
print('', 'df loaded as strings:', df, sep='\n')

df.time = pd.to_datetime(df.time, format='%H:%M:%S')
print('', 'df converted to datetime by pd.to_datetime():', df, sep='\n')
df.loc[df.time.dt.hour < 9, 'time'] = (df.time.astype('int64') + (9 - df.time.dt.hour)*3600*1000000000).astype('datetime64[ns]')
df.loc[df.time.dt.hour > 17, 'time'] = (df.time.astype('int64') + (17 - df.time.dt.hour)*3600*1000000000).astype('datetime64[ns]')
df.time = [time.time() for time in pd.to_datetime(df.time)]
print('', 'df with time column adjusted to have hour between 9 and 17, converted to type "time":', df, sep='\n')

Output:


df loaded as strings:
       time
0  08:45:00
1  09:30:00
2  18:00:00
3  15:00:00

df converted to datetime by pd.to_datetime():
                 time
0 1900-01-01 08:45:00
1 1900-01-01 09:30:00
2 1900-01-01 18:00:00
3 1900-01-01 15:00:00

df with time column adjusted to have hour between 9 and 17, converted to type "time":
       time
0  09:45:00
1  09:30:00
2  17:00:00
3  15:00:00
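The nanosecond arithmetic above (3600*1000000000) can probably be expressed more readably with pd.to_timedelta(). The following is only a sketch of the same hour-shifting idea, assuming the sample data from the question; `shift` is an illustrative name, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'time': ['08:45:00', '09:30:00', '18:00:00', '15:00:00']})
parsed = pd.to_datetime(df['time'], format='%H:%M:%S')

hours = parsed.dt.hour
# Whole hours needed to bring each hour into the 9..17 range (0 if already inside).
shift = pd.to_timedelta(hours.clip(lower=9, upper=17) - hours, unit='h')

# Minutes and seconds are preserved; only the hour moves.
df['time'] = (parsed + shift).dt.time
print(df)
```

As with the code above, this changes only the hour, so 08:45:00 becomes 09:45:00 rather than 09:00:00.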

UPDATE #2 :

To not just change the hour for out-of-window times, but to simply apply 9:00 and 17:00 as min and max times, respectively (see OP's comment on this), you can do this:

df.loc[df['time'].dt.hour < 9, 'time'] = pd.to_datetime(pd.DataFrame({
    'year':df['time'].dt.year, 'month':df['time'].dt.month, 'day':df['time'].dt.day, 
    'hour':[9]*len(df.index)}))
df.loc[df['time'].dt.hour > 17, 'time'] = pd.to_datetime(pd.DataFrame({
    'year':df['time'].dt.year, 'month':df['time'].dt.month, 'day':df['time'].dt.day, 
    'hour':[17]*len(df.index)}))
df['time'] = [time.time() for time in pd.to_datetime(df['time'])]
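If the goal is simply to treat 9:00 and 17:00 as minimum and maximum times, Series.clip() may be a shorter route. This sketch assumes the column is parsed with format='%H:%M:%S', so every value lands on the default date 1900-01-01; the `lower` and `upper` bounds are hypothetical names introduced here:

```python
import pandas as pd

df = pd.DataFrame({'time': ['08:45:00', '09:30:00', '18:00:00', '15:00:00']})

# Parsing with an explicit format puts every value on 1900-01-01.
parsed = pd.to_datetime(df['time'], format='%H:%M:%S')

# Bounds on that same placeholder date.
lower = pd.Timestamp('1900-01-01 09:00:00')
upper = pd.Timestamp('1900-01-01 17:00:00')

# Clamp to the window, then format back to HH:MM:SS strings.
df['time'] = parsed.clip(lower=lower, upper=upper).dt.strftime('%H:%M:%S')
print(df)
```

Note that this also maps a time such as 17:30:00 to 17:00:00, which the hour > 17 test above would leave unchanged.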

Since your 'time' column contains strings, they can be kept as strings, with new string values assigned where appropriate. To filter for your criteria it is convenient to create a datetime Series from the 'time' column, create boolean Series by comparing the datetime Series with your criteria, and use the boolean Series to select the rows that need to be changed.

Your data:

import numpy as np
import pandas as pd

data = {'time':['08:45:00', '09:30:00', '18:00:00', '15:00:00']}
df = pd.DataFrame(data)
print(df.to_string())

>>>
       time
0  08:45:00
1  09:30:00
2  18:00:00
3  15:00:00

Convert to datetime and make boolean Series with your criteria:

dts = pd.to_datetime(df['time'])
lt_nine = dts.dt.hour < 9
gt_seventeen = (dts.dt.hour >= 17)
print(lt_nine)
print(gt_seventeen)

>>>
0     True
1    False
2    False
3    False
Name: time, dtype: bool
0    False
1    False
2     True
3    False
Name: time, dtype: bool

Use the boolean series to assign a new value:

df.loc[lt_nine,'time'] = '09:00:00'
df.loc[gt_seventeen,'time'] = '17:00:00'
print(df.to_string())

>>>
       time
0  09:00:00
1  09:30:00
2  17:00:00
3  15:00:00

Or just stick with strings altogether and create the boolean Series using regex patterns and .str.match.

data = {'time':['08:45:00', '09:30:00', '18:00:00', '15:00:00','07:22:00','22:02:06']}
dg = pd.DataFrame(data)
print(dg.to_string())

>>>
       time
0  08:45:00
1  09:30:00
2  18:00:00
3  15:00:00
4  07:22:00
5  22:02:06


# regex patterns: hours 00-08 and 17-23, each explicitly anchored at the start
pattern_lt_nine = r'^(?:0[0-8])'
pattern_gt_seventeen = r'^(?:1[7-9]|2[0-3])'

Make boolean Series and assign new values

gt_seventeen = dg['time'].str.match(pattern_gt_seventeen)
lt_nine = dg['time'].str.match(pattern_lt_nine)
dg.loc[lt_nine,'time'] = '09:00:00'
dg.loc[gt_seventeen,'time'] = '17:00:00'
print(dg.to_string())

>>>
       time
0  09:00:00
1  09:30:00
2  17:00:00
3  15:00:00
4  09:00:00
5  17:00:00
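A variant that avoids enumerating hours in a pattern is to slice the hour out of each string and compare it as an integer. This is a sketch that assumes every value is a zero-padded HH:MM:SS string, as in the sample data:

```python
import pandas as pd

data = {'time': ['08:45:00', '09:30:00', '18:00:00', '15:00:00', '07:22:00', '22:02:06']}
dg = pd.DataFrame(data)

# First two characters are the hour when the strings are zero-padded.
hours = dg['time'].str.slice(0, 2).astype(int)

# Same criteria as the regex patterns above: hours 00-08 and hours 17-23.
dg.loc[hours < 9, 'time'] = '09:00:00'
dg.loc[hours >= 17, 'time'] = '17:00:00'
print(dg.to_string())
```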

References (pandas user guide):
Time series / date functionality
Working with text data
