
How to add a column to pandas dataframe based on time from another column

I am trying to add a column to a pandas DataFrame that contains Morning, Afternoon or Evening, based on time slots that I choose.

The code I am trying is as follows:

df_agg['timeOfDay'] = df_agg.apply(lambda _: '', axis=1)
for i in range (len(df_agg)):
        if df_agg['time_stamp'].iloc[i][0].hour < 12:
            df_agg['timeOfDay'].iloc[i] = 'Morning'
        elif df_agg['time_stamp'].iloc[i][0].hour < 17 & df_agg['time_stamp'].iloc[i][0].hour > 12:
            df_agg['timeOfDay'].iloc[i] = 'Afternoon'
        else:
             df_agg['timeOfDay'].iloc[i] = 'Evening'

When I return df_agg, the timeOfDay column is empty. Does anyone know what I am doing wrong when trying to fill in each row based on the time of day?

pandas
Use pd.cut to split the hours into bins and give each bin a label. This method also makes it trivial to create more granular time slots.

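import pandas as pd

# Bins are right-closed by default: (-1, 12], (12, 17], (17, 24].
# assign returns a new DataFrame; bind the result to keep the column.
df_agg.assign(
    timeOfDay=pd.cut(
        df_agg.time_stamp.dt.hour,
        [-1, 12, 17, 24],
        labels=['Morning', 'Afternoon', 'Evening']))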
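
Because the bin edges decide which slot hours 12 and 17 fall into, it may help to compare the default right-closed bins with left-closed ones. The following is a minimal sketch, not part of the original answer; df_demo and its two extra columns are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical sample data: one timestamp per hour across a full day.
df_demo = pd.DataFrame(
    {"time_stamp": pd.date_range("2015-02-24 00:00:00", periods=24, freq="h")}
)
hours = df_demo["time_stamp"].dt.hour

# Default (right=True): bins are (-1, 12], (12, 17], (17, 24],
# so hour 12 is labelled Morning and hour 17 is labelled Afternoon.
right_closed = pd.cut(
    hours, [-1, 12, 17, 24], labels=["Morning", "Afternoon", "Evening"]
)

# right=False: bins are [0, 12), [12, 17), [17, 24),
# so hour 12 is labelled Afternoon and hour 17 is labelled Evening.
left_closed = pd.cut(
    hours, [0, 12, 17, 24], right=False,
    labels=["Morning", "Afternoon", "Evening"]
)

df_demo = df_demo.assign(right_closed=right_closed, left_closed=left_closed)

# Show the rows around the two boundaries.
print(df_demo.iloc[[11, 12, 16, 17]])
```

The left-closed variant is closer to the `hour < 12` test in the question; which one is appropriate depends on where you want the boundary hours to land.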

numpy
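Use searchsorted to find each hour's position relative to the boundary hours 12 and 17, then use that position to index into the array of labels.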

import numpy as np

hours = df_agg.time_stamp.dt.hour.values
times = np.array(['Morning', 'Afternoon', 'Evening'])

# searchsorted returns 0, 1 or 2 for each hour, which selects the label
df_agg.assign(timeOfDay=times[np.array([12, 17]).searchsorted(hours)])
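
To see how searchsorted treats the boundary hours, here is a small sketch comparing its `side` argument. It is an illustration only; sample_hours and the variable names are made up for this example.

```python
import numpy as np

boundaries = np.array([12, 17])
labels = np.array(["Morning", "Afternoon", "Evening"])

# Hypothetical hours chosen to sit on and around the boundaries.
sample_hours = np.array([0, 11, 12, 13, 16, 17, 18, 23])

# side="left" (the default): an hour equal to a boundary gets the lower slot,
# so 12 maps to Morning and 17 maps to Afternoon.
left_idx = boundaries.searchsorted(sample_hours)

# side="right": an hour equal to a boundary gets the upper slot,
# so 12 maps to Afternoon and 17 maps to Evening.
right_idx = boundaries.searchsorted(sample_hours, side="right")

left_labels = labels[left_idx]
right_labels = labels[right_idx]

for hour, left, right in zip(sample_hours, left_labels, right_labels):
    print(hour, left, right)
```

With the default side, the result agrees with the right-closed pd.cut bins used in the pandas version.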

both yield

[image: enter image description here]


time test
small data set

[image: enter image description here]

large data set

start = pd.to_datetime('2015-02-24 10:00:00')
rng = pd.date_range(start, periods=10000, freq='1h')

df_agg = pd.DataFrame({'time_stamp': rng, 'a': range(len(rng))})  

[image: enter image description here]
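
The timings themselves were posted as images, so as a rough way to reproduce the comparison, here is a sketch using the standard timeit module on the large data set. The helper names with_cut and with_searchsorted are hypothetical wrappers around the two approaches, and the numbers will vary with machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

start = pd.to_datetime("2015-02-24 10:00:00")
rng = pd.date_range(start, periods=10000, freq="h")
df_agg = pd.DataFrame({"time_stamp": rng, "a": range(len(rng))})

labels = np.array(["Morning", "Afternoon", "Evening"])


def with_cut(df):
    """pd.cut approach: bin the hour and attach labels."""
    return df.assign(
        timeOfDay=pd.cut(df.time_stamp.dt.hour, [-1, 12, 17, 24],
                         labels=list(labels)))


def with_searchsorted(df):
    """NumPy approach: locate each hour among the boundaries 12 and 17."""
    hours = df.time_stamp.dt.hour.values
    return df.assign(timeOfDay=labels[np.array([12, 17]).searchsorted(hours)])


repeats = 20
for func in (with_cut, with_searchsorted):
    seconds = timeit.timeit(lambda: func(df_agg), number=repeats)
    print(f"{func.__name__}: {seconds / repeats * 1e3:.3f} ms per call")
```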


setup
Borrowed @jezrael's setup for df_agg:

import pandas as pd

start = pd.to_datetime('2015-02-24 10:00:00')
rng = pd.date_range(start, periods=12, freq='1h')

df_agg = pd.DataFrame({'time_stamp': rng, 'a': range(len(rng))})
print(df_agg)

I think you can use a nested numpy.where. Check whether you need to change < to <= or > to >= at the boundary hours:

import pandas as pd

start = pd.to_datetime('2015-02-24 10:00:00')
rng = pd.date_range(start, periods=12, freq='1h')

df_agg = pd.DataFrame({'time_stamp': rng, 'a': range(12)})
print(df_agg)
     a          time_stamp
0    0 2015-02-24 10:00:00
1    1 2015-02-24 11:00:00
2    2 2015-02-24 12:00:00
3    3 2015-02-24 13:00:00
4    4 2015-02-24 14:00:00
5    5 2015-02-24 15:00:00
6    6 2015-02-24 16:00:00
7    7 2015-02-24 17:00:00
8    8 2015-02-24 18:00:00
9    9 2015-02-24 19:00:00
10  10 2015-02-24 20:00:00
11  11 2015-02-24 21:00:00

import numpy as np

hours = df_agg.time_stamp.dt.hour.values
# hour <= 12 is Morning, hour >= 17 is Evening, everything else is Afternoon
df_agg['timeOfDay'] = np.where(hours <= 12, 'Morning',
                      np.where(hours >= 17, 'Evening', 'Afternoon'))

     a          time_stamp  timeOfDay
0    0 2015-02-24 10:00:00    Morning
1    1 2015-02-24 11:00:00    Morning
2    2 2015-02-24 12:00:00    Morning
3    3 2015-02-24 13:00:00  Afternoon
4    4 2015-02-24 14:00:00  Afternoon
5    5 2015-02-24 15:00:00  Afternoon
6    6 2015-02-24 16:00:00  Afternoon
7    7 2015-02-24 17:00:00    Evening
8    8 2015-02-24 18:00:00    Evening
9    9 2015-02-24 19:00:00    Evening
10  10 2015-02-24 20:00:00    Evening
11  11 2015-02-24 21:00:00    Evening
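
If more than three slots are needed, nesting numpy.where calls gets hard to read. One possible alternative, not used in the answer above, is numpy.select, which takes a list of conditions and a matching list of labels. The sketch below assumes the same boundaries as the nested version (hour <= 12 is Morning, hour >= 17 is Evening).

```python
import numpy as np
import pandas as pd

start = pd.to_datetime("2015-02-24 10:00:00")
rng = pd.date_range(start, periods=12, freq="h")
df_agg = pd.DataFrame({"time_stamp": rng, "a": range(len(rng))})

hours = df_agg["time_stamp"].dt.hour.values

# Conditions are evaluated in order and the first match wins, so the
# second condition only applies to hours that are already greater than 12.
conditions = [hours <= 12, hours < 17]
choices = ["Morning", "Afternoon"]

# Anything matching no condition (hour >= 17) falls back to the default.
df_agg["timeOfDay"] = np.select(conditions, choices, default="Evening")
print(df_agg)
```

Adding another slot then only requires one more entry in each list, placed in ascending order of the boundary hour.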


 