Creating a new column with the sum of the past 24 hours

Question

For the following dataframe, df_data, is there a way to add a new column that holds the number of vehicles over the past 24 hours, or just over the previous day?

import pandas as pd

df_data = pd.DataFrame({'day_of_year' : [1]*24 + [2]*24, 'nr_of_vehicles' : [254,154,896,268,254,501,840,868,654,684,684,681,632,468,987,134,336,119,874,658,121,254,154,896,268,254,501,840,868,654,684,684,681,632,468,987,134,336,119,874,658,121,268,254,501,840,868,654], 'hour' : [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]})

Visual representation (nr_of_vehicles is counted per hour):

I thought of grouping the data by day_of_year using the following:

df_data_day = df_data.groupby('day_of_year').agg({'nr_of_vehicles': 'sum'})

but I don't know how to assign the result back to a column correctly, because there are more rows in the original dataframe than in the grouped result.

Answer 1

You were not far off: you just have to use transform instead of agg. transform returns one value per original row, so the result lines up with the dataframe:

df_data_day = df_data.groupby('day_of_year')['nr_of_vehicles'].transform('mean')

You can even directly add a new column:

df_data['nr_by_day'] = df_data.groupby('day_of_year')['nr_of_vehicles'].transform('mean')

BTW: the code above computes the daily average, while your title asks for the sum. If the sum is what you want, pass 'sum' to transform instead of 'mean'.
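For completeness, here is a minimal sketch of the three variants the question mentions: the total of the current day, the total of the previous day, and a trailing 24-hour sum. It is only one possible approach and rests on some assumptions: every day has exactly 24 hourly rows, the rows are sorted chronologically, and the days are consecutive. The column names sum_by_day, sum_prev_day and sum_last_24h are made up for illustration.

```python
import pandas as pd

vehicles = [254, 154, 896, 268, 254, 501, 840, 868, 654, 684, 684, 681,
            632, 468, 987, 134, 336, 119, 874, 658, 121, 254, 154, 896,
            268, 254, 501, 840, 868, 654, 684, 684, 681, 632, 468, 987,
            134, 336, 119, 874, 658, 121, 268, 254, 501, 840, 868, 654]

df_data = pd.DataFrame({
    'day_of_year': [1] * 24 + [2] * 24,  # assumption: 24 hourly rows per day
    'hour': list(range(24)) * 2,
    'nr_of_vehicles': vehicles,
})

# Total of the current day, repeated on every hourly row of that day.
df_data['sum_by_day'] = (
    df_data.groupby('day_of_year')['nr_of_vehicles'].transform('sum')
)

# Total of the previous day: aggregate per day, shift the totals down by
# one day, then map them back onto the rows. The first day has no
# predecessor, so its rows are NaN. Assumes consecutive days.
daily_total = df_data.groupby('day_of_year')['nr_of_vehicles'].sum()
df_data['sum_prev_day'] = df_data['day_of_year'].map(daily_total.shift(1))

# Trailing 24-hour sum, including the current hour. Assumes rows are in
# chronological order with no missing hours; the first 23 rows are NaN
# because the window is not yet full.
df_data['sum_last_24h'] = df_data['nr_of_vehicles'].rolling(window=24).sum()

print(df_data.tail())
```

If hours can be missing, a time-based window on a DatetimeIndex would probably be more robust than a fixed window of 24 rows.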
