How do I find the hour with most rides taken?

Question

I have the dataset below, which records the start time of each car booking. I'd like to create a function that bins all bookings into their respective hours and finds the hour (in AM/PM format) with the most bookings.

The pandas dataframe looks like this:

BookingID	RideStart
01	2022-01-01 00:07:52.943
02	2022-01-01 00:09:31.745
03	2022-01-01 00:14:37.187
04	2022-01-02 00:18:09.127

Desired output: print(f"{x} am/pm is the hour with the highest bookings made")

I tried the pd.Grouper method, but it doesn't work and raises the error "Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'".

Would really appreciate your help to solve this, thank you!

Answer 1

You can use pd.DatetimeIndex for this, then apply value_counts followed by idxmax to its hour attribute:

import pandas as pd

# just adding a couple of different hours
data = {'BookingID': {0: 1, 1: 2, 2: 3, 3: 4},
 'RideStart': {0: '2022-01-01 00:07:52.943',
  1: '2022-01-01 18:09:31.745',
  2: '2022-01-01 18:14:37.187',
  3: '2022-01-02 19:18:09.127'}}

df = pd.DataFrame(data)
print(df)

   BookingID                RideStart
0          1  2022-01-01 00:07:52.943
1          2  2022-01-01 18:09:31.745
2          3  2022-01-01 18:14:37.187
3          4  2022-01-02 19:18:09.127

max_hour = pd.DatetimeIndex(df['RideStart']).hour.value_counts().idxmax()
# hours 0 and 12 both display as 12 on a 12-hour clock; noon onwards is pm
print(f'{max_hour % 12 or 12} {"pm" if max_hour >= 12 else "am"} is the hour with the highest bookings made')

6 pm is the hour with the highest bookings made
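
Since the question asks for a function, the same idea could be wrapped up roughly as follows. This is only a sketch: the names busiest_hour and to_12h are made up for illustration, and it assumes the column holds strings that pd.to_datetime can parse. If two hours tie, idxmax returns just one of them.

```python
import pandas as pd


def to_12h(hour):
    """Format an hour of day (0-23) as a 12-hour label such as '6 pm'."""
    suffix = "am" if hour < 12 else "pm"
    # 0 and 12 both display as 12 on a 12-hour clock
    return f"{hour % 12 or 12} {suffix}"


def busiest_hour(df, column="RideStart"):
    """Return the hour of day (0-23) that occurs most often in `column`."""
    hours = pd.to_datetime(df[column]).dt.hour
    return int(hours.value_counts().idxmax())


df = pd.DataFrame({
    "BookingID": [1, 2, 3, 4],
    "RideStart": [
        "2022-01-01 00:07:52.943",
        "2022-01-01 18:09:31.745",
        "2022-01-01 18:14:37.187",
        "2022-01-02 19:18:09.127",
    ],
})

hour = busiest_hour(df)
print(f"{to_12h(hour)} is the hour with the highest bookings made")
```

Keeping the formatting in its own helper makes the edge cases (midnight and noon) easy to check separately from the counting.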

Answer 2

You don't need pd.Grouper; pandas already has tools for resampling values based on datetimes. The problem is that the dataframe doesn't currently hold datetime values, just strings. You can convert them with pd.to_datetime(), as described in this tutorial, and then downsample your data to the hour.

>>> import pandas as pd
>>> a = ['2022-01-01 00:07:52.943',
...      '2022-01-01 00:09:31.745',
...      '2022-01-01 01:12:37.187',
...      '2022-01-01 02:45:42.834',
...      '2022-01-01 02:56:58.152']

>>> df = pd.DataFrame(data=a)
>>> print(df.head())
                         0
0  2022-01-01 00:07:52.943
1  2022-01-01 00:09:31.745
2  2022-01-01 01:12:37.187
3  2022-01-01 02:45:42.834
4  2022-01-01 02:56:58.152

>>> df.index = pd.to_datetime(df[0])
>>> df.resample('H').count()[0]  # [0] selects the single count column; on pandas 2.2+ use 'h', since 'H' is deprecated
0   
2022-01-01 00:00:00 2
2022-01-01 01:00:00 1
2022-01-01 02:00:00 2
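
Note that resampling counts bookings per calendar hour, so the same clock hour on two different days lands in two different bins. If the goal is the busiest hour of the day across all dates, one option is to group by the hour attribute of the DatetimeIndex instead. A minimal sketch, using made-up timestamps that span two days (not the question's data):

```python
import pandas as pd

# Illustrative timestamps spanning two days (assumed data, not from the question)
stamps = [
    "2022-01-01 00:07:52.943",
    "2022-01-01 01:12:37.187",
    "2022-01-01 02:45:42.834",
    "2022-01-02 02:10:05.000",
    "2022-01-02 02:56:58.152",
]

# Build the frame directly on a DatetimeIndex
df = pd.DataFrame({"BookingID": range(1, 6)}, index=pd.to_datetime(stamps))

# Pool bookings by hour of day (0-23), regardless of date
per_hour_of_day = df.groupby(df.index.hour).size()
busiest = int(per_hour_of_day.idxmax())

print(per_hour_of_day)
print(busiest)
```

Here the 02:00 hour collects bookings from both days, which a plain hourly resample would have kept apart.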

Answer 3

If you can make assumptions about the string length of the dates, you can do something like the following: slice the hour out of each date string into a new column, then take the mode.

Note that I used a for loop at the end in case several hours are the mode.

import pandas as pd

data = [
    ['01','2022-01-01 00:07:52.943'],
    ['02','2022-01-01 00:09:31.745'],
    ['03','2022-01-01 00:14:37.187'],
    ['04','2022-01-02 00:18:09.127'],
    ['05','2022-01-02 00:18:09.130']
]

df = pd.DataFrame(data, columns=['BookingID','RideStart'])

print(df)
print('---\n---')
# BEGIN SOLUTION

# characters 11-12 of 'YYYY-MM-DD HH:MM:SS.fff' hold the hour (14-15 would be the minutes)
df['RideStartHr'] = df['RideStart'].str[11:13]

print(df)

modeList = df['RideStartHr'].mode().values

print('Mode(s):', modeList)

if len(modeList) > 1:
    print('There are {} most frequent hours. Listing all of them.'.format(len(modeList)))

for mode in modeList:
    hour24 = int(mode)
    hr = hour24 % 12 or 12  # 0 and 12 both display as 12
    ampm = 'am' if hour24 < 12 else 'pm'  # noon onwards is pm
    print('{} {} is a most frequent hour.'.format(hr, ampm))
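
If assuming a fixed string layout feels fragile, the same tie-aware approach can be sketched on parsed datetimes instead. The data below is illustrative and chosen so that two hours tie, and it also exercises the midnight and noon edge cases:

```python
import pandas as pd

# Illustrative data (assumed): hours 0 and 12 each occur twice
df = pd.DataFrame({
    "RideStart": [
        "2022-01-01 00:07:52.943",
        "2022-01-01 00:09:31.745",
        "2022-01-01 12:14:37.187",
        "2022-01-02 12:18:09.127",
        "2022-01-02 15:18:09.130",
    ],
})

# Parse the strings, then take the hour of day as an integer 0-23
hours = pd.to_datetime(df["RideStart"]).dt.hour

# mode() returns every value tied for most frequent, in sorted order
modes = hours.mode().tolist()

# Convert each 24-hour value to a 12-hour label
labels = [f"{h % 12 or 12} {'am' if h < 12 else 'pm'}" for h in modes]

for label in labels:
    print(f"{label} is a most frequent hour.")
```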
