How do I find the hour with most rides taken?

I have the dataset below, which records the start times at which commuters book a car. I'd like to

  1. create a function to discretise all bookings into their respective hours,
  2. and find the hour (in AM/PM format) with the most bookings

The pandas dataframe looks like this:

BookingID RideStart
01 2022-01-01 00:07:52.943
02 2022-01-01 00:09:31.745
03 2022-01-01 00:14:37.187
04 2022-01-02 00:18:09.127

Desired output: print(f"{x} am/pm is the hour with the highest bookings made")

I tried the pd.Grouper method but it doesn't work; it fails with the error "Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'".

Would really appreciate your help to solve this, thank you!

You can use pd.DatetimeIndex for this, then apply value_counts to the extracted hours, followed by idxmax:

import pandas as pd

# just adding a couple of different hours
data = {'BookingID': {0: 1, 1: 2, 2: 3, 3: 4},
 'RideStart': {0: '2022-01-01 00:07:52.943',
  1: '2022-01-01 18:09:31.745',
  2: '2022-01-01 18:14:37.187',
  3: '2022-01-02 19:18:09.127'}}

df = pd.DataFrame(data)
print(df)

   BookingID                RideStart
0          1  2022-01-01 00:07:52.943
1          2  2022-01-01 18:09:31.745
2          3  2022-01-01 18:14:37.187
3          4  2022-01-02 19:18:09.127

max_hour = pd.DatetimeIndex(df['RideStart']).hour.value_counts().idxmax()
# hour 0 and hour 12 both display as 12; hours from 12 onwards are pm
print(f'{max_hour % 12 or 12} {"pm" if max_hour >= 12 else "am"} is the hour with the highest bookings made')

6 pm is the hour with the highest bookings made
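
The question asked for a function, so here is a minimal sketch of the same idea wrapped up as one. The names busiest_hour and to_12_hour are made up for illustration, and it assumes the RideStart strings are in a format pd.to_datetime can parse:

```python
import pandas as pd


def to_12_hour(hour):
    """Format an hour in 0..23 as a 12-hour label such as '12 am' or '6 pm'."""
    suffix = "am" if hour < 12 else "pm"
    # hour % 12 is 0 for both midnight and noon, which should display as 12
    return f"{hour % 12 or 12} {suffix}"


def busiest_hour(df, column="RideStart"):
    """Return the hour of day (0..23) that has the most rows in `column`."""
    hours = pd.to_datetime(df[column]).dt.hour
    return int(hours.value_counts().idxmax())


df = pd.DataFrame({
    "BookingID": [1, 2, 3, 4],
    "RideStart": [
        "2022-01-01 00:07:52.943",
        "2022-01-01 18:09:31.745",
        "2022-01-01 18:14:37.187",
        "2022-01-02 19:18:09.127",
    ],
})

hour = busiest_hour(df)
message = f"{to_12_hour(hour)} is the hour with the highest bookings made"
print(message)  # 6 pm is the hour with the highest bookings made
```

If several hours tie for the top count, idxmax returns only one of them.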

You don't need pd.Grouper; pandas already has tools for resampling values based on datetimes. The problem is that the dataframe doesn't currently hold datetime values, just strings. You can convert them with pd.to_datetime() and then downsample your data to the hour. Note that this counts bookings per calendar hour (date plus hour), so the same hour on different days is counted separately.

>>> import pandas as pd
>>> a = ['2022-01-01 00:07:52.943',
...      '2022-01-01 00:09:31.745',
...      '2022-01-01 01:12:37.187',
...      '2022-01-01 02:45:42.834',
...      '2022-01-01 02:56:58.152']

>>> df = pd.DataFrame(data=a)
>>> print(df.head())
                         0
0  2022-01-01 00:07:52.943
1  2022-01-01 00:09:31.745
2  2022-01-01 01:12:37.187
3  2022-01-01 02:45:42.834
4  2022-01-01 02:56:58.152

>>> df.index = pd.to_datetime(df[0])
>>> df.resample('H').count()[0] # [0] is to get rid of extra, all-containing column
0   
2022-01-01 00:00:00 2
2022-01-01 01:00:00 1
2022-01-01 02:00:00 2
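
If the goal is the busiest hour of the day across all dates, rather than the busiest individual calendar hour, one option is to group on the hour component instead of resampling. This is only a sketch with made-up sample timestamps spread over three days:

```python
import pandas as pd

# Sample timestamps on different days (illustrative data, not from the question)
stamps = pd.to_datetime(pd.Series([
    "2022-01-01 00:07:52.943",
    "2022-01-01 02:45:42.834",
    "2022-01-02 02:56:58.152",
    "2022-01-03 02:10:00.000",
]))

# Group by hour of day (0..23) so the same hour on different days is pooled
per_hour_of_day = stamps.groupby(stamps.dt.hour).size()
busiest = int(per_hour_of_day.idxmax())

print(per_hour_of_day)
print(busiest)  # 2
```

Here hour 2 collects three bookings from three different days, which resampling would have reported as three separate one-booking rows.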

If you can make assumptions about the string length of the dates, you can do something like the following: slice the hour out of the date string into a new column, then take the mode.

Note that I used a for loop at the end in case several hours are the mode.

import pandas as pd

data = [
    ['01','2022-01-01 00:07:52.943'],
    ['02','2022-01-01 00:09:31.745'],
    ['03','2022-01-01 00:14:37.187'],
    ['04','2022-01-02 00:18:09.127'],
    ['05','2022-01-02 00:18:09.130']
]

df = pd.DataFrame(data, columns=['BookingID','RideStart'])

print(df)
print('---\n---')
# BEGIN SOLUTION

# In 'YYYY-MM-DD HH:MM:SS.fff' the hour occupies positions 11 and 12
df['RideStartHr'] = df['RideStart'].str[11:13]

print(df)

modeList = df['RideStartHr'].mode().values

print('Mode(s):', modeList)

if len(modeList) > 1:
    print('There are {} most frequent hours. Listing all of them.'.format(len(modeList)))

for mode in modeList:
    hour24 = int(mode)
    hr = hour24 % 12 or 12  # 0 and 12 both display as 12
    ampm = 'am' if hour24 < 12 else 'pm'  # compare as numbers, not strings
    print('{} {} is a most frequent hour.'.format(hr, ampm))
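
If you would rather not depend on character positions, a variant of the same mode-based idea is to parse the column first and take the mode of the hour component. This is a sketch with made-up sample data chosen so that two hours tie:

```python
import pandas as pd

# Illustrative data: 09:00 and 21:00 each occur twice
rides = pd.Series([
    "2022-01-01 09:07:52.943",
    "2022-01-01 09:09:31.745",
    "2022-01-01 21:14:37.187",
    "2022-01-02 21:18:09.127",
    "2022-01-02 12:00:00.000",
])

hours = pd.to_datetime(rides).dt.hour

# mode() returns every value tied for the highest count, in ascending order
modes = hours.mode().tolist()
labels = [f"{h % 12 or 12} {'am' if h < 12 else 'pm'}" for h in modes]

for label in labels:
    print(f"{label} is a most frequent hour.")
```

With this data modes is [9, 21], so the loop prints one line for 9 am and one for 9 pm.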
