I have a dataset below regarding the start time commuters book a car. I'd like to
The pandas dataframe looks like this:
BookingID | RideStart |
---|---|
01 | 2022-01-01 00:07:52.943 |
02 | 2022-01-01 00:09:31.745 |
03 | 2022-01-01 00:14:37.187 |
04 | 2022-01-02 00:18:09.127 |
Desired output: printf("{x} am/pm is the the hour with the highest bookings made")
I tried the pd.grouper method but it dosent work, with an error "Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex".
Would really appreciate your help to solve this, thank you!
You can use pd.DatetimeIndex
for this. And then apply s.value_counts
, followed by s.idxmax
:
import pandas as pd
# just adding a couple of different hours
data = {'BookingID': {0: 1, 1: 2, 2: 3, 3: 4},
'RideStart': {0: '2022-01-01 00:07:52.943',
1: '2022-01-01 18:09:31.745',
2: '2022-01-01 18:14:37.187',
3: '2022-01-02 19:18:09.127'}}
df = pd.DataFrame(data)
print(df)
BookingID RideStart
0 1 2022-01-01 00:07:52.943
1 2 2022-01-01 18:09:31.745
2 3 2022-01-01 18:14:37.187
3 4 2022-01-02 19:18:09.127
max_hour = pd.DatetimeIndex(df['RideStart']).hour.value_counts().idxmax()
print(f'{max_hour%12} {"pm" if max_hour>12 else "am"} is the hour with the highest bookings made')
6 pm is the hour with the highest bookings made
You don't need the pd.grouper
method, pandas already has tools for resampling values based on Datetime. The problem is, the dataframe doesn't currently have Datetime values, just strings. You can use the pd.to_datetime()
method as described in this tutorial , and then downsample your data to the hour.
>>> a = ['2022-01-01 00:07:52.943',
'2022-01-01 00:09:31.745',
'2022-01-01 01:12:37.187',
'2022-01-01 02:45:42.834',
'2022-01-01 02:56:58.152']
>>> df = pd.DataFrame(data=a)
>>> print(df.head())
0
0 2022-01-01 00:07:52.943
1 2022-01-01 00:09:31.745
2 2022-01-01 01:12:37.187
3 2022-01-01 02:45:42.834
4 2022-01-01 02:56:58.152
>>> df.index = pd.to_datetime(df[0])
>>> df.resample('H').count()[0] # [0] is to get rid of extra, all-containing column
0
2022-01-01 00:00:00 2
2022-01-01 01:00:00 1
2022-01-01 02:00:00 2
If you can make assumptions about the string length for the dates. You can do something like the following where you parse the hr from the date into a new column then just get the mode
.
Note that I used a for loop at the end in case several hours are the mode.
import pandas as pd
data = [
['01','2022-01-01 00:07:52.943'],
['02','2022-01-01 00:09:31.745'],
['03','2022-01-01 00:14:37.187'],
['04','2022-01-02 00:18:09.127'],
['05','2022-01-02 00:18:09.130']
]
df = pd.DataFrame(data, columns=['BookingID','RideStart'])
print(df)
print('---\n---')
# BEGIN SOLUTION
df['RideStartHr'] = df['RideStart'].str[14:16]
print(df)
modeList = df['RideStartHr'].mode().values
print('Mode(s):', modeList)
if len(modeList) > 1:
print('There are {} most frequent hours. Listing all of them.'.format(len(modeList)))
for mode in modeList:
hr = int(mode) % 12
ampm = 'am' if mode <= '12' else 'pm'
print('{} {} is a most frequent hour.'.format(hr, ampm))
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.