0 18:30
1 24:50
2 33:21
3 28:39
4 27:30
5 21:26
6 16:42
7 16:48
8 26:07
9 18:13
10 27:15
11 24:33
12 29:43
13 NaN
14 NaN
15 NaN
16 24:58
17 26:14
18 27:36
19 33:27
Name: Minutes, dtype: object
I have a column named Minutes which represents minutes spent performing a task. The column is in MM:SS format, with no milliseconds or hours. There are a few null values for those who did not perform the task, which I would like to replace with 00:00. I've tried converting the column to datetime with
df['Minutes'] = df['Minutes'].apply(pd.to_datetime, format = '%M:%S', errors='coerce')
which gives me
1 1900-01-01 00:24:50
2 1900-01-01 00:33:21
3 1900-01-01 00:28:39
4 1900-01-01 00:27:30
5 1900-01-01 00:21:26
This is fine, I guess, but my goal is to be able to sort on this column by the most time spent on a task. After I apply pd.to_datetime, the dtype of the column is still object, and when I try to sort I get:
KeyError                                  Traceback (most recent call last)
----> 1 df.sort_values(by=df['Minutes'], ascending=True)

~\anaconda3\lib\site-packages\pandas\core\frame.py in sort_values(self, by, axis, ascending, inplace, kind, na_position, ignore_index, key)
   5453
   5454             by = by[0]
-> 5455             k = self._get_label_or_level_values(by, axis=axis)
   5456
   5457             # need to rewrap column in Series to apply key function

~\anaconda3\lib\site-packages\pandas\core\generic.py in _get_label_or_level_values(self, key, axis)
   1682             values = self.axes[axis].get_level_values(key)._values
   1683         else:
-> 1684             raise KeyError(key)
   1685
   1686         # Check for duplicates
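The KeyError comes from what is passed as by=. The short sketch below, on hypothetical sample values, shows that DataFrame.sort_values looks up column labels: handing it the Series df['Minutes'] fails the lookup, while the name 'Minutes' works. The flag raised_key_error and the variable sorted_df are illustrative names only.

```python
import pandas as pd

# Hypothetical sample in the question's zero-padded MM:SS format.
df = pd.DataFrame({'Minutes': ["24:50", "33:21", "28:39"]})

# Passing the Series itself: it is not a column label, so pandas raises KeyError.
raised_key_error = False
try:
    df.sort_values(by=df['Minutes'], ascending=True)
except KeyError:
    raised_key_error = True

# Passing the column name works; ascending=False puts the longest time first.
sorted_df = df.sort_values(by='Minutes', ascending=False)
print(sorted_df)
```

Here the values are still strings, which sort correctly only because every entry is zero-padded to MM:SS.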
Replace the NaN values using
df['Minutes'] = df['Minutes'].fillna('00:00')  # fillna returns a new Series, so assign it back
Followed by:
df['Minutes'] = pd.to_datetime(df['Minutes'], format='%M:%S', errors='coerce')
Followed by:
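df = df.sort_values('Minutes')  # ascending is the default; pass ascending=False for the longest times first
Note that sort_values expects the column name rather than the Series itself; passing by=df['Minutes'] is what raised the KeyError above.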
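Put together, the three steps can be sketched as one runnable snippet. This is a minimal version under the assumption that the data looks like the question's (MM:SS strings with some missing values); the sample DataFrame is made up for illustration. Each result is assigned back, and the column is sorted with the longest time first.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the question: MM:SS strings with a missing value.
df = pd.DataFrame({'Minutes': ["18:30", "24:50", "33:21", np.nan, "16:42"]})

# Step 1: fill missing durations with '00:00' and assign the result back.
df['Minutes'] = df['Minutes'].fillna('00:00')

# Step 2: parse the whole column at once, giving a datetime64 dtype.
df['Minutes'] = pd.to_datetime(df['Minutes'], format='%M:%S', errors='coerce')

# Step 3: sort by the column name; ascending=False puts the most time first.
df = df.sort_values('Minutes', ascending=False)
print(df)
```

Calling pd.to_datetime on the column as a whole, rather than through .apply, is what makes the resulting dtype datetime64 instead of object.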
pd.to_datetime with the keyword errors='coerce' takes care of NaNs: it leaves NaT (not-a-time) for the unknown durations.
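To make the effect of errors='coerce' concrete, here is a small sketch with made-up inputs. A malformed string and a missing value both become NaT when coercing. Without the keyword, pandas uses its default errors='raise', and the malformed string raises ValueError instead. The names raw, parsed and raised are only illustrative.

```python
import pandas as pd

# Hypothetical inputs: one valid MM:SS value, one malformed string, one missing value.
raw = pd.Series(["24:50", "not a time", None])

# errors='coerce' turns anything that cannot be parsed (or is missing) into NaT.
parsed = pd.to_datetime(raw, format='%M:%S', errors='coerce')
print(parsed)

# With the default errors='raise', the malformed string raises ValueError.
raised = False
try:
    pd.to_datetime(raw, format='%M:%S')
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```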
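Also note that for sorting alone you do not need to convert to datetime at all: as long as every value is zero-padded MM:SS, sorting the strings lexicographically gives the same order as sorting by duration.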
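One way to check that claim is to sort the raw strings and compare the result with an ordering computed from the actual number of seconds. The sketch below does that on hypothetical sample values. The helper to_seconds is an illustrative name, not a pandas function, and it assumes every non-missing value has exactly the MM:SS shape. Unpadded values such as "9:05" would break the string ordering.

```python
import numpy as np
import pandas as pd

# Hypothetical zero-padded MM:SS values plus one missing entry.
df = pd.DataFrame({'Minutes': ["24:50", "33:21", "16:42", "16:48", "09:05", np.nan]})

# Plain string sort, no datetime conversion; NaN goes to the end by default.
by_string = df.sort_values('Minutes', ascending=False)


def to_seconds(mmss):
    """Convert an 'MM:SS' string to total seconds; missing values stay NaN."""
    if pd.isna(mmss):
        return np.nan
    minutes, seconds = mmss.split(':')
    return int(minutes) * 60 + int(seconds)


# Reference ordering based on the real durations in seconds.
by_seconds = df.assign(seconds=df['Minutes'].map(to_seconds)).sort_values(
    'seconds', ascending=False
)

print(by_string.index.tolist())
print(by_seconds.index.tolist())
```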
import pandas as pd
# >>> pd.__version__
# 1.3.5
import numpy as np
df = pd.DataFrame({'Minutes': ["27:15", "24:33", "29:43", "NaN", np.nan, None]})
# you can do a df.sort_values('Minutes') here already!
df['Minutes'] = pd.to_datetime(df['Minutes'], format='%M:%S', errors='coerce')
df = df.sort_values('Minutes')
# df['Minutes']
# 1 1900-01-01 00:24:33
# 0 1900-01-01 00:27:15
# 2 1900-01-01 00:29:43
# 3 NaT
# 4 NaT
# 5 NaT
# Name: Minutes, dtype: datetime64[ns]
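As a side note that goes beyond the original answer, the parsed values carry a dummy date of 1900-01-01 because the format has no date part. If actual durations are more useful than timestamps, for example to get total seconds spent, one option is to subtract that base date, which yields a timedelta64 column. The sketch below works on its own hypothetical Series and does not modify df. The names as_datetime, durations and longest_first are illustrative.

```python
import pandas as pd

# Hypothetical sample; None stands in for a task that was not performed.
minutes = pd.Series(["27:15", "24:33", "29:43", None], name='Minutes')
as_datetime = pd.to_datetime(minutes, format='%M:%S', errors='coerce')

# '%M:%S' leaves the date part at 1900-01-01; subtracting it keeps only
# the elapsed time as a timedelta64 column (NaT stays NaT).
durations = as_datetime - pd.Timestamp('1900-01-01')

# Longest first; missing durations sort to the end.
longest_first = durations.sort_values(ascending=False)
print(longest_first)
print(durations.dt.total_seconds())
```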
To change the display format, you'll need to convert back to strings:
df['Minutes'].dt.strftime('%H:%M:%S')
# 1 00:24:33
# 0 00:27:15
# 2 00:29:43
# 3 NaN
# 4 NaN
# 5 NaN
# Name: Minutes, dtype: object
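Because strftime turns NaT into NaN, the question's original wish to show missing durations as 00:00 can be handled at the very end, after sorting on the real datetimes. The following is a minimal sketch on a hypothetical Series. It formats back to MM:SS and fills the gaps. The name display is illustrative only.

```python
import pandas as pd

# Hypothetical sample with one missing duration.
minutes = pd.Series(["27:15", "24:33", None], name='Minutes')
parsed = pd.to_datetime(minutes, format='%M:%S', errors='coerce')

# Sort on the datetimes first, then format as MM:SS; NaT becomes NaN,
# which fillna replaces with the '00:00' placeholder.
display = parsed.sort_values(ascending=False).dt.strftime('%M:%S').fillna('00:00')
print(display)
```

Filling after sorting keeps the placeholder rows at the end, instead of ranking them as genuine zero-length tasks.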