
Converting Pandas Dataframe column object in MM:SS format to Datetime type?

0                18:30
1                24:50
2                33:21
3                28:39
4                27:30
5                21:26
6                16:42
7                16:48
8                26:07
9                18:13
10               27:15
11               24:33
12               29:43
13               NaN
14               NaN
15               NaN
16               24:58
17               26:14
18               27:36
19               33:27
Name: Minutes, dtype: object

I have a column named Minutes that records the time spent performing a task. The values are in MM:SS format, with no hours or milliseconds. There are a few null values for those who did not perform the task, which I would like to replace with 00:00. I've tried converting the column to datetime with

df['Minutes'] = df['Minutes'].apply(pd.to_datetime, format = '%M:%S', errors='coerce')

which gives me

1       1900-01-01 00:24:50
2       1900-01-01 00:33:21
3       1900-01-01 00:28:39
4       1900-01-01 00:27:30
5       1900-01-01 00:21:26

This is fine I guess, but my goal is to sort the rows by the time spent on a task, longest first. After I apply pd.to_datetime, the dtype of the column is still object. And when I try to sort, I get:

KeyError                                  Traceback (most recent call last)
in
----> 1 df.sort_values(by=df['Minutes'], ascending=True)

~\anaconda3\lib\site-packages\pandas\core\frame.py in sort_values(self, by, axis, ascending, inplace, kind, na_position, ignore_index, key)
   5453
   5454             by = by[0]
-> 5455             k = self._get_label_or_level_values(by, axis=axis)
   5456
   5457             # need to rewrap column in Series to apply key function

~\anaconda3\lib\site-packages\pandas\core\generic.py in _get_label_or_level_values(self, key, axis)
   1682             values = self.axes[axis].get_level_values(key)._values
   1683         else:
-> 1684             raise KeyError(key)
   1685
   1686         # Check for duplicates

Replace the NaN values using

df['Minutes'] = df['Minutes'].fillna('00:00')  # fillna returns a new object, so assign it back

Followed by:

df['Minutes'] = pd.to_datetime(df['Minutes'], format='%M:%S', errors='coerce')  

Followed by:

df.sort_values('Minutes')  # ascending=True is the default
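
Put together, the three steps might look like the following minimal sketch. The four sample values are a hypothetical subset standing in for the question's column, and `result` is a name introduced here for illustration. Note that `sort_values` takes the column label: passing the Series itself (`by=df['Minutes']`) is what raised the `KeyError` in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical subset of the question's data, including one missing value
df = pd.DataFrame({"Minutes": ["18:30", "24:50", np.nan, "16:42"]})

# Step 1: replace missing entries with "00:00" and assign the result back
df["Minutes"] = df["Minutes"].fillna("00:00")

# Step 2: parse the MM:SS strings; the date part defaults to 1900-01-01
df["Minutes"] = pd.to_datetime(df["Minutes"], format="%M:%S", errors="coerce")

# Step 3: sort by column label; ascending=False puts the longest time first
result = df.sort_values("Minutes", ascending=False)
print(result)
```

One caveat with this approach: `%M` only accepts 00 to 59, so a value such as 75:10 would be coerced to NaT under `errors='coerce'`.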

pd.to_datetime with errors='coerce' takes care of the NaNs: it leaves NaT (not-a-time) for the unknown durations.

Also note that for sorting you do not need to convert to datetime at all: zero-padded MM:SS strings of equal length already sort in chronological order.

import pandas as pd
# >>> pd.__version__
# 1.3.5
import numpy as np

# np.nan instead of np.NaN: the NaN alias was removed in NumPy 2.0
df = pd.DataFrame({'Minutes': ["27:15", "24:33", "29:43", "NaN", np.nan, None]})

# you can do a df.sort_values('Minutes') here already!

df['Minutes'] = pd.to_datetime(df['Minutes'], format='%M:%S', errors='coerce')
df = df.sort_values('Minutes')

# df['Minutes']
# 1   1900-01-01 00:24:33
# 0   1900-01-01 00:27:15
# 2   1900-01-01 00:29:43
# 3                   NaT
# 4                   NaT
# 5                   NaT
# Name: Minutes, dtype: datetime64[ns]
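
The remark about sorting without conversion can be checked with a small comparison. This is a sketch under the assumption that every value is a zero-padded two-digit MM:SS string; the names `by_string`, `by_datetime` and the `parsed` column are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Minutes": ["27:15", "24:33", "29:43", np.nan, None]})

# Zero-padded MM:SS strings compare character by character, which matches
# chronological order; missing values go last by default (na_position="last")
by_string = df.sort_values("Minutes")

# Same data sorted after parsing into datetime, for comparison
by_datetime = df.assign(
    parsed=pd.to_datetime(df["Minutes"], format="%M:%S", errors="coerce")
).sort_values("parsed")

print(by_string.index.tolist())
print(by_datetime.index.tolist())
```

The string sort would stop matching once values are not padded to the same width (for example "9:05" next to "10:00"), which is one reason to parse the column anyway.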

To change the format, you'll need to convert back to string:

df['Minutes'].dt.strftime('%H:%M:%S')
# 1    00:24:33
# 0    00:27:15
# 2    00:29:43
# 3         NaN
# 4         NaN
# 5         NaN
# Name: Minutes, dtype: object
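
Since the column holds durations rather than clock times, an alternative that neither answer above uses is `pd.to_timedelta`, which avoids the placeholder 1900-01-01 date. The sketch below is an assumption-laden illustration, not the answerers' method: it prepends "00:" so that each value has the unambiguous HH:MM:SS shape, and the sample values and the names `minutes`, `durations` and `ranked` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical subset of the question's data, including one missing value
minutes = pd.Series(["18:30", "24:50", np.nan, "16:42"], name="Minutes")

# Fill missing entries with "00:00", then prepend an hours field of "00:"
as_text = "00:" + minutes.fillna("00:00")

# Parse the HH:MM:SS strings into timedelta values
durations = pd.to_timedelta(as_text)

# Longest duration first
ranked = durations.sort_values(ascending=False)
print(ranked)
```

A timedelta column sorts numerically and supports arithmetic such as `durations.dt.total_seconds()`, which may suit "time spent" data better than timestamps.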


 