
How do I convert a column with multiple time formats to a standard time format in Python?

I have a column where the timestamp data is in two different formats: MM:SS and HH:MM:SS. I'm trying to convert it to a standard format so I can do calculations on it. However, pd.to_datetime() does not recognize it, since the values are not in a consistent format.

Example:

'59:14', '59:16', '59:20', '59:21', '59:24', '59:24', '59:27',
'59:26', '59:29', '59:31', '59:37', '59:39', '59:40', '59:41',
'59:43', '59:44', '59:46', '59:49', '59:51', '59:52', '59:53',
'59:54', '59:55', '59:57', '1:00:02', '1:00:05', '1:00:09',
'1:00:10', '1:00:14', '1:00:17', '1:00:20', '1:00:21', '1:00:22',
'1:00:24', '1:00:29', '1:00:31', '1:00:35', '1:00:37', '1:00:36',
'1:00:41', '1:00:44', '1:00:45', '1:00:50', '1:00:52', '1:00:57',
'1:01:10', '1:01:12', '1:01:14', '1:01:16', '1:01:19', '1:01:21'

I assume that the column to be converted is Time.

To convert it to timedelta, apply a "specialized" conversion function, something like:

df.Time.apply(lambda txt: pd.to_timedelta('0:' + txt if len(txt) < 6 else txt))
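A minimal self-contained sketch of this approach, assuming a small hypothetical DataFrame named df with a few values from the question in its Time column. It also shows why timedelta is convenient here: the result supports arithmetic directly.

```python
import pandas as pd

# Hypothetical sample: a few values from the question in a column named Time
df = pd.DataFrame({"Time": ["59:14", "59:57", "1:00:02", "1:01:21"]})

# MM:SS strings are shorter than 6 characters; prefix "0:" so they read as H:MM:SS
df["Elapsed"] = df.Time.apply(
    lambda txt: pd.to_timedelta("0:" + txt if len(txt) < 6 else txt)
)

# Timedeltas support arithmetic directly
df["Step"] = df["Elapsed"].diff()                 # gap to the previous row (NaT for the first)
df["Seconds"] = df["Elapsed"].dt.total_seconds()  # elapsed time as float seconds

print(df)
```

Note that the length check relies on the data looking like the sample (MM:SS is always 5 characters, H:MM:SS at least 7).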

Instead of to_timedelta you can also use to_datetime, but the downside is that the date part is taken from the current day.

"Vanilla" application of pd.to_timedelta (as suggested in one of the comments) fails with the exception ValueError: expected hh:mm:ss format.
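Once every value has an hours field, the plain call works on the whole column. A minimal vectorized sketch, assuming a small hypothetical DataFrame with a Time column: Series.where keeps the values that already contain two colons and prepends "00:" to the rest, so no per-row lambda is needed.

```python
import pandas as pd

# Hypothetical column named "Time" holding a mix of MM:SS and H:MM:SS strings
df = pd.DataFrame({"Time": ["59:14", "59:57", "1:00:02", "1:01:21"]})

# Keep values that already have two colons; otherwise prepend "00:" so every
# entry has hours, minutes and seconds
normalized = df["Time"].where(df["Time"].str.count(":") == 2, "00:" + df["Time"])

# With a consistent shape, pd.to_timedelta can parse the whole Series at once
df["Elapsed"] = pd.to_timedelta(normalized)

print(df)
```

Counting colons instead of checking the string length keeps working if an MM:SS value ever has a single-digit minute field.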

You could preprocess your times before passing them to pd.to_datetime.

For example:

import pandas as pd

times = ['59:14', '59:16', '59:20', '59:21', '59:24', '59:24', '59:27',
'59:26', '59:29', '59:31', '59:37', '59:39', '59:40', '59:41',
'59:43', '59:44', '59:46', '59:49', '59:51', '59:52', '59:53',
'59:54', '59:55', '59:57', '1:00:02', '1:00:05', '1:00:09',
'1:00:10', '1:00:14', '1:00:17', '1:00:20', '1:00:21', '1:00:22',
'1:00:24', '1:00:29', '1:00:31', '1:00:35', '1:00:37', '1:00:36',
'1:00:41', '1:00:44', '1:00:45', '1:00:50', '1:00:52', '1:00:57',
'1:01:10', '1:01:12', '1:01:14', '1:01:16', '1:01:19', '1:01:21']

clean_times = list(map(lambda elt: '00:' + elt if elt.count(':') == 1 else elt, times))
pandas_times = pd.to_datetime(clean_times)

This gives the following output:

DatetimeIndex(['2019-11-25 00:59:14', '2019-11-25 00:59:16',
               '2019-11-25 00:59:20', '2019-11-25 00:59:21',
               '2019-11-25 00:59:24', '2019-11-25 00:59:24',
               '2019-11-25 00:59:27', '2019-11-25 00:59:26',
               '2019-11-25 00:59:29', '2019-11-25 00:59:31',
               '2019-11-25 00:59:37', '2019-11-25 00:59:39',
               '2019-11-25 00:59:40', '2019-11-25 00:59:41',
               '2019-11-25 00:59:43', '2019-11-25 00:59:44',
               '2019-11-25 00:59:46', '2019-11-25 00:59:49',
               '2019-11-25 00:59:51', '2019-11-25 00:59:52',
               '2019-11-25 00:59:53', '2019-11-25 00:59:54',
               '2019-11-25 00:59:55', '2019-11-25 00:59:57',
               '2019-11-25 01:00:02', '2019-11-25 01:00:05',
               '2019-11-25 01:00:09', '2019-11-25 01:00:10',
               '2019-11-25 01:00:14', '2019-11-25 01:00:17',
               '2019-11-25 01:00:20', '2019-11-25 01:00:21',
               '2019-11-25 01:00:22', '2019-11-25 01:00:24',
               '2019-11-25 01:00:29', '2019-11-25 01:00:31',
               '2019-11-25 01:00:35', '2019-11-25 01:00:37',
               '2019-11-25 01:00:36', '2019-11-25 01:00:41',
               '2019-11-25 01:00:44', '2019-11-25 01:00:45',
               '2019-11-25 01:00:50', '2019-11-25 01:00:52',
               '2019-11-25 01:00:57', '2019-11-25 01:01:10',
               '2019-11-25 01:01:12', '2019-11-25 01:01:14',
               '2019-11-25 01:01:16', '2019-11-25 01:01:19',
               '2019-11-25 01:01:21'],
              dtype='datetime64[ns]', freq=None)
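Since the date part in this output carries no information about the data, one way to get back to pure elapsed times is a sketch like the following, using a shortened illustrative sample in the same shape as clean_times. DatetimeIndex.normalize() sets each timestamp to midnight of its own day, so subtracting it leaves a TimedeltaIndex that is ready for calculations, whatever date pandas filled in.

```python
import pandas as pd

# Shortened sample in the same shape as clean_times above
clean_times = ["00:59:14", "00:59:57", "1:00:02", "1:01:21"]
pandas_times = pd.to_datetime(clean_times)

# Subtract each timestamp's own midnight to keep only the time-of-day part
elapsed = pandas_times - pandas_times.normalize()

print(elapsed)
print(elapsed.total_seconds())  # float seconds per value
```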

You can use a function to bring every value in the column into the same HH:MM:SS form and then apply pd.to_datetime().

See a possible solution below:

import pandas as pd

# Prepend '00:' to the time if the count of ':' is 1 (MM:SS -> 00:MM:SS)
# Define a function that performs the row-by-row operation
def clean_time_col(the_time_col):
    the_time_col = str(the_time_col)
    if the_time_col.count(':') == 1:
        the_time_col = '00:' + the_time_col
    return the_time_col

# Use apply to run the function on every value of time_col, then parse the result
df['new_time_col'] = pd.to_datetime(df['time_col'].apply(clean_time_col))

Code Untested
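If pandas parsing is not needed at all, a different technique is to fold the colon-separated fields into total seconds in plain Python. This is a sketch with a hypothetical helper named to_seconds; the integers it returns are easy to subtract or average, and they can be turned into timedeltas later with pd.to_timedelta(seconds, unit='s').

```python
def to_seconds(txt: str) -> int:
    """Convert 'MM:SS' or 'H:MM:SS' to total seconds (hypothetical helper)."""
    total = 0
    # Each field shifts the running total up one base-60 place, so the same
    # loop handles both MM:SS and H:MM:SS
    for part in txt.split(":"):
        total = total * 60 + int(part)
    return total

print([to_seconds(t) for t in ["59:14", "1:00:02"]])  # → [3554, 3602]
```

On a DataFrame this would be applied as df['time_col'].map(to_seconds).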


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM