Convert Pandas Column to DateTime

Question

I have one field in a pandas DataFrame that was imported as a string. It should be a datetime variable. How do I convert it to a datetime column and then filter based on date?

Example:

df = pd.DataFrame({'date': ['05SEP2014:00:00:00.000']})

Answer 1

Use the to_datetime function, specifying a format that matches your data.

raw_data['Mycol'] = pd.to_datetime(raw_data['Mycol'], format='%d%b%Y:%H:%M:%S.%f')
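The question also asks how to filter by date. Below is a minimal sketch that uses the question's `date` column and sample value; the two extra rows are made up for illustration. Once the column has a datetime dtype, filtering is an ordinary boolean mask compared against `pd.Timestamp` values.

```python
import pandas as pd

# The question's sample value plus two made-up rows for the filter demo.
df = pd.DataFrame({'date': ['05SEP2014:00:00:00.000',
                            '12OCT2014:13:45:30.500',
                            '01JAN2015:08:00:00.000']})

# %d=day, %b=abbreviated month name, %Y=4-digit year,
# %H:%M:%S=time of day, %f=fractional seconds.
df['date'] = pd.to_datetime(df['date'], format='%d%b%Y:%H:%M:%S.%f')

# With a datetime64 column, filtering is a plain comparison mask.
start, end = pd.Timestamp('2014-01-01'), pd.Timestamp('2015-01-01')
in_2014 = df[(df['date'] >= start) & (df['date'] < end)]
print(in_2014)
```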

Answer 2

If you have multiple columns to convert, you can do the following:

df[["col1", "col2", "col3"]] = df[["col1", "col2", "col3"]].apply(pd.to_datetime)
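A small sketch of the multi-column pattern, assuming two hypothetical columns `start` and `end` that hold ISO 8601 strings. `DataFrame.apply` calls `pd.to_datetime` once per column and forwards extra keyword arguments such as `format=` to it.

```python
import pandas as pd

# Hypothetical columns holding ISO 8601 strings.
df = pd.DataFrame({'start': ['2014-09-05 08:00:00', '2014-09-06 09:30:00'],
                   'end':   ['2014-09-05 17:00:00', '2014-09-06 18:15:00']})

cols = ['start', 'end']
# apply runs pd.to_datetime on each column (a Series); the format= keyword
# is passed through to pd.to_datetime unchanged.
df[cols] = df[cols].apply(pd.to_datetime, format='%Y-%m-%d %H:%M:%S')

# Both columns are now datetime64, so arithmetic between them works.
df['duration'] = df['end'] - df['start']
print(df)
```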

Answer 3

You can use the Series method .apply() to operate on the values in Mycol:

>>> df = pd.DataFrame(['05SEP2014:00:00:00.000'],columns=['Mycol'])
>>> df
                    Mycol
0  05SEP2014:00:00:00.000
>>> import datetime as dt
>>> df['Mycol'] = df['Mycol'].apply(lambda x:
...                                 dt.datetime.strptime(x, '%d%b%Y:%H:%M:%S.%f'))
>>> df
       Mycol
0 2014-09-05

Answer 4

Use the pandas to_datetime function to parse the column as datetime. By passing infer_datetime_format=True, pandas tries to detect the format from the data and converts the column accordingly. (In pandas 2.0 and later this argument is deprecated, because a strict version of format inference is now the default, so it can simply be omitted.)

import pandas as pd
raw_data['Mycol'] = pd.to_datetime(raw_data['Mycol'], infer_datetime_format=True)

Answer 5

raw_data['Mycol'] = pd.to_datetime(raw_data['Mycol'], format='%d%b%Y:%H:%M:%S.%f')

works; however, it results in a pandas SettingWithCopyWarning: "A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead"

I would guess this is due to chained indexing.

Answer 6

To save time:

raw_data['Mycol'] = pd.to_datetime(raw_data['Mycol'])
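Letting pandas guess the format saves typing, but a guessed format can silently swap day and month. Here is a small sketch with a made-up ambiguous value: by default the parser reads it month-first, `dayfirst=True` (a real `to_datetime` parameter) asks it to prefer day-first order, and an explicit `format=` removes the guesswork entirely.

```python
import pandas as pd

# A made-up ambiguous value: 5 September 2014, or May 9, 2014?
s = pd.Series(['05/09/2014'])

guessed = pd.to_datetime(s)                       # month-first by default
day_first = pd.to_datetime(s, dayfirst=True)      # prefer day-first order
explicit = pd.to_datetime(s, format='%d/%m/%Y')   # no guessing at all

print(guessed.iloc[0].date(), day_first.iloc[0].date(), explicit.iloc[0].date())
```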

Answer 7

Just as you would convert an object column to float or int, use astype():

raw_data['Mycol'] = raw_data['Mycol'].astype('datetime64[ns]')
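A quick sketch comparing the two routes on ISO 8601 strings (sample values made up). Both give the same timestamps here. Note that astype has no format= argument and no errors='coerce' option, so for strings like the question's '05SEP2014:00:00:00.000', pd.to_datetime with an explicit format is the more controllable choice.

```python
import pandas as pd

s = pd.Series(['2014-09-05 00:00:00', '2014-10-12 13:45:30'])

# astype works for strings pandas can parse unaided, such as ISO 8601.
via_astype = s.astype('datetime64[ns]')
# to_datetime parses the same strings and also accepts format= and errors=.
via_to_datetime = pd.to_datetime(s)

# Element-wise comparison of the two results.
print((via_astype == via_to_datetime).all())
```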

Answer 8

To silence `SettingWithCopyWarning`

If you got this warning, then that means your dataframe was probably created by filtering another dataframe. Make a copy of your dataframe before any assignment and you're good to go.

df = df.copy()
df['date'] = pd.to_datetime(df['date'], format='%d%b%Y:%H:%M:%S.%f')

`errors='coerce'` is useful

If some rows are not in the expected format, or are not dates at all, the errors= parameter is very useful: with errors='coerce', the valid rows are converted and the invalid ones become NaT, so you can handle them later.

df['date'] = pd.to_datetime(df['date'], format='%d%b%Y:%H:%M:%S.%f', errors='coerce')

# for multiple columns
df[['start', 'end']] = df[['start', 'end']].apply(pd.to_datetime, format='%d%b%Y:%H:%M:%S.%f', errors='coerce')
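To "handle the rows that contained invalid values later", you need to find them after coercion. A minimal sketch, with made-up sample rows: both unparseable strings and values that were missing to begin with end up as NaT, so keeping a copy of the original column lets you separate real parse failures from pre-existing gaps.

```python
import pandas as pd

df = pd.DataFrame({'date': ['05SEP2014:00:00:00.000', 'not a date',
                            '12OCT2014:13:45:30.500', None]})
raw = df['date'].copy()  # keep the original strings to inspect failures

df['date'] = pd.to_datetime(df['date'], format='%d%b%Y:%H:%M:%S.%f',
                            errors='coerce')

# NaT now marks both bad strings and missing values; comparing with the
# original column keeps only the rows that actually failed to parse.
failed = df['date'].isna() & raw.notna()
print(raw[failed])
```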

Setting the correct `format=` is much faster than letting pandas find out ¹

Long story short, passing the correct format= from the start, as in chrisb's answer, is much faster than letting pandas figure out the format, especially if the format contains a time component. For dataframes with more than 10k rows the runtime difference is huge (~25 times faster, so we're talking a few seconds instead of a couple of minutes). All valid format options can be found at https://strftime.org/ .

¹ Code used to produce the timeit test plot.

import perfplot
import pandas as pd
from random import choices
from datetime import datetime

mdYHMSf = range(1,13), range(1,29), range(2000,2024), range(24), *[range(60)]*2, range(1000)
perfplot.show(
    kernels=[lambda x: pd.to_datetime(x), 
             lambda x: pd.to_datetime(x, format='%m/%d/%Y %H:%M:%S.%f'), 
             lambda x: pd.to_datetime(x, infer_datetime_format=True),  # deprecated in pandas >= 2.0
             lambda s: s.apply(lambda x: datetime.strptime(x, '%m/%d/%Y %H:%M:%S.%f'))],
    labels=["pd.to_datetime(df['date'])", 
            "pd.to_datetime(df['date'], format='%m/%d/%Y %H:%M:%S.%f')", 
            "pd.to_datetime(df['date'], infer_datetime_format=True)", 
            "df['date'].apply(lambda x: datetime.strptime(x, '%m/%d/%Y %H:%M:%S.%f'))"],
    n_range=[2**k for k in range(20)],
    setup=lambda n: pd.Series([f"{m}/{d}/{Y} {H}:{M}:{S}.{f}" 
                               for m,d,Y,H,M,S,f in zip(*[choices(e, k=n) for e in mdYHMSf])]),
    equality_check=pd.Series.equals,
    xlabel='len(df)'
)

Answer 9

It is important to note that pandas.to_datetime will almost never return a datetime.datetime. From the docs:


Returns datetime
If parsing succeeded. Return type depends on input:

list-like: DatetimeIndex
Series: Series of datetime64 dtype
scalar: Timestamp

In case when it is not possible to return designated types (e.g. when any element 
of input is before Timestamp.min or after Timestamp.max) return will have 
datetime.datetime type (or corresponding array/Series).

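A short sketch of the return types described in the docs (sample dates made up). One detail worth knowing: pd.Timestamp is a subclass of datetime.datetime, so isinstance checks against datetime.datetime still pass; if code needs an actual datetime.datetime object, Timestamp.to_pydatetime() converts it.

```python
import datetime as dt
import pandas as pd

scalar = pd.to_datetime('2014-09-05')                                # scalar input
index = pd.to_datetime(['2014-09-05', '2014-09-06'])                 # list-like input
series = pd.to_datetime(pd.Series(['2014-09-05', '2014-09-06']))     # Series input

print(type(scalar).__name__, type(index).__name__, series.dtype)

# Timestamp subclasses datetime.datetime, so this isinstance check passes,
# but to_pydatetime() returns a plain datetime.datetime when one is required.
print(isinstance(scalar, dt.datetime))
plain = scalar.to_pydatetime()
print(type(plain) is dt.datetime)
```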

Convert Pandas Column to DateTime

Question

8 answers

solution1
712 ACCEPTED 2014-11-05 17:50:27

solution2
88 2019-03-17 13:52:33

solution3
64 2014-11-05 17:51:24

solution4
34 2019-09-23 10:30:48

solution5
22 2017-03-13 20:46:29

solution6
13 2021-10-29 16:44:11

solution7
0 2022-08-23 08:12:20

solution8
0 2023-01-29 18:39:27

To silence `SettingWithCopyWarning`

`errors='coerce'` is useful

Setting the correct `format=` is much faster than letting pandas find out ¹

solution9
-1 2021-10-04 21:27:11
