Convert String Column directly to Date format (not Datetime) in Pandas DataFrame

Question

I have the following Pandas DataFrame:

import pandas as pd
df = pd.DataFrame({'a': ['2020-01-02', '2020-01-02']})

The column 'a' contains strings. I want to convert it to a date type (datetime.date, not datetime64). Here is what I did:

df['a'] = df['a'].apply(pd.to_datetime).dt.date

It works, but my real DataFrame has 500,000+ rows, and this approach seems very inefficient. Is there a more direct and efficient way to convert a string column to a date column?
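For reference, here is a minimal, self-contained version of the question's approach that also checks what it produces. The .dt.date accessor yields Python datetime.date objects, which pandas stores in a column of dtype object.

```python
import datetime

import pandas as pd

# The example frame from the question: column 'a' holds strings.
df = pd.DataFrame({'a': ['2020-01-02', '2020-01-02']})

# The question's approach: parse each string separately, then keep only the date part.
df['a'] = df['a'].apply(pd.to_datetime).dt.date

# The column now holds datetime.date objects with dtype object.
print(df['a'].dtype)  # object
print(isinstance(df['a'].iloc[0], datetime.date))  # True
```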

Answer 1

pandas.Series.apply (which is what df['a'].apply calls) is essentially a native Python for loop: it calls pandas.to_datetime once for every element.

pandas.to_datetime is a vectorized function: it is meant to operate on whole sequences (lists, arrays, Series) at once, running the inner loop in compiled C code instead of in Python.
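To make the difference concrete, here is a small sketch showing that the per-element call and the single whole-Series call produce the same timestamps. Only the way the loop is driven differs. The sample dates are illustrative.

```python
import pandas as pd

s = pd.Series(['2020-01-02', '2020-03-15'])

# Element-wise: Series.apply calls pd.to_datetime once per string from Python.
looped = s.apply(pd.to_datetime)

# Vectorized: one call handles the whole Series, looping in compiled code.
vectorized = pd.to_datetime(s)

# Both approaches yield the same timestamps.
print(looped.tolist() == vectorized.tolist())  # True
```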

If we start with a larger dataframe:

import pandas
df = pandas.DataFrame({'a': ['2020-01-02', '2020-01-02'] * 5000})

And then time the original approach (in a Jupyter notebook):

%%timeit
df['a'].apply(pandas.to_datetime).dt.date

We get a pretty slow result:

1.03 s ± 48.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

But if we rearrange it slightly to pass the entire column to pandas.to_datetime in a single call:

%%timeit
pandas.to_datetime(df['a']).dt.date

We get a much faster result:

6.07 ms ± 232 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
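Outside Jupyter, the same comparison can be sketched with the standard-library timeit module. The function names per_element and whole_column are illustrative, not from the original answer. Absolute timings depend on hardware and pandas version, so no expected numbers are given.

```python
import timeit

import pandas as pd

# Same 10,000-row frame as in the answer above.
df = pd.DataFrame({'a': ['2020-01-02', '2020-01-02'] * 5000})


def per_element():
    # One pd.to_datetime call per row, driven from Python by Series.apply.
    return df['a'].apply(pd.to_datetime).dt.date


def whole_column():
    # A single vectorized pd.to_datetime call over the whole column.
    return pd.to_datetime(df['a']).dt.date


# Make sure both approaches agree before comparing their speed.
assert per_element().equals(whole_column())

for fn in (per_element, whole_column):
    # Best of 3 single-call runs, reported in milliseconds.
    best = min(timeit.repeat(fn, number=1, repeat=3))
    print(f'{fn.__name__}: {best * 1000:.1f} ms per call')
```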

Answer 2

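Use pd.to_datetime on the whole column and pass the format explicitly:

df['a'] = pd.to_datetime(df['a'], format='%Y-%m-%d')

Specify format when you know all the strings follow the same layout. pandas then does not have to infer it, which can speed up parsing. This gives datetime64 values; append .dt.date if you need datetime.date objects.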
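Putting the two answers together, here is a minimal sketch that parses the column in one vectorized call with an explicit format and then converts to dates, which is what the question asks for.

```python
import datetime

import pandas as pd

df = pd.DataFrame({'a': ['2020-01-02', '2020-01-02']})

# Parse the whole column in one call with an explicit format,
# so pandas does not have to infer the layout of the strings.
parsed = pd.to_datetime(df['a'], format='%Y-%m-%d')

# pd.to_datetime returns datetime64 values; .dt.date converts them
# to datetime.date objects (stored with dtype object).
df['a'] = parsed.dt.date

print(df['a'].iloc[0])  # 2020-01-02
print(isinstance(df['a'].iloc[0], datetime.date))  # True
```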

solution1
2 2021-03-29 22:14:41

solution2
1 2021-03-29 22:14:43
