Convert String Column directly to Date format (not Datetime) in Pandas DataFrame

I have the following Pandas DataFrame:

df = pd.DataFrame({'a': ['2020-01-02', '2020-01-02']})

Column 'a' holds strings. I want to convert it to a date type, and here is what I did:

df['a'] = df['a'].apply(pd.to_datetime).dt.date

It works, but my real DataFrame has more than 500,000 rows, and this approach seems very inefficient. Is there a way to convert a string column to a date column directly and more efficiently?

apply (here pandas.Series.apply, since df['a'] is a Series) is essentially a native Python for loop: it calls the function once per element.

pandas.to_datetime is a vectorized function, meaning it is designed to operate on whole sequences (lists, arrays, Series) and runs the inner loop in C.
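As a quick sanity check, the two call styles can be compared on the two-row frame from the question. This is only a sketch: the names per_element, vectorized, same_result and first_value are made up for illustration and do not appear in the original post.

```python
import datetime

import pandas as pd

df = pd.DataFrame({'a': ['2020-01-02', '2020-01-02']})

# Element-wise: pd.to_datetime is called once per string, from Python.
per_element = df['a'].apply(pd.to_datetime).dt.date

# Vectorized: pd.to_datetime receives the whole Series in a single call.
vectorized = pd.to_datetime(df['a']).dt.date

# Both hold the same datetime.date objects; only the running time differs.
same_result = per_element.equals(vectorized)
first_value = vectorized.iloc[0]
print(same_result, first_value)
```

Since the output is the same either way, the choice between the two comes down to speed.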

If we start with a larger DataFrame:

import pandas
df = pandas.DataFrame({'a': ['2020-01-02', '2020-01-02'] * 5000})

And then run this (in a Jupyter notebook):

%%timeit
df['a'].apply(pandas.to_datetime).dt.date

We get a pretty slow result:

1.03 s ± 48.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

But if we rearrange just slightly to pass the entire column:

%%timeit
pandas.to_datetime(df['a']).dt.date

We get a much faster result:

6.07 ms ± 232 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
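Outside a notebook, roughly the same comparison can be made with the standard-library timeit module. This is a minimal sketch: the helper names convert_with_apply and convert_vectorized are hypothetical, and the absolute timings will depend on the machine and the pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'a': ['2020-01-02', '2020-01-02'] * 5000})


def convert_with_apply(frame):
    # One pd.to_datetime call per row (slow path).
    return frame['a'].apply(pd.to_datetime).dt.date


def convert_vectorized(frame):
    # One pd.to_datetime call for the whole column (fast path).
    return pd.to_datetime(frame['a']).dt.date


# Time a single run of each variant; raise `number` for more stable figures.
apply_seconds = timeit.timeit(lambda: convert_with_apply(df), number=1)
vectorized_seconds = timeit.timeit(lambda: convert_vectorized(df), number=1)

print(f"apply:      {apply_seconds:.4f} s")
print(f"vectorized: {vectorized_seconds:.4f} s")
```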

Use df['a'] = pd.to_datetime(df['a'], format='%Y-%m-%d')

Specify the format if you know that every value follows the same format.
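Note that pd.to_datetime on its own yields a datetime64 column, not plain dates. A possible way to combine the explicit format with the .dt.date step from the question is sketched below; the intermediate name parsed is only illustrative.

```python
import datetime

import pandas as pd

df = pd.DataFrame({'a': ['2020-01-02', '2020-01-02']})

# Parse the whole column in one call with an explicit format.
# The result has a datetime64 dtype (timestamps at midnight).
parsed = pd.to_datetime(df['a'], format='%Y-%m-%d')

# .dt.date drops the time part and yields datetime.date objects,
# so the column ends up with dtype object.
df['a'] = parsed.dt.date

print(df['a'].iloc[0], type(df['a'].iloc[0]))
```

The trade-off is that an object column of datetime.date values no longer supports the .dt accessor, so it may be worth keeping the datetime64 column if further date arithmetic is planned.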
