I have the following pandas DataFrame:
df = pd.DataFrame({'a': ['2020-01-02', '2020-01-02']})
Column 'a' holds strings. I want to convert it to a date type, and here is what I did:
df['a'] = df['a'].apply(pd.to_datetime).dt.date
It works, but in reality my DataFrame has 500,000+ rows, and this seems very inefficient. Is there a way to convert a string column to a date column directly and more efficiently?
pandas.Series.apply, which is what df['a'].apply calls, is essentially a native Python for loop: it invokes the function once per element. pandas.to_datetime is a vectorized function, meaning it is meant to operate on whole sequences (lists, arrays, Series) and does the inner loop in C.
If we start with a larger dataframe:
import pandas
df = pandas.DataFrame({'a': ['2020-01-02', '2020-01-02'] * 5000})
And then run this in a Jupyter notebook:
%%timeit
df['a'].apply(pandas.to_datetime).dt.date
We get a pretty slow result:
1.03 s ± 48.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
But if we rearrange just slightly to pass the entire column:
%%timeit
pandas.to_datetime(df['a']).dt.date
We get a much faster result:
6.07 ms ± 232 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
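As a quick sanity check, the two approaches can be compared on a small frame to confirm that they produce the same values. This is only a minimal sketch: the names slow and fast and the sample data are made up for illustration.

```python
import datetime

import pandas as pd

# Small illustrative frame (hypothetical sample data).
df = pd.DataFrame({'a': ['2020-01-02', '2020-01-03'] * 5})

# Element-wise: pd.to_datetime is called once per row.
slow = df['a'].apply(pd.to_datetime).dt.date

# Vectorized: the whole column is parsed in a single call.
fast = pd.to_datetime(df['a']).dt.date

# Both results hold datetime.date objects with identical values.
print(list(slow) == list(fast))
print(isinstance(fast.iloc[0], datetime.date))
```

Only the speed differs, so the vectorized call can replace the apply version directly.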
Pass the whole column to pd.to_datetime, and specify the format if you know that every value follows it:
df['a'] = pd.to_datetime(df['a'], format='%Y-%m-%d')
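A short sketch of what the explicit-format call returns, assuming every string matches '%Y-%m-%d'. The names parsed and as_dates are illustrative only. Note that pd.to_datetime yields a datetime64 column, while .dt.date converts it to plain Python date objects stored with object dtype.

```python
import pandas as pd

# Small illustrative frame (hypothetical sample data).
df = pd.DataFrame({'a': ['2020-01-02', '2020-01-03']})

# With an explicit format, pandas does not have to infer the layout;
# by default a string that does not match raises an error.
parsed = pd.to_datetime(df['a'], format='%Y-%m-%d')

# The result is a datetime64 column, so the .dt accessors stay vectorized.
print(parsed.dtype)

# If plain datetime.date objects are really needed, convert at the end.
# This produces an object-dtype column.
as_dates = parsed.dt.date
print(as_dates.dtype)
```

Keeping the column as datetime64 is usually the more efficient choice for later filtering or arithmetic. Converting to date objects may be worth it only when a downstream consumer requires them.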