Extract date from string datetime column in pandas

I have a column cash_date in a pandas DataFrame with dtype object, and I am not able to use the pandas to_datetime function on it. The shape of my DataFrame is (47654566, 5). My DataFrame looks like this:

cash_date                                amount    id
02-JAN-13 12.00.00.000000000 AM           100       1
13-FEB-13 12.00.00.000000000 AM           200       2
09-MAR-13 12.00.00.000000000 AM           300       3
03-APR-13 12.00.00.000000000 AM           400       4
02-JAN-13 06.26.02.438000000 PM           500       7
17-NOV-18 08.31.47.443000000 PM           700       8

I tried the following approaches:

df.cash_date = pd.to_datetime(df['cash_date'], errors='coerce') # Not working

for i in range(len(df)):
    df.cash_date = df.cash_date.astype(str).str.split('\d\d.\d\d.\d\d.\d\d\d\d\d\d\d\d\d')[i][0] # Not working

I want the DataFrame to look like this:

cash_date                                amount    id       date
02-JAN-13 12.00.00.000000000 AM           100       1       02-JAN-13
13-FEB-13 12.00.00.000000000 AM           200       2       13-FEB-13
09-MAR-13 12.00.00.000000000 AM           300       3       09-MAR-13
03-APR-13 12.00.00.000000000 AM           400       4       03-APR-13
02-JAN-13 06.26.02.438000000 PM           500       7       02-JAN-13
17-NOV-18 08.31.47.443000000 PM           700       8       17-NOV-18

Specify a format=... argument. The times use a 12-hour clock with an AM/PM marker, so the hour directive must be %I; with %H the %p marker is ignored while parsing.

pd.to_datetime(df['cash_date'], format='%d-%b-%y %I.%M.%S.%f %p', errors='coerce')

0   2013-01-02 00:00:00.000
1   2013-02-13 00:00:00.000
2   2013-03-09 00:00:00.000
3   2013-04-03 00:00:00.000
4   2013-01-02 18:26:02.438
5   2018-11-17 20:31:47.443
Name: cash_date, dtype: datetime64[ns]

Details about the accepted format directives can be found at http://strftime.org.
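As a quick sanity check on the format string, here is a minimal sketch run on a few values copied from the question. It assumes the 12-hour directive %I, because %p only affects the parsed hour when it is paired with %I. The names s, FMT, hours and n_failed are illustrative only. Since errors='coerce' silently turns unparseable rows into NaT, counting them is a cheap way to see whether the format really matches.

```python
import pandas as pd

# Sample values copied from the question's cash_date column.
s = pd.Series([
    "02-JAN-13 12.00.00.000000000 AM",
    "02-JAN-13 06.26.02.438000000 PM",
    "17-NOV-18 08.31.47.443000000 PM",
])

# %I = 12-hour clock, %p = AM/PM marker, %f = fractional seconds.
FMT = "%d-%b-%y %I.%M.%S.%f %p"
parsed = pd.to_datetime(s, format=FMT, errors="coerce")

# After parsing, hours are on a 24-hour scale (12 AM becomes 0, 6 PM becomes 18).
hours = parsed.dt.hour.tolist()

# errors="coerce" hides failures as NaT, so count them explicitly.
n_failed = int(parsed.isna().sum())

print(parsed)
print("hours:", hours, "| rows that failed to parse:", n_failed)
```

If n_failed is large on the real data, the format probably does not match every row.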

From here, you can floor the datetimes to midnight using dt.floor:

df['date'] = pd.to_datetime(
    df['cash_date'], format='%d-%b-%y %I.%M.%S.%f %p', errors='coerce'
).dt.floor('D')

df
                         cash_date  amount  id       date
0  02-JAN-13 12.00.00.000000000 AM     100   1 2013-01-02
1  13-FEB-13 12.00.00.000000000 AM     200   2 2013-02-13
2  09-MAR-13 12.00.00.000000000 AM     300   3 2013-03-09
3  03-APR-13 12.00.00.000000000 AM     400   4 2013-04-03
4  02-JAN-13 06.26.02.438000000 PM     500   7 2013-01-02
5  17-NOV-18 08.31.47.443000000 PM     700   8 2018-11-17
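For completeness, here is a small sketch of a few related accessors that could be used in place of dt.floor, depending on what type the date column should have. The series ts and the variable names are made up for illustration. The last line, which rebuilds the 02-JAN-13 style, assumes an English locale for the month abbreviation.

```python
import pandas as pd

# Two already-parsed timestamps, standing in for the parsed cash_date column.
ts = pd.Series(pd.to_datetime(["2013-01-02 18:26:02.438",
                               "2018-11-17 20:31:47.443"]))

floored = ts.dt.floor("D")      # datetime64[ns], time reset to midnight
normalized = ts.dt.normalize()  # same values as floor("D")
as_dates = ts.dt.date           # object column of datetime.date values

# Back to strings in the original style; %b depends on the active locale.
as_text = ts.dt.strftime("%d-%b-%y").str.upper()

print(floored.equals(normalized))
print(as_dates.tolist())
print(as_text.tolist())
```

Keeping the column as datetime64[ns] (floor or normalize) is usually the more convenient choice for later grouping or filtering; dt.date and strftime both produce object columns.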

On the other hand, if you only want to extract the date component as a string, without parsing it, there are a couple of options:

str.split

df['date'] = df['cash_date'].str.split(n=1).str[0]
df
                         cash_date  amount  id       date
0  02-JAN-13 12.00.00.000000000 AM     100   1  02-JAN-13
1  13-FEB-13 12.00.00.000000000 AM     200   2  13-FEB-13
2  09-MAR-13 12.00.00.000000000 AM     300   3  09-MAR-13
3  03-APR-13 12.00.00.000000000 AM     400   4  03-APR-13
4  02-JAN-13 06.26.02.438000000 PM     500   7  02-JAN-13
5  17-NOV-18 08.31.47.443000000 PM     700   8  17-NOV-18

Or, using a list comprehension:

df['date'] = [x.split(None, 1)[0] for x in df['cash_date']]
df
                         cash_date  amount  id       date
0  02-JAN-13 12.00.00.000000000 AM     100   1  02-JAN-13
1  13-FEB-13 12.00.00.000000000 AM     200   2  13-FEB-13
2  09-MAR-13 12.00.00.000000000 AM     300   3  09-MAR-13
3  03-APR-13 12.00.00.000000000 AM     400   4  03-APR-13
4  02-JAN-13 06.26.02.438000000 PM     500   7  02-JAN-13
5  17-NOV-18 08.31.47.443000000 PM     700   8  17-NOV-18

I would wager that the list comprehension is the faster of the two options.
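Rather than taking that on faith, the comparison can be timed. The following is only a rough sketch on synthetic data (the question's six values repeated an arbitrary number of times); the function names are hypothetical, and the third variant, a fixed-width slice, is an extra option that assumes every date is exactly nine characters long (DD-MON-YY). No timings are quoted here because they depend on the machine and the pandas version.

```python
import timeit
import pandas as pd

# Synthetic column: the question's six values repeated (row count is arbitrary).
values = [
    "02-JAN-13 12.00.00.000000000 AM",
    "13-FEB-13 12.00.00.000000000 AM",
    "09-MAR-13 12.00.00.000000000 AM",
    "03-APR-13 12.00.00.000000000 AM",
    "02-JAN-13 06.26.02.438000000 PM",
    "17-NOV-18 08.31.47.443000000 PM",
]
df = pd.DataFrame({"cash_date": values * 20_000})

def with_str_split():
    # Vectorised string accessor: split once on whitespace, keep the first part.
    return df["cash_date"].str.split(n=1).str[0]

def with_list_comp():
    # Plain Python loop over the values; fails if the column contains NaN.
    return [x.split(None, 1)[0] for x in df["cash_date"]]

def with_slice():
    # Assumption: the date part is always the first 9 characters.
    return df["cash_date"].str[:9]

for fn in (with_str_split, with_list_comp, with_slice):
    best = min(timeit.repeat(fn, number=1, repeat=3))
    print(f"{fn.__name__}: {best:.3f}s")
```

Note that the list comprehension raises an AttributeError on missing values, whereas the .str methods propagate NaN, so the fastest option is not necessarily the safest one on 47 million rows.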
