
python - convert multiple string datetime formats into a specific date format

I have this column date whose strings come in 6 different lengths:

df = pd.DataFrame({'date': {0: '2020-03-21T10:13:08',  1: '2020-03-21T17:43:03',  2: '2020-03-21T13:13:30',  3: '2020-03-21T20:43:02',  4: '3/8/20 5:31',  5: '3/8/20 5:19',  6: '3/22/20 23:45',  7: '3/22/20 23:45',  8: '2/1/2020 11:53',  9: '2/1/2020 10:53',  10: '1/31/2020 15:20',  11: '1/31/2020 10:37',  12: '2020-04-04 23:34:21',  13: '2020-04-04 23:34:21'}}, 
             index=range(0,14))

I need to convert all of those datetime strings to dates. The approach I'm using is:

  1. Find the first white space and extract the date

  2. Change its format according to the string length (each length has its own date format, as you can see below in the format argument)

  3. Apply (2) to the matching rows of the dataframe df.

You can see this approach here:

df.loc[df["date"].str.find(" ") == 10, "date"] = pd.to_datetime(df.loc[df["date"].str.find(" ") == 10, "date"].str[0:10])
df.loc[df["date"].str.find(" ") == -1, "date"] = pd.to_datetime(df.loc[df["date"].str.find(" ") == 10, "date"].str[0:10])
df.loc[df["date"].str.find(" ") == 6, "date"] = pd.to_datetime(df.loc[df["date"].str.find(" ") == 6, "date"].str[0:6], format="%m/%d/%y")
df.loc[df["date"].str.find(" ") == 7, "date"] = pd.to_datetime(df.loc[df["date"].str.find(" ") == 7, "date"].str[0:7], format="%m/%d/%y")
df.loc[df["date"].str.find(" ") == 8, "date"] = pd.to_datetime(df.loc[df["date"].str.find(" ") == 8, "date"].str[0:8], format="%m/%d/%Y")
df.loc[df["date"].str.find(" ") == 9, "date"] = pd.to_datetime(df.loc[df["date"].str.find(" ") == 9, "date"].str[0:9], format="%m/%d/%Y")

Everything goes fine until step 3, where I try to apply all the format changes to the dataframe, but I can't understand why it doesn't give the expected result. Any suggestions?

By the way, it has to be scalable (I have a lot of rows per format string).
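
One likely reason the snippet above misbehaves: each line writes parsed timestamps back into the same date column, so later .str.find(" ") calls no longer see strings in the rows already converted. The second line also builds its right-hand side from the == 10 rows while assigning to the == -1 rows. A minimal sketch of the same position-based idea, assuming the position-to-format mapping from the question holds for every row, computes the positions once and writes to a new column (date_only and the shortened sample are made up for this example):

```python
import pandas as pd

df = pd.DataFrame({"date": [
    "2020-03-21T10:13:08", "3/8/20 5:31", "3/22/20 23:45",
    "2/1/2020 11:53", "1/31/2020 15:20", "2020-04-04 23:34:21",
]})

# Position of the first space, computed once from the original strings.
space_pos = df["date"].str.find(" ")

# Position of the first space -> format of the date part (taken from the question).
formats = {
    -1: "%Y-%m-%d", 10: "%Y-%m-%d",
    6: "%m/%d/%y", 7: "%m/%d/%y",
    8: "%m/%d/%Y", 9: "%m/%d/%Y",
}

pieces = []
for pos, fmt in formats.items():
    mask = space_pos == pos
    if mask.any():
        # No space at all means an ISO string with "T": keep the first 10 characters.
        end = 10 if pos == -1 else pos
        pieces.append(pd.to_datetime(df.loc[mask, "date"].str[:end], format=fmt))

# Put the parsed pieces back in the original row order, in a new column.
df["date_only"] = pd.concat(pieces).reindex(df.index)
```

The position alone can be ambiguous: a value such as 12/31/20 also has its first space at position 8, so this mapping only works if such values do not occur in the data.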

For me it works to convert all values to datetimes and then remove the times, with Series.dt.floor if the output should stay datetimes or with Series.dt.date if the output should be python dates:

df['date'] = pd.to_datetime(df['date']).dt.floor('D')
# for python date objects instead:
# df['date'] = pd.to_datetime(df['date']).dt.date
print (df)
         date
0  2020-03-21
1  2020-03-21
2  2020-03-21
3  2020-03-21
4  2020-03-08
5  2020-03-08
6  2020-03-22
7  2020-03-22
8  2020-02-01
9  2020-02-01
10 2020-01-31
11 2020-01-31
12 2020-04-04
13 2020-04-04
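
Whether a single pd.to_datetime call accepts this column seems to depend on the pandas version: as far as I know, from pandas 2.0 the format is inferred from the first value, so mixed layouts may raise an error unless format="mixed" or explicit formats are passed. A version-independent sketch, assuming only the three date layouts from the question occur, tries each format with errors="coerce" and keeps the first one that parses (raw, candidate_formats and unparsed are names made up for this example):

```python
import pandas as pd

raw = pd.Series([
    "2020-03-21T10:13:08", "3/8/20 5:31", "3/22/20 23:45",
    "2/1/2020 11:53", "1/31/2020 15:20", "2020-04-04 23:34:21",
])

# Keep everything before the first space or "T", i.e. the date part only.
date_part = raw.str.extract(r"^([^ T]+)", expand=False)

# Try each known layout; rows that do not match a layout become NaT.
# The four-digit-year format is tried before the two-digit one.
candidate_formats = ["%Y-%m-%d", "%m/%d/%Y", "%m/%d/%y"]
attempts = [
    pd.to_datetime(date_part, format=fmt, errors="coerce")
    for fmt in candidate_formats
]

# Fill the gaps of the first attempt with the later attempts, in order.
parsed = attempts[0]
for attempt in attempts[1:]:
    parsed = parsed.fillna(attempt)

# Rows that matched none of the formats, worth inspecting on real data.
unparsed = raw[parsed.isna()]
```

Each attempt is one vectorised call over the whole column, so the cost grows with the number of formats rather than with a per-row Python loop.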

Your solution can be simplified: take the first 10 characters, split on a possible space, and keep the first piece:

df['date'] = pd.to_datetime(df['date'].str[:10].str.split().str[0])

import pandas as pd

df = pd.DataFrame({'date': {0: '2020-03-21T10:13:08',  1: '2020-03-21T17:43:03',  2: '2020-03-21T13:13:30',  3: '2020-03-21T20:43:02',  4: '3/8/20 5:31',  5: '3/8/20 5:19',  6: '3/22/20 23:45',  7: '3/22/20 23:45',  8: '2/1/2020 11:53',  9: '2/1/2020 10:53',  10: '1/31/2020 15:20',  11: '1/31/2020 10:37',  12: '2020-04-04 23:34:21',  13: '2020-04-04 23:34:21'}}, 
             index=range(0,14))
df
    date
0   2020-03-21T10:13:08
1   2020-03-21T17:43:03
2   2020-03-21T13:13:30
3   2020-03-21T20:43:02
4   3/8/20 5:31
5   3/8/20 5:19
6   3/22/20 23:45
7   3/22/20 23:45
8   2/1/2020 11:53
9   2/1/2020 10:53
10  1/31/2020 15:20
11  1/31/2020 10:37
12  2020-04-04 23:34:21
13  2020-04-04 23:34:21

df['date'] = pd.to_datetime(df['date'])
df
    date
0   2020-03-21 10:13:08
1   2020-03-21 17:43:03
2   2020-03-21 13:13:30
3   2020-03-21 20:43:02
4   2020-03-08 05:31:00
5   2020-03-08 05:19:00
6   2020-03-22 23:45:00
7   2020-03-22 23:45:00
8   2020-02-01 11:53:00
9   2020-02-01 10:53:00
10  2020-01-31 15:20:00
11  2020-01-31 10:37:00
12  2020-04-04 23:34:21
13  2020-04-04 23:34:21
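
The result above still carries the time of day. A small sketch of three ways to drop it once the column is already datetime, assuming the target layout is day/month/year (the question does not say which layout is wanted, and ts is a made-up sample):

```python
import datetime
import pandas as pd

ts = pd.Series(pd.to_datetime(
    ["2020-03-21 10:13:08", "2020-04-04 23:34:21"],
    format="%Y-%m-%d %H:%M:%S",
))

# Still a datetime column, with the time set to midnight.
floored = ts.dt.floor("D")

# Object column holding datetime.date values.
as_dates = ts.dt.date

# Text column in a chosen layout (here day/month/year, an assumption).
as_text = ts.dt.strftime("%d/%m/%Y")
```

Keeping the datetime column (floored) leaves the .dt accessor and date arithmetic available, while the text version is only meant for display or export.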
