Converting datetime from CSV import formatting error

I have imported a CSV file using read_csv. The raw data in the column of the CSV that I am interested in records the date in the format as follows:

19/01/2012  9:00:00 AM

However, when the data is imported it is shown as:

2005-03-21 10:30:00

I am not sure why this is happening. Ultimately I am interested in extracting the date (19/01/2012) and using it to calculate the difference in days from the earliest date in the column. Something along the lines of:

df['date_column'] = (df['date_column'] - df['date_column'].min())

I have tried a couple of things, firstly:

df['date_column'] = pd.to_datetime(df['date_column'], dayfirst=True)

This returns the same date format as shown above. Namely 2005-03-21 10:30:00

Second attempt was to try

df['date_column'] = pd.to_datetime(df['date_column'], format ='%d-%m-%y %I:%M:%S %p')

This gave me the error

ValueError: time data '2004-03-16 11:40:00' does not match format '%d-%m-%y %I:%M:%S %p' (match)

I have tried a couple of minor variations to the above. I am using Jupyter v 5.7.4 running Python 3.7.1

Certainly would appreciate any advice / help! Thanks.

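I think read_csv has already converted date_column to datetimes, so converting it again is not necessary. pandas always displays datetimes in ISO form (YYYY-MM-DD HH:MM:SS), whatever the format in the file, which is also why your format string fails to match.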
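A minimal sketch of how to check this, assuming the column might still hold text. The two-row CSV below is made up for illustration, and the format string uses slashes and a four-digit year (%Y) to match the raw text from the question:

```python
import io
import pandas as pd

# Hypothetical sample data in the same layout as the raw CSV column.
csv_text = "date_column\n19/01/2012 9:00:00 AM\n21/03/2005 10:30:00 AM\n"
df = pd.read_csv(io.StringIO(csv_text))

# False here means the column is still text and needs parsing.
already_datetime = pd.api.types.is_datetime64_any_dtype(df['date_column'])

if not already_datetime:
    # %d/%m/%Y matches 19/01/2012; %I and %p handle the 12-hour clock with AM/PM.
    df['date_column'] = pd.to_datetime(df['date_column'],
                                       format='%d/%m/%Y %I:%M:%S %p')

print(df['date_column'])
```

If the check is already True after read_csv, the to_datetime call can be skipped entirely.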

If you subtract the minimum value you get timedeltas, which are displayed in a different format:

import pandas as pd

# Sample data: 10 timestamps, 23 minutes apart
rng = pd.date_range('2017-04-03 15:12:10', periods=10, freq='23Min')
df = pd.DataFrame({'date_column': rng})

df['diff'] = (df['date_column'] - df['date_column'].min())

And if you need the datetimes as strings in your original format, use Series.dt.strftime (with %I for the 12-hour clock, since %p adds AM/PM):

df['date_1'] = df['date_column'].dt.strftime('%d/%m/%Y %I:%M:%S %p')
print(df)
          date_column     diff                  date_1
0 2017-04-03 15:12:10 00:00:00  03/04/2017 03:12:10 PM
1 2017-04-03 15:35:10 00:23:00  03/04/2017 03:35:10 PM
2 2017-04-03 15:58:10 00:46:00  03/04/2017 03:58:10 PM
3 2017-04-03 16:21:10 01:09:00  03/04/2017 04:21:10 PM
4 2017-04-03 16:44:10 01:32:00  03/04/2017 04:44:10 PM
5 2017-04-03 17:07:10 01:55:00  03/04/2017 05:07:10 PM
6 2017-04-03 17:30:10 02:18:00  03/04/2017 05:30:10 PM
7 2017-04-03 17:53:10 02:41:00  03/04/2017 05:53:10 PM
8 2017-04-03 18:16:10 03:04:00  03/04/2017 06:16:10 PM
9 2017-04-03 18:39:10 03:27:00  03/04/2017 06:39:10 PM

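I think the difference in days is easiest to get by subtracting the minimum and then taking .dt.days from the resulting timedeltas. No extra import is needed, because .dt is the pandas accessor and not the datetime module:

df['date_column'] = (df['date_column'] - df['date_column'].min()).dt.days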
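One caveat worth sketching: .dt.days counts whole 24-hour periods, so the time of day can shift the result. If the goal is the difference between calendar dates, one option (an assumption about what is wanted, not something stated in the question) is to drop the time with .dt.normalize() first. The sample timestamps and the column names elapsed_days and calendar_days are made up for illustration:

```python
import pandas as pd

# Hypothetical sample timestamps.
df = pd.DataFrame({'date_column': pd.to_datetime(
    ['2012-01-19 09:00:00', '2012-01-20 08:00:00', '2012-01-25 23:30:00'])})

# Whole 24-hour periods since the earliest timestamp.
df['elapsed_days'] = (df['date_column'] - df['date_column'].min()).dt.days

# Calendar-day difference: set every time to midnight before subtracting.
dates = df['date_column'].dt.normalize()
df['calendar_days'] = (dates - dates.min()).dt.days

print(df)
```

Here the second row is 23 hours after the first, so elapsed_days is 0 while calendar_days is 1.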
