I have imported a CSV file using read_csv. The raw data in the column of the CSV that I am interested in records the date in the format as follows:
19/01/2012 9:00:00 AM
However, when the data is imported it is shown as:
2005-03-21 10:30:00
Not sure why this is happening. Ultimately I am interested in extracting the date (19/01/2012) and using this to calculate the difference in days from the earliest date in the column. Something along the lines of:
df['date_column'] = (df['date_column'] - df['date_column'].min())
I have tried a couple of things, firstly:
df['date_column'] = pd.to_datetime(df['date_column'], dayfirst=True)
This returns the same date format as shown above, namely 2005-03-21 10:30:00.
Second attempt was to try
df['date_column'] = pd.to_datetime(df['date_column'], format ='%d-%m-%y %I:%M:%S %p')
This gave me the error
ValueError: time data '2004-03-16 11:40:00' does not match format '%d-%m-%y %I:%M:%S %p' (match)
I have tried a couple of minor variations to the above. I am using Jupyter v 5.7.4 running Python 3.7.1
Certainly would appreciate any advice / help! Thanks.
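I think read_csv has already converted date_column to datetimes, so converting it again is not necessary. The 2005-03-21 10:30:00 you see is only how pandas displays datetime values, not a different stored format.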
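Whether the column really holds datetimes can be checked from its dtype. The snippet below is a minimal sketch, not the asker's actual data: the inline CSV text is made up to mimic the raw format from the question, and it assumes read_csv was called without any date parsing, so the column arrives as text and is then parsed with an explicit day-first, 12-hour format.

```python
import io

import pandas as pd

# Hypothetical sample in the question's raw format (day/month/year, 12-hour clock).
csv_text = (
    "date_column\n"
    "19/01/2012 9:00:00 AM\n"
    "21/03/2005 10:30:00 AM\n"
    "16/03/2004 11:40:00 AM\n"
)

df = pd.read_csv(io.StringIO(csv_text))

# Without parse_dates the column is still text, not datetimes.
was_datetime = pd.api.types.is_datetime64_any_dtype(df['date_column'])

# Parse explicitly: slashes as separators, 4-digit year (%Y), 12-hour clock (%I) with AM/PM (%p).
df['date_column'] = pd.to_datetime(df['date_column'], format='%d/%m/%Y %I:%M:%S %p')
is_datetime = pd.api.types.is_datetime64_any_dtype(df['date_column'])

print(was_datetime, is_datetime)
print(df)
```

Note that the format string has to match the text exactly: the attempt in the question used dashes and a two-digit year (%y), neither of which matches 19/01/2012.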
Subtracting the minimum value gives timedeltas, which are displayed in a different format:
rng = pd.date_range('2017-04-03 15:12:10', periods=10, freq='23Min')
df = pd.DataFrame({'date_column': rng})
df['diff'] = (df['date_column'] - df['date_column'].min())
And if you need the datetimes as text in your original format, use Series.dt.strftime (with %I rather than %H, so that the 12-hour clock matches the AM/PM marker):
df['date_1'] = df['date_column'].dt.strftime('%d/%m/%Y %I:%M:%S %p')
print(df)
          date_column     diff                  date_1
0 2017-04-03 15:12:10 00:00:00  03/04/2017 03:12:10 PM
1 2017-04-03 15:35:10 00:23:00  03/04/2017 03:35:10 PM
2 2017-04-03 15:58:10 00:46:00  03/04/2017 03:58:10 PM
3 2017-04-03 16:21:10 01:09:00  03/04/2017 04:21:10 PM
4 2017-04-03 16:44:10 01:32:00  03/04/2017 04:44:10 PM
5 2017-04-03 17:07:10 01:55:00  03/04/2017 05:07:10 PM
6 2017-04-03 17:30:10 02:18:00  03/04/2017 05:30:10 PM
7 2017-04-03 17:53:10 02:41:00  03/04/2017 05:53:10 PM
8 2017-04-03 18:16:10 03:04:00  03/04/2017 06:16:10 PM
9 2017-04-03 18:39:10 03:27:00  03/04/2017 06:39:10 PM
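I think the subtraction gives you timedeltas, so to get the difference as a whole number of days you need the .dt.days accessor on the result (this is the pandas .dt accessor, so no import of the datetime module is needed):
df['date_column'] = (df['date_column'] - df['date_column'].min()).dt.days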
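One caveat: .dt.days counts complete 24-hour periods from the earliest timestamp, so two rows on different calendar dates can still be 0 days apart. If the difference should be by calendar date, one option is to drop the time of day first with Series.dt.normalize. This is a rough sketch with made-up timestamps; the column names days_elapsed, days_by_date and date_text are only illustrative.

```python
import pandas as pd

# Hypothetical timestamps: the first two are on different dates but under 24 hours apart.
df = pd.DataFrame({
    'date_column': pd.to_datetime([
        '2012-01-19 21:00:00',
        '2012-01-20 09:00:00',
        '2012-01-22 08:00:00',
    ])
})

# Whole 24-hour periods since the earliest timestamp.
elapsed = df['date_column'] - df['date_column'].min()
df['days_elapsed'] = elapsed.dt.days

# Calendar-day difference: set every timestamp to midnight before subtracting.
dates = df['date_column'].dt.normalize()
df['days_by_date'] = (dates - dates.min()).dt.days

# The date alone, as text in the day/month/year layout from the question.
df['date_text'] = df['date_column'].dt.strftime('%d/%m/%Y')

print(df)
```

Which of the two day counts is right depends on whether the time of day should matter for the comparison.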