I have a column of a Pandas DataFrame that is fractional day of year (DOY). This column appears as:
DOY
0 200.749967
1 200.791667
2 200.833367
3 200.874967
4 200.916667
5 200.958367
6 200.999967
7 201.041667
...
3491 627.166667
3492 627.333367
3493 627.499967
3494 627.666667
3495 627.833367
3496 627.999967
3497 628.166667
3498 628.333367
Name: DOY, Length: 3499, dtype: float64
The starting year is 2011; however, the DOY values keep increasing through 2012 without resetting at the new year.
How do I convert this to a Pandas DatetimeIndex with format 'YYYY-MM-DD HH:MM:SS'?
One way I can think of to do this is to convert your column to a Timedelta and then add it to the base date (2011-01-01):
df.DOY = pd.to_datetime('2011-1-1') + pd.to_timedelta(df.DOY, unit='D')
print(df.DOY)
0 2011-07-20 17:59:57.148800
1 2011-07-20 19:00:00.028800
2 2011-07-20 20:00:02.908800
3 2011-07-20 20:59:57.148800
4 2011-07-20 22:00:00.028800
5 2011-07-20 23:00:02.908800
6 2011-07-20 23:59:57.148800
7 2011-07-21 01:00:00.028800
...
3491 2012-09-19 04:00:00.028800
3492 2012-09-19 08:00:02.908800
3493 2012-09-19 11:59:57.148800
3494 2012-09-19 16:00:00.028800
3495 2012-09-19 20:00:02.908800
3496 2012-09-19 23:59:57.148800
3497 2012-09-20 04:00:00.028800
3498 2012-09-20 08:00:02.908800
Name: DOY, dtype: datetime64[ns]
Another method would be to call pd.to_datetime with the origin parameter set, as agtoever shows in their answer.
Although the accepted answer converts DOY to Datetime correctly in principle, it overlooks a small mistake.
Midnight of January 1 for any year is DOY 1.0. As fractional DOY time proceeds, Jan 1 12:00 is DOY 1.5, Jan 2 00:00 is DOY 2.0, and so on.
If you add the DOY time to a base date, as suggested in the other answers, the resulting time is shifted forward by one day. For example, pd.to_datetime('2011-01-01') + pd.to_timedelta(df.DOY, unit='D'), with a DOY series that starts at 1.0, gives a starting date of '2011-01-02', which is incorrect. This is a consequence of the convention that DOY time starts at 1 instead of 0.
Therefore, the correct answer (producing correct Datetime results) is:
df.DOY = pd.to_datetime('2011-1-1') + pd.to_timedelta(df.DOY, unit='D') - pd.Timedelta(days=1)
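To make the one-day shift concrete, here is a minimal sketch on a small made-up series (the values and the names doy, base, naive and corrected are illustrative and not taken from the question's data). It assumes the one-based DOY convention described above.

```python
import pandas as pd

# Hypothetical one-based DOY values: DOY 1.0 is midnight on Jan 1.
doy = pd.Series([1.0, 1.5, 2.0, 366.0], name="DOY")
base = pd.Timestamp("2011-01-01")

# Adding DOY directly treats DOY 1.0 as "one day after the base date".
naive = base + pd.to_timedelta(doy, unit="D")

# Subtracting one day first maps DOY 1.0 back onto the base date itself.
corrected = base + pd.to_timedelta(doy - 1, unit="D")

print(pd.DataFrame({"DOY": doy, "naive": naive, "corrected": corrected}))
```

Because 2011 is not a leap year, DOY 366.0 in the corrected column lands on midnight of 2012-01-01, which matches the question's setup where DOY keeps counting into the next year.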
Just use to_datetime with the appropriate parameters (read the manual):
>>> pandas.to_datetime([0,0.1,200,400,800], unit='D', origin=pandas.Timestamp('01-01-2011'))
DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 02:24:00', '2011-07-20 00:00:00', '2012-02-05 00:00:00', '2013-03-11 00:00:00'], dtype='datetime64[ns]', freq=None)
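Applied to a DataFrame column, the same call might look like the sketch below. The three sample values are copied from the question; subtracting 1 assumes the one-based DOY convention (drop it if your DOY starts at 0), and rounding to whole seconds is an optional extra step, added here only to strip the sub-second noise visible in the output above. The index name "time" is arbitrary.

```python
import pandas as pd

# A few DOY values from the question; the real column has 3499 rows.
df = pd.DataFrame({"DOY": [200.749967, 200.791667, 627.166667]})

# Assumption: DOY is one-based, so shift by one day before converting.
stamps = pd.to_datetime(df["DOY"] - 1, unit="D", origin=pd.Timestamp("2011-01-01"))

# Round to whole seconds and use the result as the DataFrame's index.
df.index = pd.DatetimeIndex(stamps.dt.round("s"), name="time")

print(df)
```

A DatetimeIndex has no stored display format; if you need 'YYYY-MM-DD HH:MM:SS' strings, df.index.strftime('%Y-%m-%d %H:%M:%S') produces them.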