All, I am trying to read the time coordinate from the following Berkeley Earth temperature file. The time spans from 1850 to 2022, and the time unit is the year AD as a decimal number (1850.041667, 1850.125, 1850.208333, ..., 2022.708333, 2022.791667, 2022.875).
pandas.to_datetime cannot correctly interpret the time array, I think because I need to state the origin of the time coordinate and the unit. I tried pd.to_datetime(dti, unit='D', origin='julian'), but it did not work (out of bounds). Also, I think I have to use a unit of years instead of days.
The file is located here: http://berkeleyearth.lbl.gov/auto/Global/Gridded/Land_and_Ocean_LatLong1.nc
import xarray as xr
import numpy as np
import pandas as pd
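# read data into memory (file name as in the download URL above)
flname = "Land_and_Ocean_LatLong1.nc"
ds = xr.open_dataset("./" + flname)
dti = ds['time']
# attempted conversion; this raises an out-of-bounds error
pd.to_datetime(dti, unit='D', origin='julian')
# spacing between consecutive time values, in fractional years
np.diff(dti)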
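Some background on the out-of-bounds error may help here. With origin='julian', pandas treats each number as a Julian day count, and nanosecond-resolution timestamps only cover Julian days from roughly 2.33 million to 2.55 million. Values near 1850 or 2022 fall far below that range. The following is a minimal sketch of that check; the two sample values are hard-coded stand-ins for the file's time axis, not read from the file.

```python
import pandas as pd

# Two sample values in the same fractional-year form as the file's time axis
# (hard-coded here for illustration).
years = pd.Series([1850.041667, 2022.875])

# Julian day numbers that nanosecond-resolution Timestamps can represent.
jd_min = pd.Timestamp.min.to_julian_date()
jd_max = pd.Timestamp.max.to_julian_date()
print(jd_min, jd_max)

# The sample values are far below jd_min, so the conversion is rejected.
try:
    pd.to_datetime(years, unit="D", origin="julian")
    failed = False
except pd.errors.OutOfBoundsDatetime:
    failed = True
print(failed)
```

So the error comes from the unit and origin not matching the data, not from the data itself.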
Convert to datetime using %Y as the parsing directive to get the year only, then add the fractional part of the year as a timedelta in days. Note that you might have to account for leap years when calculating the timedelta. Example:
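import pandas as pd
# fractional years as a pandas Series (ds is the Dataset opened in the question)
years = ds['time'].to_series().reset_index(drop=True)
year_int = years.astype(int)
# parse the integer year only; each timestamp starts at January 1 of that year
dti = pd.to_datetime(year_int, format="%Y")
# it might be sufficient to use e.g. 365 or 365.25 here, depending on the input
daysinyear = pd.Series(366, index=dti.index).where(dti.dt.is_leap_year, 365)
# add the fractional part of the year, scaled to days
dti = dti + pd.to_timedelta(daysinyear * (years - year_int), unit="D")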
dti
0      1850-01-16 04:59:59.999971200
1      1850-02-15 15:00:00.000000000
2      1850-03-18 01:00:00.000028800
3      1850-04-17 10:59:59.999971200
4      1850-05-17 21:00:00.000000000
                    ...
2070   2022-07-17 16:59:59.999971200
2071   2022-08-17 03:00:00.000000000
2072   2022-09-16 13:00:00.000028800
2073   2022-10-16 22:59:59.999971200
2074   2022-11-16 09:00:00.000000000
Length: 2075, dtype: datetime64[ns]
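A variant of the same idea, as a self-contained sketch: instead of looking up 365 or 366 days, it scales the fraction by the actual span between January 1 of the year and January 1 of the next year, so leap years are handled implicitly. The function name decimal_year_to_datetime is made up for this example, and the sketch assumes the values are plain fractional years AD within the range that nanosecond timestamps support.

```python
import numpy as np
import pandas as pd

def decimal_year_to_datetime(years):
    """Convert fractional years (e.g. 1850.125) to a pandas DatetimeIndex."""
    years = np.asarray(years, dtype="float64")
    whole = np.floor(years).astype("int64")
    frac = years - whole
    # datetime64[Y] counts years since 1970, so shift before casting.
    start = (whole - 1970).astype("datetime64[Y]").astype("datetime64[ns]")
    end = (whole - 1969).astype("datetime64[Y]").astype("datetime64[ns]")
    # (end - start) is the true length of each year: 365 or 366 days.
    return pd.DatetimeIndex(start + (end - start) * frac)

idx = decimal_year_to_datetime([2000.5, 2001.5])
print(idx[0])  # 2000-07-02 00:00:00 (leap year: half of 366 days)
print(idx[1])  # 2001-07-02 12:00:00 (non-leap year: half of 365 days)
```

If this fits your data, the result could then be attached to the dataset with something like ds.assign_coords(time=decimal_year_to_datetime(ds['time'].values)), assuming the coordinate is named time as in the question.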