I would like some help efficiently collocating two datasets. One is, let's say, observations of rainfall, given in terms of datetime, latitude and longitude. The other is meteorological data, e.g. reanalysis, also given in terms of datetime, latitude and longitude. Below I create a random example DataFrame and xarray Dataset and then collocate them.
```python
from datetime import datetime, timedelta
from random import randint

import numpy as np
import pandas as pd
import xarray as xr
from numpy.random import rand

# create example data of the dataframe we want to collocate with the meteorological data
datetimes = pd.date_range(start='2002-01-01 10:00:00', end='2002-01-05 10:00:00', freq='h')
rainfall = rand(len(datetimes))
latitudes = [randint(0, 90) for p in range(0, len(datetimes))]
longitudes = [randint(0, 180) for p in range(0, len(datetimes))]
df_obs = pd.DataFrame({'datetime': datetimes, 'rainfall': rainfall, 'latitude': latitudes,
                       'longitude': longitudes})

# create an xarray which is the example met data
met_type = np.ones((720, 1440))
rainfall = rand(len(datetimes))
met_list = [x * met_type for x in rainfall]

def produce_xarray(met_list, datetimes, met_type='rain', datetime_var="datetime"):
    # accept either datetime objects or 'YYYYMM' strings
    if not isinstance(datetimes[0], datetime):
        dates = [datetime.strptime(x, '%Y%m') for x in datetimes]
    else:
        dates = datetimes
    met_list_dstack = np.dstack(met_list)  # shape (latitude, longitude, time)
    lats = np.arange(90, -90, -0.25)
    lons = np.arange(-180, 180, 0.25)
    ds = xr.Dataset(data_vars={met_type: (["latitude", "longitude", datetime_var], met_list_dstack)},
                    coords={"latitude": lats, "longitude": lons, datetime_var: dates})
    ds[met_type].attrs["units"] = "g " + str(met_type) + " m$^{-2}$"
    return ds

xr_met = produce_xarray(met_list, datetimes, datetime_var="datetime")

# now I wish to collocate the data as quickly as possible, as my datasets are huge -
# here I have a function which finds the closest value using the datetime, latitude and longitude,
# then I apply this function to the df of my random observations
var = 'rain'

def find_value_lat_lon(lat, lon, traj_datetime):
    array = xr_met[var].sel(latitude=lat, longitude=lon, datetime=traj_datetime, method='nearest')
    return array.item()  # plain float rather than a 0-d array

def append_var_columnwise(df, var_name):
    df = df.copy()
    # one .sel call per row of the DataFrame
    df.loc[:, var_name] = df[['latitude', 'longitude', 'datetime']].apply(
        lambda x: find_value_lat_lon(*x), axis=1)
    return df

print(df_obs)
print(xr_met)
df_obs = append_var_columnwise(df_obs, var_name='rain_met')
print(df_obs)
```
The final output is shown in the picture: the df has an additional column, 'rain_met'. For 97 data points this takes 212 ms.
I don't know that it is any faster, but .sel supports vectorized indexing (see https://docs.xarray.dev/en/stable/user-guide/indexing.html#vectorized-indexing; the last example in that section is a 2D version of your code):
```python
# replaces the .apply(...) line inside append_var_columnwise
df.loc[:, var_name] = xr_met[var].sel(
    latitude=xr.DataArray(df['latitude']),
    longitude=xr.DataArray(df['longitude']),
    datetime=xr.DataArray(df['datetime']),
    method='nearest').values
```
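A small self-contained check of the idea, assuming the same dimension names as the question (latitude, longitude, datetime) but a coarse 1° synthetic grid so it runs quickly. collocate_vectorized and collocate_rowwise are hypothetical helper names, and "points" is an arbitrary dimension name. Giving all three indexers one shared dimension is what makes .sel pick one value per row instead of the outer product of the three coordinate lists (in the snippet above they share a dimension implicitly because they all come from the same DataFrame index). The assertion checks that the single call returns the same values as the one-.sel-per-row loop.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the reanalysis grid (assumption: same dimension
# names as the question, but 1-degree spacing and 24 hourly steps).
times = pd.date_range("2002-01-01 10:00", periods=24, freq=pd.Timedelta(hours=1))
lats = np.arange(90, -90, -1.0)  # descending, like the question's grid
lons = np.arange(-180, 180, 1.0)
rng = np.random.default_rng(0)
met = xr.Dataset(
    {"rain": (("latitude", "longitude", "datetime"),
              rng.random((lats.size, lons.size, times.size)))},
    coords={"latitude": lats, "longitude": lons, "datetime": times},
)

# Random observations; times are offset by 20 minutes so that
# "nearest" has to snap them back onto the hourly grid.
n = 50
obs = pd.DataFrame({
    "datetime": times[rng.integers(0, times.size, n)] + pd.Timedelta(minutes=20),
    "latitude": rng.uniform(-89, 89, n),
    "longitude": rng.uniform(-179, 179, n),
})


def collocate_vectorized(df, da):
    """Nearest-neighbour lookup for every row in a single .sel call."""
    dim = "points"  # shared indexer dimension -> pointwise selection
    picked = da.sel(
        latitude=xr.DataArray(df["latitude"].to_numpy(), dims=dim),
        longitude=xr.DataArray(df["longitude"].to_numpy(), dims=dim),
        datetime=xr.DataArray(df["datetime"].to_numpy(), dims=dim),
        method="nearest",
    )
    return picked.values  # 1-D array, one value per row


def collocate_rowwise(df, da):
    """The question's approach: one .sel call per row, for comparison."""
    return np.array([
        da.sel(latitude=row.latitude, longitude=row.longitude,
               datetime=row.datetime, method="nearest").item()
        for row in df.itertuples(index=False)
    ])


obs["rain_met"] = collocate_vectorized(obs, met["rain"])

# Same values as the row-by-row loop.
assert np.allclose(obs["rain_met"].to_numpy(), collocate_rowwise(obs, met["rain"]))
print(obs.head())
```

On the question's data the vectorized call replaces the .apply inside append_var_columnwise. It turns one Python-level .sel call per row into a single call, which is likely where most of the 212 ms goes, but I haven't benchmarked it, so time it on your real data.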