xarray create Dataset from list of lat/lon points (not square!)
I need to create a Dataset from an irregular list of latitudes/longitudes. These have been stacked into a list of 'pixels' that I need to unstack and convert back to a regular grid of latitudes/longitudes. Because the data values are not complete for every pixel in the grid, I need to fill the missing values with `np.nan`.
I am having trouble creating the `xr.Dataset` from an irregular list of lat/lon points (pixels).
Note: the data is not complete (it has shape `(99,)`), so I cannot simply reshape it to fit the unique latitudes/longitudes.
```python
import numpy as np
import xarray as xr

unique_latitudes = np.arange(0, 10)
unique_longitudes = np.arange(0, 10)
# all (lat, lon) combinations of the full 10 x 10 grid, one row per pixel
_ = np.array(np.meshgrid(unique_latitudes, unique_longitudes)).T.reshape(-1, 2)
# we don't have a complete grid of pixels
pixels = _[:99]
latitudes = pixels[:, 0]
longitudes = pixels[:, 1]
pixel_id = [i for i in range(len(pixels))]
# there is one missing datapoint (only 99 pixels, so I can't simply reshape the data)
data = np.random.choice([0, 1, 2], (pixels.shape[0]))
coords = {'pixel': pixels}
dims = ['pixel']
xr.Dataset({'data': (dims, data)})
```
Out[]:
<xarray.Dataset>
Dimensions: (pixel: 99)
Dimensions without coordinates: pixel
Data variables:
data (pixel) int64 1 1 2 0 1 0 1 1 0 1 1 2 1 ... 2 2 0 0 1 2 1 2 0 1 1 2
This is as far as I have got with my data. I have a length-99 array, but each of these values corresponds to one latitude and one longitude:
pixels[:5]
Out[]:
array([[0, 0],
[0, 1],
[0, 2],
[0, 3],
[0, 4]])
What I want is an `xr.Dataset` with appropriately labeled `lat`/`lon` coordinates:
```python
data = np.random.choice([0, 1, 2], (100)).astype('float')
data = data.reshape(len(unique_latitudes), len(unique_longitudes))
# remember there is one missing data point in the above data
data[np.unravel_index(99, data.shape)] = np.nan
correct_dims = ['lat', 'lon']
correct_coords = {'lat': unique_latitudes, 'lon': unique_longitudes}
correct_ds = xr.Dataset({'data': (correct_dims, data)}, coords=correct_coords)
correct_ds
```
Out[]:
<xarray.Dataset>
Dimensions: (lat: 10, lon: 10)
Coordinates:
* lat (lat) int64 0 1 2 3 4 5 6 7 8 9
* lon (lon) int64 0 1 2 3 4 5 6 7 8 9
Data variables:
data (lat, lon) float64 0.0 1.0 0.0 1.0 0.0 2.0 ... 2.0 1.0 1.0 2.0 nan
I have run into and solved problems similar to yours, which is how I landed on this page. My solution is to use the connection between pandas DataFrames and xarray DataArrays. I don't fully understand the data you've provided above, but I think my logic will probably work for your case.
The first step is to prepare your actual data as a pandas DataFrame, with columns for lat, lon, and all other variables.
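As a sketch of that first step, assuming the question's `pixels` array and a random `data` vector stand in for your actual data, the DataFrame could be built like this. `your_actual_data` is the placeholder name used later in this answer, and the random values are purely illustrative.

```python
import numpy as np
import pandas as pd

# Recreate the question's setup: a 10 x 10 grid with the last pixel missing
unique_latitudes = np.arange(0, 10)
unique_longitudes = np.arange(0, 10)
all_pixels = np.array(np.meshgrid(unique_latitudes, unique_longitudes)).T.reshape(-1, 2)
pixels = all_pixels[:99]  # only 99 of the 100 pixels have data

# Illustrative random values, one per pixel
rng = np.random.default_rng(0)
data = rng.choice([0, 1, 2], size=pixels.shape[0])

# One row per pixel: lat, lon and every data variable as a column
your_actual_data = pd.DataFrame({
    'lat': pixels[:, 0],
    'lon': pixels[:, 1],
    'data': data,
})
print(your_actual_data.head())
```

Additional variables would simply be extra columns next to `data`.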
Then you can generate a two-column pandas DataFrame from every combination of the "unique_lat" and "unique_lon" values to represent your full grid, using something like:
```python
import numpy as np
import pandas as pd

# construct a full grid
def expand_grid(lat, lon):
    '''list all combinations of lats and lons using expand_grid(lat, lon)'''
    test = [(A, B) for A in lat for B in lon]
    test = np.array(test)
    test_lat = test[:, 0]
    test_lon = test[:, 1]
    full_grid = pd.DataFrame({'lat': test_lat, 'lon': test_lon})
    # sort lat-major, lon-minor so the rows can later be reshaped to (lat, lon)
    full_grid = full_grid.sort_values(by=['lat', 'lon'])
    full_grid = full_grid.reset_index(drop=True)
    return full_grid
```
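For illustration, here is `expand_grid` applied to the question's 10 x 10 grid, next to a shorter alternative built on `pd.MultiIndex.from_product`, which should produce the same two columns in the same sorted order. The names `unique_lat`, `unique_lon` and `full_grid_alt` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def expand_grid(lat, lon):
    '''list all combinations of lats and lons (same logic as the helper above)'''
    test = np.array([(a, b) for a in lat for b in lon])
    full_grid = pd.DataFrame({'lat': test[:, 0], 'lon': test[:, 1]})
    return full_grid.sort_values(by=['lat', 'lon']).reset_index(drop=True)

unique_lat = np.arange(0, 10)
unique_lon = np.arange(0, 10)
full_grid = expand_grid(unique_lat, unique_lon)

# Alternative: pandas' Cartesian-product constructor, flattened back to columns
full_grid_alt = (
    pd.MultiIndex.from_product([np.sort(unique_lat), np.sort(unique_lon)], names=['lat', 'lon'])
    .to_frame(index=False)
)
print(full_grid.shape)  # (100, 2)
```

The sorting matters either way: the later reshape assumes the rows run lat-major, lon-minor, so the coordinate arrays you attach afterwards must be in that same sorted order.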
Then use pandas to combine your actual data with the full grid you have created. The missing points will be filled with NaN automatically.
```python
# left join: keep every grid point; pixels without data get NaN
data_onto_full_grid = pd.merge(full_grid, your_actual_data, how='left', on=['lat', 'lon'])
```
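A tiny, hypothetical 2 x 2 example shows what the left merge does. Passing `on=['lat', 'lon']` explicitly is an addition for clarity; without it pandas joins on all column names the two frames share, which is the same thing only if lat and lon are the only shared columns.

```python
import pandas as pd

# Hypothetical 2 x 2 grid where pixel (1, 1) has no data
full_grid = pd.DataFrame({'lat': [0, 0, 1, 1], 'lon': [0, 1, 0, 1]})
your_actual_data = pd.DataFrame({'lat': [0, 0, 1], 'lon': [0, 1, 0], 'data': [2, 0, 1]})

# how='left' keeps every row of full_grid, in full_grid's order;
# grid points with no matching row get NaN in the data columns
data_onto_full_grid = pd.merge(full_grid, your_actual_data, how='left', on=['lat', 'lon'])
print(data_onto_full_grid)
```

Because NaN is a float, the integer `data` column comes back as float64 after the merge.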
Now you have a DataFrame that you can actually reshape, convert to xarray, and finally save out.
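```python
# out_lat / out_lon: the sorted unique latitudes and longitudes used to build full_grid
# "target_variable" stands for whichever data column you want to convert
target_variable_2D = data_onto_full_grid['target_variable'].values.reshape((len(out_lat), len(out_lon)))
target_variable_xr = xr.DataArray(target_variable_2D, coords=[('lat', out_lat), ('lon', out_lon)])
target_variable_xr = target_variable_xr.rename("target_variable")
display(target_variable_xr)  # Jupyter/IPython; use print() in a plain script
```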
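To check that the reshape really lines up with the coordinates, here is a minimal end-to-end sketch on a hypothetical 10 x 10 grid where only the last point is missing. `out_lat`/`out_lon` are assumed to be the same sorted unique values used to build the full grid, and `target_variable` is set to the row number so the result is easy to verify.

```python
import numpy as np
import pandas as pd
import xarray as xr

out_lat = np.arange(0, 10)  # sorted unique latitudes
out_lon = np.arange(0, 10)  # sorted unique longitudes
full_grid = pd.MultiIndex.from_product([out_lat, out_lon], names=['lat', 'lon']).to_frame(index=False)

# 99 observed pixels (the last grid point is missing), as in the question
observed = full_grid.iloc[:99].copy()
observed['target_variable'] = np.arange(99, dtype=float)

data_onto_full_grid = pd.merge(full_grid, observed, how='left', on=['lat', 'lon'])

# Rows are ordered lat-major, lon-minor, so a C-order reshape gives (lat, lon)
target_variable_2D = data_onto_full_grid['target_variable'].values.reshape((len(out_lat), len(out_lon)))
target_variable_xr = xr.DataArray(
    target_variable_2D, coords=[('lat', out_lat), ('lon', out_lon)], name='target_variable'
)
print(target_variable_xr.sel(lat=9, lon=9).item())  # prints nan
```

Passing `name=` to the constructor has the same effect as the separate `.rename(...)` call.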
You can use a for loop to handle several variables and merge them together into one xarray Dataset.
```python
# use a for loop to convert all variables to xarray DataArrays and combine them
var_xr = []
for col in data_onto_full_grid.columns:
    # skip the "lat" and "lon" columns, as they are going to be the dimensions in the netCDF file
    if col in ('lat', 'lon'):
        continue
    values_2D = data_onto_full_grid[col].values.reshape((len(out_lat), len(out_lon)))
    # give each DataArray a unique name (its column name) so you can merge them later
    var_xr.append(xr.DataArray(values_2D, coords=[('lat', out_lat), ('lon', out_lon)], name=col))
# merge the DataArrays for all variables into a single Dataset
results_xr = xr.merge(var_xr)
# check your results (Jupyter/IPython; use print() in a plain script)
display(results_xr)
# save out to netCDF
results_xr.to_netcdf('results.nc')
```
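As an alternative sketch (not the approach this answer uses), pandas can do the unstacking itself: setting `lat` and `lon` as a MultiIndex and calling `DataFrame.to_xarray()` turns every remaining column into a data variable on a (lat, lon) grid and fills missing combinations with NaN. This assumes xarray is installed; the data below recreates the question's 99-pixel setup with random values.

```python
import numpy as np
import pandas as pd

# Question's setup: 99 of 100 pixels on a 10 x 10 grid carry data
unique_latitudes = np.arange(0, 10)
unique_longitudes = np.arange(0, 10)
all_pixels = np.array(np.meshgrid(unique_latitudes, unique_longitudes)).T.reshape(-1, 2)
pixels = all_pixels[:99]
df = pd.DataFrame({
    'lat': pixels[:, 0],
    'lon': pixels[:, 1],
    'data': np.random.default_rng(0).choice([0, 1, 2], size=99),
})

# The (lat, lon) MultiIndex levels become the dimensions; every other column
# becomes a data variable, and absent (lat, lon) combinations are filled with NaN
ds = df.set_index(['lat', 'lon']).to_xarray()
# Make sure every expected latitude/longitude is present, even ones with no data at all
ds = ds.reindex(lat=unique_latitudes, lon=unique_longitudes)
print(ds)
```

The `reindex` step is there because `to_xarray()` only knows about latitudes and longitudes that occur in at least one row; reindexing against the full lists adds any entirely empty rows or columns back as NaN.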