
xarray: create a Dataset from a list of lat/lon points (not square!)

I need to create a Dataset from an irregular list of latitudes/longitudes. These have been stacked into a list of 'pixels' that I need to unstack and convert back to a regular grid of latitudes/longitudes. Because the data values are not available for every pixel in the grid, I need to fill the missing values with np.nan.

I am having trouble creating the xr.Dataset from an irregular list of lat/lon points (pixels).

Reproducible example:

Create an example of how my data looks:

Note: the data is not complete. It has shape (99,), so I cannot simply reshape it to fit the unique latitudes/longitudes.
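
To make the reshape problem concrete, here is a minimal sketch (the array is a made-up stand-in for the real data): 99 values cannot fill the 100 cells of a 10 x 10 grid, so NumPy refuses the reshape outright.

```python
import numpy as np

# 99 stacked values: one short of the 100 cells in a 10 x 10 grid
values = np.arange(99)

try:
    values.reshape(10, 10)
    reshape_ok = True
except ValueError as err:
    # NumPy refuses because the element counts differ (99 vs 100)
    reshape_ok = False
    print("reshape failed:", err)
```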

import numpy as np
import xarray as xr

unique_latitudes = np.arange(0, 10)
unique_longitudes = np.arange(0, 10)
# every (lat, lon) pair, ordered lat-major: (0, 0), (0, 1), ..., (9, 9)
_ = np.array((np.meshgrid(unique_latitudes, unique_longitudes))).T.reshape(-1, 2)

# we don't have a complete grid of pixels
pixels = _[:99]
latitudes = pixels[:, 0]
longitudes = pixels[:, 1]
pixel_id = [i for i in range(len(pixels))]

# there is one missing datapoint (only 99 pixels so can't simply reshape data)
data = np.random.choice([0,1,2], (pixels.shape[0]))
coords = {'pixel': pixels}
dims = ['pixel']

xr.Dataset({'data': (dims, data)})

Out[]:
<xarray.Dataset>
Dimensions:  (pixel: 99)
Dimensions without coordinates: pixel
Data variables:
    data     (pixel) int64 1 1 2 0 1 0 1 1 0 1 1 2 1 ... 2 2 0 0 1 2 1 2 0 1 1 2

This is as far as I have got: a length-99 array. But each of these values corresponds to one latitude and one longitude.
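
One way to record that each value belongs to one latitude and one longitude is to attach them as non-dimension coordinates along the pixel dimension. This is a sketch that rebuilds the example data above; the variable name `ds` is illustrative only.

```python
import numpy as np
import xarray as xr

unique_latitudes = np.arange(0, 10)
unique_longitudes = np.arange(0, 10)
# every (lat, lon) pair, lat-major, minus the last one to mimic the missing point
pixels = np.array(np.meshgrid(unique_latitudes, unique_longitudes)).T.reshape(-1, 2)[:99]
data = np.random.choice([0, 1, 2], pixels.shape[0])

# lat and lon are stored per pixel, so each value keeps its own location
ds = xr.Dataset(
    {"data": ("pixel", data)},
    coords={"lat": ("pixel", pixels[:, 0]), "lon": ("pixel", pixels[:, 1])},
)
print(ds)
```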

pixels[:5]

Out[]:
array([[0, 0],
       [0, 1],
       [0, 2],
       [0, 3],
       [0, 4]])

What I want is an xr.Dataset with appropriately labeled lat/lon coordinates:

data = np.random.choice([0,1,2], (100)).astype('float')
data = data.reshape(len(unique_latitudes), len(unique_longitudes))
# remember there is one missing data point in the above data
data[np.unravel_index(99, data.shape)] = np.nan

correct_dims = ['lat', 'lon']
correct_coords = {'lat': unique_latitudes, 'lon': unique_longitudes}
correct_ds = xr.Dataset({'data': (correct_dims, data)}, coords=correct_coords)

correct_ds

Out[]:
<xarray.Dataset>
Dimensions:  (lat: 10, lon: 10)
Coordinates:
  * lat      (lat) int64 0 1 2 3 4 5 6 7 8 9
  * lon      (lon) int64 0 1 2 3 4 5 6 7 8 9
Data variables:
    data     (lat, lon) float64 0.0 1.0 0.0 1.0 0.0 2.0 ... 2.0 1.0 1.0 2.0 nan
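
For reference, a possible xarray-only route from the stacked pixel data to a grid like correct_ds is to turn the per-pixel lat/lon coordinates into a pandas MultiIndex with `set_index` and then `unstack` it. This is a sketch on regenerated random data, not the asker's code; grid cells with no pixel are filled with NaN, which upcasts the integer data to float.

```python
import numpy as np
import xarray as xr

unique_latitudes = np.arange(0, 10)
unique_longitudes = np.arange(0, 10)
pixels = np.array(np.meshgrid(unique_latitudes, unique_longitudes)).T.reshape(-1, 2)[:99]
data = np.random.choice([0, 1, 2], pixels.shape[0])

stacked = xr.Dataset(
    {"data": ("pixel", data)},
    coords={"lat": ("pixel", pixels[:, 0]), "lon": ("pixel", pixels[:, 1])},
)

# combine lat/lon into a MultiIndex on "pixel", then expand it into lat x lon;
# the one (lat, lon) pair with no pixel becomes NaN
gridded = stacked.set_index(pixel=["lat", "lon"]).unstack("pixel")
print(gridded)
```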

I have run into and solved a problem similar to yours, which is how I landed on this page. My solution uses the link between pandas DataFrames and xarray DataArrays. I don't fully understand the data you've provided above, but I think my logic will probably work for your case.
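
A compact illustration of that pandas/xarray link, assuming the data already sits in a DataFrame with lat, lon and value columns (the DataFrame here is a made-up stand-in): indexing by (lat, lon) and calling `to_xarray()` expands the MultiIndex into a full lat x lon grid and fills missing combinations with NaN.

```python
import numpy as np
import pandas as pd

# hypothetical stand-in for the real data: 99 of the 100 (lat, lon) points, one value each
lat, lon = np.divmod(np.arange(99), 10)
df = pd.DataFrame({"lat": lat, "lon": lon, "data": np.random.choice([0, 1, 2], 99)})

# a (lat, lon) MultiIndex becomes two dimensions; missing combinations become NaN
ds = df.set_index(["lat", "lon"]).to_xarray()
print(ds)
```

This skips the explicit full-grid merge and reshape described below, at the cost of relying on pandas/xarray to build the grid from the values that actually occur in the data.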

The first step is to prepare your actual data as a pandas DataFrame, with columns for lat, lon, and all other variables.

Then you can generate a two-column pandas DataFrame from every combination of "unique_lat" and "unique_lon" to represent your "full grid", using something like:

import numpy as np
import pandas as pd

# construct a full grid
def expand_grid(lat, lon):
    '''list all combinations of lats and lons using expand_grid(lat, lon)'''
    test = np.array([(A, B) for A in lat for B in lon])
    full_grid = pd.DataFrame({'lat': test[:, 0], 'lon': test[:, 1]})
    # sort lat-major so the rows can later be reshaped to (n_lat, n_lon)
    full_grid = full_grid.sort_values(by=['lat', 'lon'])
    full_grid = full_grid.reset_index(drop=True)
    return full_grid
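
A vectorized alternative to the list comprehension, offered as a sketch rather than the answer's method: `pd.MultiIndex.from_product` builds the same lat-major combinations directly. The name `expand_grid_vectorized` is hypothetical. Unlike expand_grid it does not sort, so the row order matches only when lat and lon are passed in sorted order (as `np.unique` returns them).

```python
import numpy as np
import pandas as pd

def expand_grid_vectorized(lat, lon):
    """Hypothetical helper: every (lat, lon) pair, lat-major, as a two-column DataFrame."""
    index = pd.MultiIndex.from_product([lat, lon], names=["lat", "lon"])
    return index.to_frame(index=False)

full_grid = expand_grid_vectorized(np.arange(0, 3), np.arange(0, 2))
print(full_grid)
```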

Then use pandas to merge your actual data onto the full grid you have created. Grid points missing from your data are filled with NaN automatically.

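full_grid = expand_grid(unique_lat, unique_lon)
# a left join keeps every grid row, in grid order; points absent from your data become NaN
data_onto_full_grid = pd.merge(full_grid, your_actual_data, on=['lat', 'lon'], how='left')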
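
To see what the left join does on a tiny, made-up example: the 2 x 2 grid below keeps all four rows in its own order, and the one point missing from the data gets NaN. `validate="one_to_one"` (an optional `pd.merge` argument) raises if your data contains duplicate (lat, lon) rows, which would otherwise add extra rows and break the later reshape. `indicator=True` adds a `_merge` column showing which rows were filled; drop that column before converting columns to xarray.

```python
import pandas as pd

# full 2 x 2 grid and "actual" data lacking the (1, 1) point (values are made up)
full_grid = pd.DataFrame({"lat": [0, 0, 1, 1], "lon": [0, 1, 0, 1]})
actual = pd.DataFrame({"lat": [0, 0, 1], "lon": [0, 1, 0], "target_variable": [5.0, 6.0, 7.0]})

merged = pd.merge(
    full_grid, actual, on=["lat", "lon"], how="left",
    validate="one_to_one",  # fail loudly on duplicate (lat, lon) keys
    indicator=True,         # adds "_merge": "both" or "left_only"
)
print(merged)
```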

Now you have a DataFrame that you can reshape, convert to xarray, and finally save out.

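import xarray as xr

# out_lat / out_lon: the sorted unique latitudes / longitudes used to build full_grid
# (the reshape relies on the rows being sorted lat-major, which expand_grid ensures)
target_variable_2D = data_onto_full_grid['target_variable'].values.reshape((len(out_lat), len(out_lon)))
target_variable_xr = xr.DataArray(target_variable_2D, coords=[('lat', out_lat), ('lon', out_lon)])
target_variable_xr = target_variable_xr.rename("target_variable")
display(target_variable_xr)  # display() exists in Jupyter/IPython; use print() elsewhere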
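
The reshape above silently assumes the rows are ordered lat-major (lon varying fastest). A minimal sanity check, on a small hypothetical grid, is to reshape the lat and lon columns themselves and confirm they line up with out_lat and out_lon before trusting the data. Passing `name=` to `xr.DataArray` also replaces the separate `.rename()` call.

```python
import numpy as np
import pandas as pd
import xarray as xr

# hypothetical small grid: 3 latitudes x 2 longitudes, rows ordered lat-major
out_lat = np.array([0, 1, 2])
out_lon = np.array([10, 20])
data_onto_full_grid = pd.DataFrame({
    "lat": np.repeat(out_lat, len(out_lon)),
    "lon": np.tile(out_lon, len(out_lat)),
    "target_variable": np.arange(6, dtype=float),
})

shape = (len(out_lat), len(out_lon))
# reshape is only valid if lat is constant along each row and lon varies fastest
assert (data_onto_full_grid["lat"].to_numpy().reshape(shape) == out_lat[:, None]).all()
assert (data_onto_full_grid["lon"].to_numpy().reshape(shape) == out_lon[None, :]).all()

target_variable_xr = xr.DataArray(
    data_onto_full_grid["target_variable"].to_numpy().reshape(shape),
    coords=[("lat", out_lat), ("lon", out_lon)],
    name="target_variable",  # set the name directly instead of calling .rename()
)
print(target_variable_xr)
```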

You can use a for loop to handle several variables and merge them into a single xarray Dataset.

# use a for loop to convert every variable to an xarray DataArray, then combine them
var_xr = []

for col in data_onto_full_grid.columns:
    print(col)
    # skip the "lat" and "lon" columns: they become the dimensions in the netCDF file
    if col in ('lat', 'lon'):
        continue
    var_2D = data_onto_full_grid[col].values.reshape((len(out_lat), len(out_lon)))
    # name each DataArray after its column so xr.merge can combine them
    var_xr.append(xr.DataArray(var_2D, coords=[('lat', out_lat), ('lon', out_lon)], name=col))

# merge the DataArrays for all variables into a single Dataset
results_xr = xr.merge(var_xr)

# check your results
display(results_xr)

# save out to netcdf
results_xr.to_netcdf('results.nc')
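
An alternative to building a list of DataArrays and merging them is to collect each variable as a `(dims, values)` pair in a dict and pass it straight to `xr.Dataset`. This is a sketch on a small made-up DataFrame with hypothetical columns `temp` and `rain`; like the loop, it assumes the rows are ordered lat-major.

```python
import numpy as np
import pandas as pd
import xarray as xr

# hypothetical merged data: 2 latitudes x 3 longitudes, last grid point missing
out_lat = np.array([0, 1])
out_lon = np.array([0, 1, 2])
data_onto_full_grid = pd.DataFrame({
    "lat": np.repeat(out_lat, len(out_lon)),
    "lon": np.tile(out_lon, len(out_lat)),
    "temp": [1.0, 2.0, 3.0, 4.0, 5.0, np.nan],
    "rain": [0.1, 0.2, 0.3, 0.4, 0.5, np.nan],
})

shape = (len(out_lat), len(out_lon))
# every column except the coordinates becomes a 2-D variable on the (lat, lon) grid
data_vars = {
    name: (("lat", "lon"), column.to_numpy().reshape(shape))
    for name, column in data_onto_full_grid.drop(columns=["lat", "lon"]).items()
}
results_xr = xr.Dataset(data_vars, coords={"lat": out_lat, "lon": out_lon})
print(results_xr)
```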

This post is licensed under CC BY-SA 4.0.