
Calculate xarray dataarray from coordinate labels

I have a DataArray with two variables (meteorological data) over time, y, x coordinates. The x and y coordinates are in a projected coordinate system (EPSG:3035) and aligned so that each cell covers almost exactly a standard cell of the 1 km LAEA reference grid.

I want to prepare the data for further use in Pandas and/or database tables, so I want to add the LAEA grid cell number/label, which can be calculated directly from x and y via the following (pseudo) function:

def func(cell):
    # e.g. y=2782000, x=4850000 -> '1kmN2782E4850'
    return r'1kmN{}E{}'.format(int(cell['y'] / 1000), int(cell['x'] / 1000))

But as far as I can see, there seems to be no way to apply this function to a DataArray or Dataset such that I have access to these coordinate variables (at least .apply_ufunc() wasn't really working for me).

I am able to calculate this later in Pandas, but some of my datasets consist of 60 up to 120 million cells/rows, and pandas (even with Numba) seems to have trouble with that amount. With xarray I am able to process this on 32 cores via Dask.

I would be grateful for any advice on how to get this working.

EDIT: Some more insights into the data I'm working with:

This one is by far the largest with 500 million cells, but I am able to downsample it to square-kilometre resolution, which ends up at about 160 million cells.

[Image: xarray dataset "vci" holding decades of monthly Vegetation Condition Index values]

If the dataset is small enough, I am able to export it as a pandas DataFrame and calculate it there, but that's slow and not very robust, as the kernel crashes quite often.

[Image: the same calculation in pandas]

This is how you can apply your function:

import xarray as xr

# ufunc
def func(x, y):
    return r'1km{}{}'.format(int(y), int(x))

# test data
ds = xr.tutorial.load_dataset("rasm")

xr.apply_ufunc(
    func,
    ds.x,
    ds.y,
    vectorize=True,
)

Note that in your case you don't have to list input_core_dims.
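As a self-contained illustration of the same pattern, here is a sketch on a tiny synthetic grid (the coordinate values are made up to mimic the question's EPSG:3035 layout, and the label format follows the question's example):

```python
import xarray as xr

# Tiny synthetic grid standing in for the 1 km EPSG:3035 data
# (coordinate values are illustrative assumptions).
ds = xr.Dataset(
    coords={
        "y": ("y", [2782000.0, 2783000.0]),
        "x": ("x", [4850000.0, 4851000.0]),
    }
)

def func(x, y):
    return "1kmN{}E{}".format(int(y / 1000), int(x / 1000))

# vectorize=True wraps func with numpy.vectorize; broadcasting
# x (dim 'x') against y (dim 'y') yields one label per grid cell.
labels = xr.apply_ufunc(func, ds.x, ds.y, vectorize=True)
print(labels.sel(y=2782000.0, x=4850000.0).item())  # 1kmN2782E4850
```

The result is a 2-D DataArray of labels indexed by the same x/y coordinates, so it can be attached back to the dataset or converted to pandas later.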

Also, since your function isn't vectorized, you need to set vectorize=True:

vectorize : bool, optional — If True, then assume func only takes arrays defined over core dimensions as input and vectorize it automatically with numpy.vectorize. This option exists for convenience, but is almost always slower than supplying a pre-vectorized function. Using this option requires NumPy version 1.12 or newer.

Using vectorize might not be the most performant option, as it is essentially just a loop, but if you have your data in chunks and use dask, it might be good enough.
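With dask installed, the chunked variant looks roughly like this (a sketch: the coordinate values, the variable name "vci", and the chunk sizes are all made-up assumptions; dask="parallelized" runs one vectorized call per chunk):

```python
import numpy as np
import xarray as xr

# Synthetic chunked dataset mimicking the 1 km grid.
y = np.arange(2782, 2786) * 1000.0
x = np.arange(4850, 4854) * 1000.0
ds = xr.Dataset(
    {"vci": (("y", "x"), np.random.rand(4, 4))},
    coords={"y": y, "x": x},
).chunk({"y": 2, "x": 2})

def func(x, y):
    return "1kmN{}E{}".format(int(y / 1000), int(x / 1000))

# Coordinates stay in memory even on a chunked dataset, so broadcast
# them to the full cell grid and chunk them explicitly before applying.
xx, yy = xr.broadcast(ds.x, ds.y)
labels = xr.apply_ufunc(
    func,
    xx.chunk({"y": 2, "x": 2}),
    yy.chunk({"y": 2, "x": 2}),
    vectorize=True,
    dask="parallelized",     # one vectorized call per chunk
    output_dtypes=[object],  # the labels are Python strings
)
labels = labels.compute()
```

This is the setup that lets the 32-core Dask cluster mentioned in the question do the looping in parallel instead of a single Python process.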

If not, you could look into creating a vectorized function, e.g. with numba, which would surely speed things up.
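Since numba doesn't cope well with building Python strings, another option is a truly vectorized version using NumPy's elementwise string routines — a sketch, where the helper name grid_labels and the coordinate values are my own:

```python
import numpy as np
import xarray as xr

# Synthetic 1-D coordinates in metres (illustrative values).
y = np.array([2782000, 2783000, 2784000])
x = np.array([4850000, 4851000, 4852000])

def grid_labels(x, y):
    # Integer-divide to kilometres once, convert to strings, then
    # concatenate with numpy's elementwise string operations --
    # no per-cell Python function call involved.
    ykm = (y // 1000).astype(str)
    xkm = (x // 1000).astype(str)
    yy, xx = np.meshgrid(ykm, xkm, indexing="ij")
    lab = np.char.add(np.char.add(np.char.add("1kmN", yy), "E"), xx)
    return xr.DataArray(lab, coords={"y": y, "x": x}, dims=("y", "x"))

labels = grid_labels(x, y)
print(labels.values[0, 0])  # 1kmN2782E4850
```

Because every step operates on whole arrays, this avoids the per-element Python call that makes the vectorize=True path slow.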

More info can be found in the xarray tutorial on applying ufuncs.

You can use apply_ufunc in an unvectorised way:

def func(x, y):
    return f'1kmN{int(y / 1000)}E{int(x / 1000)}'  # e.g. 1kmN2782E4850

xr.apply_ufunc(
    func,  # first the function
    x.x,   # now the arguments, in the order expected by 'func'
    x.y,
)
