xarray - efficiently find data that is not 0 in a multi-dimensional xarray object with massive data

My DataArray object looks like this:

print(da_criteria_1or0_hourly)

<xarray.DataArray (time: 8760, latitude: 106, longitude: 193)>
dask.array<shape=(8760, 106, 193), dtype=int32, chunksize=(744, 106, 193)>
Coordinates:
  * latitude   (latitude) float32 -39.2 -39.149525 ... -33.950478 -33.9
  * longitude  (longitude) float32 140.8 140.84792 140.89584 ... 149.95209 150.0
  * time       (time) datetime64[ns] 2017-01-01 ... 2017-12-31T23:00:00

Each value is either 0 or 1, and the number of values is massive (8760 × 106 × 193 = 179,212,080).

I want to get the time, latitude and longitude of every point that meets the criterion "data == 1".

I tried using the .sel function inside a triple loop, but it was extremely slow because of the huge number of individual lookups:

# One .sel() lookup per grid point; on a dask-backed array, each .values
# call also triggers a separate computation (about 179 million in total)
for time_elem in da_criteria_1or0_hourly.coords['time'].values:
    for lat_elem in da_criteria_1or0_hourly.coords['latitude'].values:
        for lon_elem in da_criteria_1or0_hourly.coords['longitude'].values:
            val = da_criteria_1or0_hourly.sel(time=time_elem, latitude=lat_elem, longitude=lon_elem).values
            if val == 1:
                print(time_elem, lat_elem, lon_elem, val)

Is there a more efficient way?

You may want to have a look at the .stack() method. It collapses the time, latitude and longitude dimensions into a single new dimension, so all entries sit one below the other in one long array, and you can then filter out every value that does not meet your requirement. I have not tested it with a very large dataset, but it avoids the triple for-loop, so it might give you a decent speed boost.
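To see what stacking does, here is a minimal sketch on a tiny synthetic array. The values and coordinates are made up and only stand in for da_criteria_1or0_hourly; the name da is hypothetical. After stacking, the three dimensions become one dimension z whose index is a pandas MultiIndex of (time, latitude, longitude) tuples, laid out in row-major order (time outermost):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for da_criteria_1or0_hourly:
# 2 time steps x 2 latitudes x 3 longitudes of 0/1 values.
da = xr.DataArray(
    np.array([[[0, 1, 0], [0, 0, 0]],
              [[1, 0, 0], [0, 0, 1]]], dtype=np.int32),
    dims=("time", "latitude", "longitude"),
    coords={
        "time": pd.to_datetime(["2017-01-01 00:00", "2017-01-01 01:00"]),
        "latitude": [-39.2, -39.15],
        "longitude": [140.8, 140.85, 140.9],
    },
)

# Collapse the three dimensions into one new dimension "z".
stacked = da.stack(z=("time", "latitude", "longitude"))

print(stacked.dims)                       # ('z',)
print(stacked.shape)                      # (12,)
print(list(stacked.indexes["z"].names))   # ['time', 'latitude', 'longitude']
```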

The code would look like this:

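    # Collapse the three dimensions into a single stacked dimension "z"
    newArr = da_criteria_1or0_hourly.stack(z=('time', 'latitude', 'longitude'))
    # Keep only the entries equal to 1 (.values loads the data into memory)
    newArr2 = newArr[newArr.values == 1]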

newArr is then your original array in stacked form, and newArr2 contains only the observations where the data equals 1. It should still carry your coordinates, although in a somewhat messy format: they are stored as a MultiIndex on the new z dimension.
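A minimal, self-contained sketch of the whole approach, again on a made-up 2 × 2 × 3 array (da is a hypothetical stand-in). It filters the stacked array and then turns the MultiIndex on z into a tidy table with pandas MultiIndex.to_frame. It also shows an alternative that the answer does not mention: np.nonzero on the raw 3-D array returns the integer positions of every 1, which can be mapped back to coordinate values without first building a MultiIndex over every element. Both variants assume the data fits in memory once .values is called; for the full array that is roughly 717 MB of int32 (179,212,080 × 4 bytes).

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical small 0/1 array standing in for da_criteria_1or0_hourly.
da = xr.DataArray(
    np.array([[[0, 1, 0], [0, 0, 0]],
              [[1, 0, 0], [0, 0, 1]]], dtype=np.int32),
    dims=("time", "latitude", "longitude"),
    coords={
        "time": pd.to_datetime(["2017-01-01 00:00", "2017-01-01 01:00"]),
        "latitude": [-39.2, -39.15],
        "longitude": [140.8, 140.85, 140.9],
    },
)

# Approach from the answer: stack, then keep only the entries equal to 1.
stacked = da.stack(z=("time", "latitude", "longitude"))
ones = stacked[stacked.values == 1]

# The coordinates of the kept entries live in a pandas MultiIndex on "z";
# to_frame turns it into one row per hit with time/latitude/longitude columns.
hits = ones.indexes["z"].to_frame(index=False)
print(hits)

# Alternative (not from the original answer): find the integer positions
# of every 1 in the 3-D array, then look up the matching coordinate values.
t_idx, lat_idx, lon_idx = np.nonzero(da.values == 1)
times = da["time"].values[t_idx]
lats = da["latitude"].values[lat_idx]
lons = da["longitude"].values[lon_idx]
```

Both variants walk the data in row-major order (time outermost, longitude innermost), so the rows of hits and the entries of times, lats and lons line up one to one.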

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM