How to reshape xarray data with new dimensions

I'm fairly new to the xarray library, and I'm stuck on what seems to be a fairly straightforward task. I have global climate data in a GRIB file for 30 km grid cells. The data looks like this:

<xarray.Dataset>
Dimensions:     (time: 736, values: 542080)
Coordinates:
    number      int64 0
  * time        (time) datetime64[ns] 2007-12-01 ... 2008-03-01T21:00:00
    step        timedelta64[ns] 00:00:00
    surface     int64 0
    latitude    (values) float64 89.78 89.78 89.78 ... -89.78 -89.78 -89.78
    longitude   (values) float64 0.0 20.0 40.0 60.0 ... 280.0 300.0 320.0 340.0
    valid_time  (time) datetime64[ns] 2007-12-01 ... 2008-03-01T21:00:00
Dimensions without coordinates: values
Data variables:
    t2m         (time, values) float32 247.30748 247.49889 ... 225.18036
Attributes:
    GRIB_edition:            1
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             European Centre for Medium-Range Weather Forecasts
    history:                 2020-01-21T09:40:59 GRIB to CDM+CF via cfgrib-0....

And that is fine. I can access different time instances and plot things, and I can even access the data for each cell using data.t2m.data. However, the data is indexed only by time and values; the latter is, I assume, a cell-number identifier, and latitude and longitude are not read as meaningful dimensions.
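To reproduce the same layout without the GRIB file, here is a minimal sketch that builds a small hypothetical stand-in dataset (6 cells, 4 time steps, made-up values). It only assumes numpy, pandas and xarray, and it shows that latitude and longitude are coordinates along the values dimension but are not backed by an index, which is why they do not behave like dimensions (for example, .sel cannot select by them directly).

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the GRIB dataset: 4 time steps x 6 grid cells,
# with latitude/longitude stored as plain coordinates on the "values" dim.
lat = np.repeat([45.0, -45.0], 3)        # [45, 45, 45, -45, -45, -45]
lon = np.tile([0.0, 120.0, 240.0], 2)    # [0, 120, 240, 0, 120, 240]
time = pd.date_range("2007-12-01", periods=4, freq="3h")
ds = xr.Dataset(
    {"t2m": (("time", "values"), np.arange(24, dtype="float32").reshape(4, 6))},
    coords={"time": time, "latitude": ("values", lat), "longitude": ("values", lon)},
)

print(ds.t2m.dims)       # ('time', 'values')
print(ds.latitude.dims)  # ('values',)
print(list(ds.indexes))  # ['time'] -> nothing indexes latitude/longitude
```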

In the documentation, the authors use the airtemp reanalysis data as an example; that dataset is indexed by lat, lon, and time, and that is what I want to achieve with my dataset.

<xarray.Dataset>
Dimensions:  (lat: 25, lon: 53, time: 2920)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float32 ...
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

Is there a straightforward way to do this re-indexing within xarray? I guess I could simply extract the numpy arrays and switch to pandas or something else, but I find the xarray library really powerful and useful.
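For comparison, the pandas route mentioned above can stay short, because xarray converts to and from DataFrames directly. A minimal sketch on a hypothetical toy dataset with the same layout (made-up values): to_dataframe flattens everything into rows, set_index re-keys the rows by (time, latitude, longitude), and to_xarray turns the result back into a Dataset. It assumes each (latitude, longitude) pair is unique per time step; on the full 542080-cell dataset the intermediate table would be very large.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy dataset with the same layout as the GRIB data
lat = np.repeat([45.0, -45.0], 3)
lon = np.tile([0.0, 120.0, 240.0], 2)
time = pd.date_range("2007-12-01", periods=4, freq="3h")
ds = xr.Dataset(
    {"t2m": (("time", "values"), np.arange(24, dtype="float32").reshape(4, 6))},
    coords={"time": time, "latitude": ("values", lat), "longitude": ("values", lon)},
)

# Flatten to a table: index (time, values); columns include latitude, longitude, t2m
df = ds.to_dataframe()

# Re-key the rows by (time, latitude, longitude) and convert back to xarray
regridded = (
    df.reset_index()
    .set_index(["time", "latitude", "longitude"])[["t2m"]]
    .to_xarray()
)
print(regridded.t2m.dims)  # ('time', 'latitude', 'longitude')
```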

One way might be to manually construct a pandas.MultiIndex from the latitude and longitude variables, assign it as the coordinate for the values dimension, and then unstack the Dataset:

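import pandas as pd
import xarray as xr

# Build a (lon, lat) MultiIndex from the 1-D coordinates that live on "values"
index = pd.MultiIndex.from_arrays(
    [ds.longitude.values, ds.latitude.values], names=['lon', 'lat']
)
# Attach it as the index of the "values" dimension. (Older xarray versions also
# accept ds['values'] = index, which newer versions deprecate.)
ds = ds.assign_coords(xr.Coordinates.from_pandas_multiindex(index, 'values'))
# Split "values" into separate lon and lat dimensions
reshaped = ds.unstack('values')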
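The same idea can be checked end to end on a small hypothetical dataset with the same layout (made-up values). This sketch swaps the hand-built pandas.MultiIndex for Dataset.set_index, which builds the MultiIndex from the latitude and longitude coordinates already on values; the order of the names passed to set_index becomes the dimension order after unstack. The unstacked latitude comes out in ascending order, so sortby flips it to run north to south like the tutorial data.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy data: a regular 2 x 3 lat/lon grid flattened into "values"
lat = np.repeat([45.0, -45.0], 3)
lon = np.tile([0.0, 120.0, 240.0], 2)
time = pd.date_range("2007-12-01", periods=4, freq="3h")
ds = xr.Dataset(
    {"t2m": (("time", "values"), np.arange(24, dtype="float32").reshape(4, 6))},
    coords={"time": time, "latitude": ("values", lat), "longitude": ("values", lon)},
)

# Promote the two existing 1-D coordinates to a MultiIndex on "values",
# then split that index back into separate latitude/longitude dimensions.
reshaped = ds.set_index(values=["latitude", "longitude"]).unstack("values")

# Latitude is ascending after unstack; flip it to run north -> south.
reshaped = reshaped.sortby("latitude", ascending=False)

print(reshaped.t2m.dims)         # ('time', 'latitude', 'longitude')
print(reshaped.latitude.values)  # [ 45. -45.]
```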

For more on this, see the "Reshaping and reorganizing data" section of the xarray documentation.
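One caveat, which depends on the grid: if this GRIB file uses a reduced grid, where rows near the poles carry fewer longitudes (the 20° longitude spacing at 89.78° in the listing above hints at this), unstack cannot produce a regular lat/lon grid by itself. It builds every (latitude, longitude) pair that occurs anywhere and fills the missing ones with NaN, and because rows may use different longitude spacings, the union of longitudes can be much larger than any single row. The toy sketch below uses made-up coordinates to show the NaN filling; a fully populated regular grid would need interpolation (regridding), which unstack does not do.

```python
import numpy as np
import xarray as xr

# Hypothetical "reduced" grid: the polar row has 2 points, the equatorial row 4,
# loosely mimicking a grid whose rows carry different numbers of longitudes.
lat = np.array([80.0, 80.0, 0.0, 0.0, 0.0, 0.0])
lon = np.array([0.0, 180.0, 0.0, 90.0, 180.0, 270.0])
ds = xr.Dataset(
    {"t2m": (("time", "values"), np.ones((2, 6), dtype="float32"))},
    coords={"time": [0, 1], "latitude": ("values", lat), "longitude": ("values", lon)},
)

reshaped = ds.set_index(values=["latitude", "longitude"]).unstack("values")

# The result spans every (latitude, longitude) pair seen anywhere, so the
# polar row gets NaN at longitudes 90 and 270, which it never had.
print(reshaped.t2m.shape)                              # (2, 2, 4)
print(int(reshaped.t2m.isel(time=0).isnull().sum()))   # 2
```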

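Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.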

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM