
Xarray: merge separate day and hour dimensions into one time dimension in Python

I have an xarray dataset:

[Image: Jupyter cell output of the xarray dataset]

As you can see, the dimensions are (lat, lon, step (hours), time (days)). I want to merge the hours and days into one so that the dimensions are instead (lat, lon, timestep). How do I do this?

Creating a one-dimensional time dimension and coordinate

You can use the stack method to create a MultiIndex from the time and step dimensions. As your valid_time coord already holds the correct datetimes, you can also drop the MultiIndex coords and keep only the valid_time coord with the actual datetimes.

import numpy as np
import xarray as xr
import pandas as pd

# Create a dummy representation of your data
ds = xr.Dataset(
    data_vars={"a": (("x", "y", "time", "step"), np.random.rand(5, 5, 3, 24))},
    coords={
        "time": pd.date_range(start="1999-12-31", periods=3, freq="D"),
        "step": pd.timedelta_range(start="1h", freq="h", periods=24),
    },
)
ds = ds.assign_coords(valid_time=ds.time + ds.step)

# Stack the time and step dims
stacked_ds = ds.stack(datetime=("time", "step"))

# Drop the multiindex if you want to keep only the valid_time coord which
# contains the combined date and time information.
# Rename vars and dims to your liking.
stacked_ds = (
    stacked_ds.drop_vars("datetime")
    .rename_dims({"datetime": "time"})
    .rename_vars({"valid_time": "time"})
)
print(stacked_ds)
<xarray.Dataset>
Dimensions:  (time: 72, x: 5, y: 5)
Coordinates:
  * time     (time) datetime64[ns] 1999-12-31T01:00:00 ... 2000-01-03
Dimensions without coordinates: x, y
Data variables:
    a        (x, y, time) float64 0.1961 0.3733 0.2227 ... 0.4929 0.7459 0.4106
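
To see what stack actually produces before anything is dropped, here is a minimal sketch. It rebuilds the same kind of dummy data (with smaller x and y sizes, which is my own choice) and inspects the intermediate result; the variable names midx and expected are only for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Dummy data laid out like the example above (x and y sizes are arbitrary)
ds = xr.Dataset(
    data_vars={"a": (("x", "y", "time", "step"), np.random.rand(2, 2, 3, 24))},
    coords={
        "time": pd.date_range(start="1999-12-31", periods=3, freq="D"),
        "step": pd.timedelta_range(start="1h", freq="h", periods=24),
    },
)
ds = ds.assign_coords(valid_time=ds.time + ds.step)

stacked = ds.stack(datetime=("time", "step"))

# The new "datetime" dimension is indexed by a pandas MultiIndex
# whose levels are the original time and step coordinates.
midx = stacked.indexes["datetime"]
print(type(midx).__name__, list(midx.names))

# valid_time is stacked along with the data, in time-major order
# (all steps of the first day, then all steps of the second day, ...).
expected = (ds.time.values[:, None] + ds.step.values[None, :]).ravel()
print(np.array_equal(stacked.valid_time.values, expected))
```

Because the stacked valid_time already runs through the hours in order, it can stand in for the MultiIndex as a plain datetime coordinate.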

Making the time coordinate an index

This way we create a single time dimension with a continuous datetime series as its coordinate. However, it is not an index. For some methods, like resample, time needs to be an index. We can fix that by explicitly setting it as an index:

stacked_ds.set_index(time="time")

However, this will make 'time' a variable instead of a coordinate. To make it a coordinate again, we can use:

stacked_ds.set_index(time="time").set_coords("time")
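
To illustrate why the index matters, here is a minimal sketch that ends in a resample call. It takes a slightly different route from the chain above: it drops the stacked coordinates and re-attaches the datetimes with assign_coords, which should create an indexed dimension coordinate directly. Treat this as one possible variant under that assumption, not the only way to do it. The names flat and daily are made up for the example.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Dummy data laid out like the example above (x and y sizes are arbitrary)
ds = xr.Dataset(
    data_vars={"a": (("x", "y", "time", "step"), np.random.rand(2, 2, 3, 24))},
    coords={
        "time": pd.date_range(start="1999-12-31", periods=3, freq="D"),
        "step": pd.timedelta_range(start="1h", freq="h", periods=24),
    },
)
ds = ds.assign_coords(valid_time=ds.time + ds.step)

stacked = ds.stack(datetime=("time", "step"))

# Keep the combined datetimes as a plain NumPy array
valid_time = stacked["valid_time"].values

flat = (
    stacked
    # errors="ignore" because some xarray versions do not expose the
    # MultiIndex levels "time" and "step" as separate variables
    .drop_vars(["datetime", "time", "step", "valid_time"], errors="ignore")
    .rename_dims({"datetime": "time"})
    # A coordinate named after its own dimension gets an index
    .assign_coords(time=("time", valid_time))
)

print("time" in flat.indexes)

# With an indexed time coordinate, resample works as usual
daily = flat.resample(time="1D").mean()
print(daily.sizes)
```

If "time" is missing from flat.indexes on your xarray version, the set_index call shown above is the fallback.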

Working with DataArrays

You can stack dimensions on DataArrays as well. However, they do not have rename_dims and rename_vars methods. Instead, you can use swap_dims and rename:

(
    ds.a.stack(datetime=("time", "step"))
    .drop_vars("datetime")
    .swap_dims({"datetime": "time"})
    .rename({"valid_time": "time"})
).set_index(time="time")
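
As a quick check that the result behaves like an ordinary time series, here is a sketch on a standalone DataArray. It uses rename for the dimension and assign_coords for the datetimes, which is my own variation rather than the exact chain above. The names da, flat and one_day are illustrative.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Standalone DataArray with separate day ("time") and hour ("step") dims
da = xr.DataArray(
    np.random.rand(2, 2, 3, 24),
    dims=("x", "y", "time", "step"),
    coords={
        "time": pd.date_range(start="1999-12-31", periods=3, freq="D"),
        "step": pd.timedelta_range(start="1h", freq="h", periods=24),
    },
    name="a",
)
da = da.assign_coords(valid_time=da.time + da.step)

stacked = da.stack(datetime=("time", "step"))
valid_time = stacked["valid_time"].values

flat = (
    stacked
    # Drop the MultiIndex and its levels; ignore names that are absent
    .drop_vars(["datetime", "time", "step", "valid_time"], errors="ignore")
    # Only the dimension is left under the name "datetime", so rename it
    .rename({"datetime": "time"})
    .assign_coords(time=("time", valid_time))
)

# Partial datetime strings select a whole day of hourly values
one_day = flat.sel(time="2000-01-01")
print(flat.dims, one_day.sizes["time"])
```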
