
Add 'constant' dimension to xarray Dataset

I have a series of monthly gridded datasets in CSV form. I want to read them, add a few dimensions, and then write them to netCDF. I've had great experience using xarray (xray) in the past, so I thought I'd use it for this task.

I can easily get them into a 2D DataArray with something like:

import numpy as np
import xarray as xr

data = np.ones((360,720))
lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)
coords =  {'lat': lats, 'lng':lngs}
da = xr.DataArray(data, coords=coords)

But when I try to add another dimension, which would convey information about time (all data is from the same year/month), things start to go sour.

I've tried two ways to crack this:

1) Expand my input data to m x n x 1, something like:

data = np.ones((360,720))
lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)
coords =  {'lat': lats, 'lng':lngs}
data = data[:,:,np.newaxis]

Then I follow the same steps as above, with coords updated to contain a third dimension.

lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)
coords =  {'lat': lats, 'lng':lngs}
coords['time'] = pd.datetime(year, month, day)
da = xr.DataArray(data, coords=coords)
da.to_dataset(name='variable_name')

This is fine for creating a DataArray, but when I try to convert it to a Dataset (so I can write to netCDF), I get the error 'ValueError: Coordinate objects must be 1-dimensional'.

2) The second approach I've tried is taking my DataArray, casting it to a DataFrame, setting the index to ['lat', 'lng', 'time'], and then going back to a Dataset with xr.Dataset.from_dataframe(). I tried this, but it ran for 20+ minutes before I killed the process.

Does anyone know how I can get a Dataset with a monthly 'time' dimension?

Your first example is pretty close:

import datetime

# data is the (360, 720, 1) array from your first attempt
lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)
coords =  {'lat': lats, 'lng': lngs}
coords['time'] = [datetime.datetime(year, month, day)]
da = xr.DataArray(data, coords=coords, dims=['lat', 'lng', 'time'])
da.to_dataset(name='variable_name')

You'll notice a few changes in my version:

  1. I'm passing in a list for the 'time' coordinate instead of a scalar. You need to pass in a list or 1D array to get a 1D coordinate variable, which is what you need if you also use 'time' as a dimension. That's what the error ValueError: Coordinate objects must be 1-dimensional is trying to tell you (by the way, if you have ideas for how to make that error message more helpful, I'm all ears!).
  2. I'm providing a dims argument to the DataArray constructor. Passing in a plain (unordered) dictionary without dims is a little dangerous because the iteration order, and therefore the dimension order, is not guaranteed.
  3. I also switched to datetime.datetime instead of pd.datetime. The latter is simply an alias for the former.
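Putting those three changes together, a self-contained version might look like the sketch below. The date 2015-01-01 is only a stand-in for the year, month and day from the question, and the array of ones stands in for the CSV data.

```python
import datetime

import numpy as np
import xarray as xr

# Placeholder grid of ones; in practice this would come from one monthly CSV.
data = np.ones((360, 720))
lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)

# Assumed date, used only for illustration. It is wrapped in a list so that
# 'time' becomes a 1D coordinate of length 1 rather than a scalar.
time = [datetime.datetime(2015, 1, 1)]

da = xr.DataArray(
    data[:, :, np.newaxis],  # shape (360, 720, 1) to match the three dims
    coords={'lat': lats, 'lng': lngs, 'time': time},
    dims=['lat', 'lng', 'time'],  # explicit order, not inferred from the dict
)
ds = da.to_dataset(name='variable_name')
```

From here, ds.to_netcdf(path) should write the file, provided a netCDF backend such as netCDF4 or scipy is installed.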

Another sensible approach is to use concat with a list of one item once you've added 'time' as a scalar coordinate, e.g.:

lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)
coords =  {'lat': lats, 'lng': lngs, 'time': datetime.datetime(year, month, day)}
da = xr.DataArray(data, coords=coords, dims=['lat', 'lng'])
expanded_da = xr.concat([da], 'time')

This version generalizes nicely to joining together data from a bunch of days: you simply make the list of DataArrays longer. In my experience, most of the time the reason you want the extra dimension in the first place is to be able to concat along it. Length-1 dimensions are not very useful otherwise.
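As a rough sketch of that generalization, the example below builds three monthly grids and joins them along 'time'. The helper monthly_array, the year 2015 and the constant fill values are all made up for illustration; real data would be read from the CSV files.

```python
import datetime

import numpy as np
import xarray as xr

lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)


def monthly_array(values, year, month):
    """Hypothetical helper: wrap one 2D grid with a scalar 'time' coordinate."""
    coords = {'lat': lats, 'lng': lngs,
              'time': datetime.datetime(year, month, 1)}
    return xr.DataArray(values, coords=coords, dims=['lat', 'lng'])


# Fake data: the grid for month m is filled with the value m.
months = [monthly_array(np.full((360, 720), float(m)), 2015, m)
          for m in (1, 2, 3)]

# concat turns the scalar 'time' coordinates into a new leading dimension.
combined = xr.concat(months, 'time')
ds = combined.to_dataset(name='variable_name')
```

Note that concat places the new 'time' dimension first, so the result has dims ('time', 'lat', 'lng'); combined.transpose('lat', 'lng', 'time') can reorder them if needed.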

You can use .expand_dims() to add a new dimension and .assign_coords() to add coordinate values for the corresponding dimension. The code below adds a new_dim dimension to the ds dataset and sets the corresponding coordinate from the list_of_values you provide.

expanded_ds = ds.expand_dims("new_dim").assign_coords(new_dim=("new_dim", [list_of_values]))
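Applied to the question's case, this could look like the following sketch. The dataset contents and the date are assumptions for illustration; since expand_dims adds a dimension of length 1, the coordinate list holds exactly one value.

```python
import numpy as np
import xarray as xr

lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)

# A 2D dataset standing in for one month of gridded data.
ds = xr.Dataset(
    {'variable_name': (('lat', 'lng'), np.ones((360, 720)))},
    coords={'lat': lats, 'lng': lngs},
)

# Add a length-1 'time' dimension, then label it with a single (assumed) date.
expanded_ds = ds.expand_dims('time').assign_coords(
    time=('time', [np.datetime64('2015-01-01')])
)
```

The new dimension is inserted at the front, so variable_name ends up with dims ('time', 'lat', 'lng').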
