Converting 3D xarray dataset to dataframe
I have imported an xarray dataset as shown below and extracted the values at the coordinates defined by zones from a CSV file, over a time period defined by a date range (30 days of a (lon, lat) grid with some environmental values for every coordinate).
from xgrads import open_CtlDataset
ds_Snow = open_CtlDataset(path + 'file')
ds_Snow = ds_Snow.sel(lat = list(set(zones['lat'])), lon = list(set(zones['lon'])),
time = period, method = 'nearest')
When I print the information for ds_Snow, this is what I get:
Dimensions: (lat: 12, lon: 12, time: 30)
Coordinates:
* time (time) datetime64[ns] 2000-09-01 2000-09-02 ... 2000-09-30
* lat (lat) float32 3.414e+06 3.414e+06 3.414e+06 ... 3.414e+06 3.414e+06
* lon (lon) float32 6.873e+05 6.873e+05 6.873e+05 ... 6.873e+05 6.873e+05
Data variables:
spre (time, lat, lon) float32 dask.array<chunksize=(1, 12, 12), meta=np.ndarray>
Attributes:
title: SnowModel
undef: -9999.0
type : <class 'xarray.core.dataset.Dataset'>
I would like to convert it to a dataframe that respects the initial dimensions (time, lat, lon). So I did this:
df_Snow = ds_Snow.to_dataframe()
But here are the dimensions of the resulting dataframe:
print(df_Snow)
lat lon time
3414108.0 687311.625 2000-09-01 0.0
2000-09-02 0.0
2000-09-03 0.0
2000-09-04 0.0
2000-09-05 0.0
... ...
2000-09-26 0.0
2000-09-27 0.0
2000-09-28 0.0
2000-09-29 0.0
2000-09-30 0.0
[4320 rows x 1 columns]
It looks like all the data just got put in a single column. I have tried giving the dimension order, as some documentation explained:
df_Snow = ds_Snow.to_dataframe(dim_order = ['time', 'lat', 'lon'])
But it does not change anything, and I can't seem to find an answer in forums or the documentation. I would like to know a way to keep the array configuration in the dataframe.
EDIT: I found a solution
Instead of converting the xarray dataset, I chose to build my dataframe from a pd.Series for each attribute, like this:
import pandas as pd

ds_Snow = ds_Snow.sel(lat=list(set(station_list['lat_utm'])),
                      lon=list(set(station_list['lon_utm'])),
                      time=Ind_Run_ERA5_Land, method='nearest')
time = pd.Series(ds_Snow.coords["time"].values)
lon = pd.Series(ds_Snow.coords["lon"].values)
lat = pd.Series(ds_Snow.coords["lat"].values)
spre = pd.Series(ds_Snow['spre'].values[:,0,0])
frame = { 'spre': spre, 'time': time, 'lon' : lon, 'lat' : lat}
df_Snow = pd.DataFrame(frame)
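The snippet above takes the time series at one grid cell ([:, 0, 0]). As a rough sketch of another way to get a comparable table, a single cell can be selected with isel before calling to_dataframe. The dataset below is a small synthetic stand-in for ds_Snow (the names ds_demo and df_cell and all values are made up for illustration), not the asker's actual data.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for ds_Snow: 3 days on a 2 x 2 grid (made-up values).
times = pd.date_range("2000-09-01", periods=3)
ds_demo = xr.Dataset(
    {"spre": (("time", "lat", "lon"),
              np.arange(12, dtype="float32").reshape(3, 2, 2))},
    coords={"time": times, "lat": [10.0, 20.0], "lon": [1.0, 2.0]},
)

# Pick the first grid cell; time is the only remaining dimension,
# so the resulting DataFrame is indexed by time alone.
df_cell = ds_demo.isel(lat=0, lon=0).to_dataframe()

print(df_cell)
```

Here lat and lon become scalar coordinates, which to_dataframe includes as ordinary columns next to spre, so each row still records which cell it came from.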
This is the expected behaviour. From the docs:
The DataFrame is indexed by the Cartesian product of index coordinates (in the form of a pandas.MultiIndex). Other coordinates are included as columns in the DataFrame.
There is only one variable, spre, in the dataset. The other properties, the 'coordinates', have become the index. Since there were several coordinates (lat, lon, and time), the DataFrame has a hierarchical MultiIndex.
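To make the shape of the result concrete, here is a minimal sketch on a small synthetic dataset (3 days on a 2 x 2 grid, with made-up values; ds_demo and df_demo are illustrative names, not the asker's objects). It shows that the frame has one row per (time, lat, lon) combination and a single spre column, which in the question's 12 x 12 x 30 case gives the 4320 rows x 1 column that was printed.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for ds_Snow: 3 days on a 2 x 2 grid (made-up values).
times = pd.date_range("2000-09-01", periods=3)
lats = [3414108.0, 3414208.0]
lons = [687311.0, 687411.0]
values = np.arange(12, dtype="float32").reshape(3, 2, 2)

ds_demo = xr.Dataset(
    {"spre": (("time", "lat", "lon"), values)},
    coords={"time": times, "lat": lats, "lon": lons},
)

# dim_order only controls the order of the index levels.
df_demo = ds_demo.to_dataframe(dim_order=["time", "lat", "lon"])

print(df_demo.shape)              # rows = n_time * n_lat * n_lon
print(list(df_demo.index.names))  # the dimensions live in the index
print(list(df_demo.columns))      # the only column is the data variable
```

No data is lost in the conversion: the dimension values are stored in the index levels rather than in columns.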
You can either get the index data through tools like get_level_values or, if you want to change how the DataFrame is indexed, you can use reset_index().
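The two options just mentioned can be sketched as follows, again on a small synthetic dataset with made-up values (ds_demo, df_demo, df_flat and df_wide are illustrative names). A third option, unstack, is not part of the answer above; it is added here as an assumption about what "keeping the array configuration" might look like, with one row per time step and one column per grid cell.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for ds_Snow: 3 days on a 2 x 2 grid (made-up values).
times = pd.date_range("2000-09-01", periods=3)
ds_demo = xr.Dataset(
    {"spre": (("time", "lat", "lon"),
              np.arange(12, dtype="float32").reshape(3, 2, 2))},
    coords={"time": times, "lat": [10.0, 20.0], "lon": [1.0, 2.0]},
)
df_demo = ds_demo.to_dataframe(dim_order=["time", "lat", "lon"])

# Option 1: read one index level without changing the frame.
time_values = df_demo.index.get_level_values("time")

# Option 2: turn the index levels into ordinary columns
# (time, lat, lon, spre), one row per grid cell and day.
df_flat = df_demo.reset_index()

# Option 3 (not from the answer): one row per time step,
# one column per (lat, lon) grid cell.
df_wide = df_demo["spre"].unstack(["lat", "lon"])

print(df_flat.head())
print(df_wide)
```

reset_index gives a long table that is convenient for filtering or merging with the zones CSV, while the unstacked form is closer to a per-day grid.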