
Convert multi-dimensional xarray into DataFrame - Python

I have a big array with 4 dimensions, as follows:

>>> raw_data
<xarray.DataArray 'TRAC04' (time: 3, Z: 34, YC: 588, XC: 2160)>
[129548160 values with dtype=float32]
Coordinates: (12/15)
    iter       (time) int64 ...
  * time       (time) datetime64[ns] 2017-01-30T12:40:00 ... 2017-04-01T09:20:00
  * XC         (XC) float32 0.08333 0.25 0.4167 0.5833 ... 359.6 359.8 359.9
  * YC         (YC) float32 -77.98 -77.95 -77.91 -77.88 ... -30.02 -29.87 -29.72
  * Z          (Z) float32 -2.1 -6.7 -12.15 -18.55 ... -614.0 -700.0 -800.0
    rA         (YC, XC) float32 ...
    ...         ...
    maskC      (Z, YC, XC) bool ...
    maskCtrlC  (Z, YC, XC) bool ...
    rhoRef     (Z) float32 ...
    rLowC      (YC, XC) float32 ...
    maskInC    (YC, XC) bool ...
    rSurfC     (YC, XC) float32 ...
Attributes:
    standard_name:  TRAC04
    long_name:      Variable concentration
    units:          mol N/m^3

I want to transform it into a DataFrame with 5 columns: 'XC', 'YC', 'Z', 'time', 'TRAC04'.

I tried to follow this question, like this:

import itertools
import pandas as pd

data = list(itertools.chain(*raw_data))
df = pd.DataFrame.from_records(data)

It runs, but I do not see anything created in the environment. Furthermore, if I try to look at df with pd.head(df), it runs forever without returning any output.

In any case, I tried to save df, following this question, but that also runs without ever finishing:

np.savetxt(r'c:\data\DF_TRAC04.txt', df.values, fmt='%d')
df.to_csv(r'c:\data\DF_TRAC04.csv', header=None, index=None, sep=' ', mode='a')

I hope my answer can still help.

Let's first create some mock data with space variables x, y, z and a time variable t.

import numpy as np
import xarray as xr

val = np.arange(54).reshape(2,3,3,3)
xc = np.array([10, 20, 30])
yc = np.array([50, 60, 70])
zc = np.array([1000, 2000, 3000])
t  = np.array([0, 1])

da = xr.DataArray(
    val,
    coords={'time': t,
        'z': zc,
        'y': yc,
        'x': xc}, 
    dims=["time","z","y", "x"]
)

You will get the following DataArray:

<xarray.DataArray (time: 2, z: 3, y: 3, x: 3)>
array([[[[ 0,  1,  2],
         [ 3,  4,  5],
         [ 6,  7,  8]],

        [[ 9, 10, 11],
         [12, 13, 14],
         [15, 16, 17]],

        [[18, 19, 20], 
         [21, 22, 23],
         [24, 25, 26]]],


       [[[27, 28, 29],
         [30, 31, 32],
         [33, 34, 35]],

        [[36, 37, 38],
         [39, 40, 41],
         [42, 43, 44]],

        [[45, 46, 47],
         [48, 49, 50],
         [51, 52, 53]]]])
Coordinates:
  * time     (time) int64 0 1
  * z        (z) int64 1000 2000 3000
  * y        (y) int64 50 60 70
  * x        (x) int64 10 20 30
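
Before flattening, it can help to check how many rows the long-format table will have: one row per array element, i.e. the product of the dimension sizes. A minimal sketch, rebuilding the mock array so that it runs on its own; the shape used for the question's array is copied from its printed repr:

```python
import math

import numpy as np
import xarray as xr

# Same mock array as above, rebuilt so this snippet is self-contained.
da = xr.DataArray(
    np.arange(54).reshape(2, 3, 3, 3),
    coords={"time": [0, 1], "z": [1000, 2000, 3000],
            "y": [50, 60, 70], "x": [10, 20, 30]},
    dims=["time", "z", "y", "x"],
)

# One row per array element: the product of the dimension sizes.
n_rows = math.prod(da.sizes.values())

# Same arithmetic for the array in the question
# (time: 3, Z: 34, YC: 588, XC: 2160).
n_rows_question = 3 * 34 * 588 * 2160
```

For the question's array this gives 129,548,160 rows, matching the value count shown in its repr, so the flattened table is large and will be slow to build, print, or save in one go.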

If you want a flat-file (long-format) representation of the DataArray, you can use

da.to_dataframe(name='value').reset_index()

and this is the result:

    time     z   y   x  value
0      0  1000  50  10      0
1      0  1000  50  20      1
2      0  1000  50  30      2
3      0  1000  60  10      3
4      0  1000  60  20      4
...
49     1  3000  60  20     49
50     1  3000  60  30     50
51     1  3000  70  10     51
52     1  3000  70  20     52
53     1  3000  70  30     53
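
Applied to the layout in the question, the same call can keep the variable's own name for the value column and return only the five requested columns. The sketch below uses a small stand-in array with the question's dimension names and one extra non-index coordinate (`rA`, filled with made-up values); such coordinates would otherwise show up as additional columns, so `reset_coords(drop=True)` is used here to discard them first. All values are invented for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small stand-in for the question's array (values and coordinates are made up).
raw_data = xr.DataArray(
    np.arange(24, dtype="float32").reshape(2, 2, 3, 2),
    coords={
        "time": pd.date_range("2017-01-30", periods=2),
        "Z": [-2.1, -6.7],
        "YC": [-77.98, -77.95, -77.91],
        "XC": [0.08333, 0.25],
        "rA": (("YC", "XC"), np.ones((3, 2))),  # non-index coordinate
    },
    dims=["time", "Z", "YC", "XC"],
    name="TRAC04",
)

df = (
    raw_data.reset_coords(drop=True)  # drop non-index coordinates such as rA
    .to_dataframe()                   # a named DataArray needs no name= argument
    .reset_index()                    # turn the MultiIndex levels into columns
)

# Column order requested in the question.
df = df[["XC", "YC", "Z", "time", "TRAC04"]]
```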

To save the DataFrame to an ASCII file without the index, use:

da.to_dataframe(name='value').reset_index().to_csv('dump.csv', index=False)
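
For an array as large as the one in the question, building the whole DataFrame at once may exhaust memory. One possible workaround, sketched here on the mock array, is to convert and append one time step at a time. The helper name `dump_by_time` and the temporary file location are made up for illustration, and this assumes the array has a dimension called `time`.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import xarray as xr

# Same mock array as above, rebuilt so this snippet is self-contained.
da = xr.DataArray(
    np.arange(54).reshape(2, 3, 3, 3),
    coords={"time": [0, 1], "z": [1000, 2000, 3000],
            "y": [50, 60, 70], "x": [10, 20, 30]},
    dims=["time", "z", "y", "x"],
)


def dump_by_time(da, path, name="value"):
    """Write the array to CSV one time step at a time (hypothetical helper)."""
    for i in range(da.sizes["time"]):
        # Indexing with a list keeps 'time' as a dimension,
        # so it still becomes a column after reset_index().
        chunk = da.isel(time=[i]).to_dataframe(name=name).reset_index()
        chunk.to_csv(
            path,
            mode="w" if i == 0 else "a",  # overwrite first, then append
            header=(i == 0),              # write the header only once
            index=False,
        )


path = os.path.join(tempfile.mkdtemp(), "dump.csv")
dump_by_time(da, path)
```

Only one time slice is held as a DataFrame at any moment, at the cost of repeated file appends.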
