How to replace the current dimension of an xarray object with two new ones

I am a Pandas user migrating to Xarray because I work with geospatial 3D data. Some things I only know how to do in Pandas, and it often makes no sense to convert to a Pandas DataFrame and then convert back to an Xarray Dataset.

What I am trying to do is replace the current dimension of an Xarray object with two new ones, both of which are currently data variables in that object.

We start from data in an Xarray object like this:

<xarray.Dataset>
Dimensions:  (index: 9)
Coordinates:
  * index    (index) int64 0 1 2 3 4 5 6 7 8
Data variables:
    Letter   (index) object 'A' 'A' 'A' 'B' 'B' 'B' 'C' 'C' 'C'
    Number   (index) int64 1 2 3 1 2 3 1 2 3
    Value1   (index) float64 0.5453 1.184 -1.177 0.8232 ... -1.253 0.3274 -1.583
    Value2   (index) float64 -0.4184 -0.3325 0.6826 ... -0.264 0.07381 0.4357

What I am trying to do is reshape and reindex the variables Value1 and Value2 so that Letter and Number become their dimensions. The way I usually do this is:

reindexed = data.to_dataframe().set_index(['Letter','Number']).to_xarray()

That returns:

<xarray.Dataset>
Dimensions:  (Letter: 3, Number: 3)
Coordinates:
  * Letter   (Letter) object 'A' 'B' 'C'
  * Number   (Number) int64 1 2 3
Data variables:
    Value1   (Letter, Number) float64 0.5453 1.184 -1.177 ... 0.3274 -1.583
    Value2   (Letter, Number) float64 -0.4184 -0.3325 0.6826 ... 0.07381 0.4357

This works very well if the data is not too big, but it seems wasteful to me because converting to a DataFrame loads everything into memory. I would like to find a faster and lighter way to do the same thing using Xarray only.

To help reproduce the problem, the code below creates data similar to what I have after reading the NetCDF file.

import numpy as np
import pandas as pd


df = pd.DataFrame()
df['Letter'] = 'A A A B B B C C C'.split()
df['Number'] = [1,2,3,1,2,3,1,2,3]
df['Value1'] = np.random.randn(9)
df['Value2'] = np.random.randn(9)
data = df.to_xarray()

You should be able to do this using the code below. You cannot remove dimensions in xarray, so you will have to replace the values of "index" with the values of Letter or Number first, and then rename the index dimension.

import numpy as np
import pandas as pd

df = pd.DataFrame()
df['Letter'] = 'A A A B B B C C C'.split()
df['Number'] = [1,2,3,1,2,3,1,2,3]
df['Value1'] = np.random.randn(9)
df['Value2'] = np.random.randn(9)
data = df.to_xarray()

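(
    data
    # overwrite the labels of "index" with the Letter values
    .assign_coords({"index": data.Letter.values})
    # assign the Number values as a coordinate
    .assign_coords({"Number": data.Number.values})
    # Letter is now redundant as a data variable
    .drop_vars("Letter")
    # rename the dimension, then its coordinate, from "index" to "Letter"
    .rename_dims({"index": "Letter"})
    .rename({"index": "Letter"})
)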
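
Renaming a dimension keeps its length (9 here), so it relabels the data rather than reshaping it. To get the (Letter: 3, Number: 3) layout shown in the question, one possible xarray-only route is to build a MultiIndex on the existing dimension with set_index and then unstack it. The following is a minimal sketch on the question's sample data, not a tested recipe for large NetCDF files:

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame()
df['Letter'] = 'A A A B B B C C C'.split()
df['Number'] = [1, 2, 3, 1, 2, 3, 1, 2, 3]
df['Value1'] = np.random.randn(9)
df['Value2'] = np.random.randn(9)
data = df.to_xarray()

# Combine the Letter and Number variables into a MultiIndex on "index",
# then unstack that MultiIndex into two separate dimensions.
reindexed = data.set_index(index=['Letter', 'Number']).unstack('index')
print(reindexed)
```

This avoids the explicit round trip through a DataFrame. Whether it also avoids loading the value arrays into memory probably depends on how the dataset is backed (for example, NumPy versus dask arrays), so it is worth checking on the real data.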
