
xarray Hierarchical data organization

I have a script which calculates the magnetic field in a region of space due to a particular current distribution. The result of this calculation is stored in an xarray DataArray with the coordinates vec_comp, x, y, and z. vec_comp takes the string values ['x', 'y', 'z'] to indicate the different components of the magnetic field.

I am calculating this magnetic field for a number of different current configurations (for example loops of current with different radii and distances from the region of interest). I would like to collect these magnetic field objects (xarrays) into another xarray which has coordinates indicating the tuning parameters for the current distribution. So I'll have an array where I can do something like

mag_array.sel(r=0.1, offset=0.5)

and this will return to me the 4-dimensional xarray which was calculated for those particular parameters for the current distribution.

I see that I could go ahead and add additional coordinates to the original DataArray indicating the different current parameters, however it seems clunky to me to carry around this object that may have many many coordinates. Hence the desire for a hierarchical data structure.

What is the natural way to accomplish this type of data structure?

edit: Say B1 and B2 are two DataArrays which I would like to combine. I have tried something like:

mag_array = xr.DataArray([B1, B2], 
                      coords=[('r', [0.1, 0.2])])

However, this gives an error. I guess xarray is trying to be cognizant of the structure of B1 and B2 when creating the new array, so instead of expecting one specified dimension (r in this case), it expects specifications for all 4 of the old dimensions (vec_comp, x, y, z) plus the new dimension I created by putting the two DataArrays into a list.

If I try

mag_array = xr.DataArray([B1, B2])

then a new array is created, but if I then look at

mag_array[0]

I get back an xarray but all of the old coordinate information has been deleted.

In essence the point is that I could accomplish what I like by doing something like:

import numpy as np

mag_array = np.empty((2, 2), dtype=object)  # object dtype so each cell can hold a DataArray
mag_array[0, 0] = B1
mag_array[0, 1] = B2

and so on, or by looping over the parameters; mag_array would then behave the way I want. The problem is that it would not carry the coordinates along, so I would have to keep track of that information myself. How can I get the best of both worlds: an array that can hold my objects without caring about their structure, and that can be accessed via coordinates rather than indices?

To combine separate DataArray objects, you can use xarray.concat(), e.g. (with pandas imported as pd),

mag_array = xr.concat([B1, B2], dim=pd.Index([0.1, 0.2], name='r'))

If you assign the extra scalar coordinates (which I recommend), you can just specify the coordinate to concatenate along by name, e.g.,

mag_array = xr.concat([B1.assign_coords(r=0.1), B2.assign_coords(r=0.2)], dim='r')
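The same idea can be nested to cover two tuning parameters, which gives the mag_array.sel(r=0.1, offset=0.5) access asked for in the question. The following is only a sketch: make_field is a hypothetical stand-in for the real calculation (it fills the field with the placeholder value r + offset), and the grid sizes and parameter values are assumptions chosen for illustration.

```python
import numpy as np
import xarray as xr


def make_field(r, offset):
    """Hypothetical stand-in for the real magnetic field calculation."""
    x = np.linspace(-1.0, 1.0, 4)
    y = np.linspace(-1.0, 1.0, 5)
    z = np.linspace(-1.0, 1.0, 6)
    # Placeholder data: a constant field equal to r + offset everywhere.
    data = np.full((3, x.size, y.size, z.size), r + offset)
    field = xr.DataArray(
        data,
        dims=("vec_comp", "x", "y", "z"),
        coords={"vec_comp": ["x", "y", "z"], "x": x, "y": y, "z": z},
    )
    # Record the tuning parameters as scalar coordinates.
    return field.assign_coords(r=r, offset=offset)


radii = [0.1, 0.2]      # assumed example values
offsets = [0.5, 1.0]    # assumed example values

# Concatenate along "offset" for each radius, then along "r".
mag_array = xr.concat(
    [
        xr.concat([make_field(r, off) for off in offsets], dim="offset")
        for r in radii
    ],
    dim="r",
)

# Select the 4-dimensional field for one parameter combination.
B = mag_array.sel(r=0.1, offset=0.5)
```

This only works because every field shares the same vec_comp, x, y and z coordinates; the result is one 6-dimensional DataArray rather than a container of independent objects.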

It's also worth taking a look at helper functions like xarray.open_mfdataset() that combine the process of opening files from disk and concatenating them along shared axes, e.g., xr.open_mfdataset('all/my/files/*.nc').

At the time of writing, open_mfdataset concatenates along at most one dimension, but there are plans to extend it to handle multiple dimensions in the future.

Finally, note that xarray (currently) does not have any version of a hierarchical data structure for non-aligned axes. Aligned axes are an intentional constraint of the data models for xarray.Dataset and xarray.DataArray. If you have sub-groups that are not aligned along common axes, you'll need to keep track of them in some separate data structure.
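One possible "separate data structure" for the non-aligned case is a plain dict keyed by the parameter values. This is only a sketch under assumptions: make_field is a hypothetical helper that produces fields on x grids of different sizes, and the (r, offset) keys are example values.

```python
import numpy as np
import xarray as xr


def make_field(n_points):
    """Hypothetical stand-in: a field on an x grid whose size varies per run."""
    x = np.linspace(-1.0, 1.0, n_points)
    return xr.DataArray(
        np.zeros((3, n_points)),
        dims=("vec_comp", "x"),
        coords={"vec_comp": ["x", "y", "z"], "x": x},
    )


# Keys are (r, offset) tuples. The x grids differ between entries,
# so each field is kept as its own DataArray instead of being concatenated.
fields = {
    (0.1, 0.5): make_field(4),
    (0.2, 0.5): make_field(7),
}

B = fields[(0.1, 0.5)]
```

Each stored DataArray keeps its own coordinates, but lookups are exact-match on the key, so there is no label-based slicing or nearest-neighbour selection across r and offset as there would be with .sel().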
