
How are fieldnames supposed to be used in numpy structured arrays?

Here is a simple example where I want to represent a multidimensional array r0 describing cartesian coordinates (x, y, z):

import numpy as np

r0 = np.random.random((3, 2, 1000))
r1 = {'x': r0[0], 'y': r0[1], 'z': r0[2]}
r2 = np.array(r0, dtype=[('x', float), ('y', float), ('z', float)])

for r in [r0[0], r1['x'], r2['x']]:
    print('{:}:\t{:} bytes'.format(r.shape, r.nbytes))

This results in:

(2, 1000):      16000 bytes
(2, 1000):      16000 bytes
(3, 2, 1000):   48000 bytes

I fail to understand what r2['x'] is doing. Intuitively, I expected to be able to access elements of r2 the same way I would in the dict r1.

I'm not 100% sure I even need numpy structured arrays for my use case, but my arrays can become quite large and the dimensionality is quite high, so I'd think that named fields would improve code maintainability. I suspect r0 is the most memory efficient, while r1 might be slightly more readable. I was hoping r2 would be a best-of-both-worlds data structure.

You can think of recarrays and structured arrays as having shape=(nrows,) and dtype=[('fname1', ftype1), ('fname2', ftype2), ('fname3', ftype3)]. When your data has more dimensions than that, each field can be declared with its own sub-array shape, so the field holds an ndarray that retains the shape of the other dimensions.
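To make that picture concrete, here is a minimal sketch with made-up field names ('t' and 'pos' are not from the question). It only illustrates how the record axis and a field's sub-array shape combine:

```python
import numpy as np

# Hypothetical dtype: one scalar field and one (2, 3) sub-array field.
dt = np.dtype([('t', float), ('pos', float, (2, 3))])
arr = np.zeros(4, dtype=dt)          # 4 records ("rows")

print(arr.shape)                     # shape of the record axis only
print(arr['t'].shape)                # scalar field: one value per record
print(arr['pos'].shape)              # sub-array field: record axis + (2, 3)
print(arr[0]['pos'].shape)           # the field of a single record
```

The record axis always comes first in the shape of a field, followed by whatever sub-array shape the field was declared with.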

You may or may not want a structured array for your use case (with more than 2 dimensions/axes). It's hard to say without knowing the details of your data. Structured arrays are useful in two situations: 1) when you need an array of compound data (a mix of floats, ints, strings, etc.), and 2) when you want to access a "column" of data by its field/column name (using arr['x'] instead of arr[0]). (Regarding the r1 dictionary vs. a structured array, I find it easier to use NumPy arrays than dictionaries for this kind of data.)
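A small sketch of both situations, using invented labels and values (nothing here comes from the question's data):

```python
import numpy as np

# Hypothetical records: labels and values are invented for illustration.
samples = np.array(
    [('a', 1, 0.5), ('b', 2, 1.5), ('c', 3, 2.5)],
    dtype=[('label', 'U4'), ('count', int), ('value', float)],
)

print(samples['value'])                  # a whole "column", selected by name
print(samples[1])                        # a single record with all three fields
big = samples[samples['value'] > 1.0]    # boolean mask built from one field
print(big['label'])
```

Each record mixes a string, an int, and a float, which a plain ndarray with a single dtype cannot hold without falling back to object arrays.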

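Your construction of r2 keeps the shape of r0, (3, 2, 1000), and copies the value of every element into all three fields. As a result r2['x'], r2['y'] and r2['z'] each hold a full copy of r0, which is why r2['x'] reports 48000 bytes instead of 16000 and why r2 as a whole is larger than r0 and r1.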
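The effect is easier to see on a tiny array. This is a sketch using the same construction as in the question, only with a (3, 2) input so the values can be read by eye:

```python
import numpy as np

r0 = np.arange(6, dtype=float).reshape(3, 2)
dt = [('x', float), ('y', float), ('z', float)]

# Same construction as in the question: every element becomes one record.
r2 = np.array(r0, dtype=dt)

print(r2.shape)                      # unchanged: the first axis is NOT split into fields
print(r2[0, 1])                      # one record: the same value in 'x', 'y' and 'z'
print(np.array_equal(r2['x'], r0))   # each field is a full copy of r0
print(np.array_equal(r2['z'], r0))
print(r0.nbytes, r2.nbytes)          # r2 stores the data three times
```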

@obchardon's suggestion might be what you want. I combined it with your answer (to create r3), and added some statements to compare the shape and dtype of r2 and r3. I also reduced the last dimension (a2) and used np.arange() so it's easier to "see" which values are assigned to each field. Note that r3 has shape (1,), so r3['x'] has shape (1, a1, a2); r3['x'][0] holds the same values as r0[0] (and r1['x']). See below:

import numpy as np

a0, a1, a2 = 3, 2, 10
r0 = np.arange(a0*a1*a2).reshape(a0,a1,a2)

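r1 = {'x': r0[0], 'y': r0[1], 'z': r0[2]}
# r2: every element of r0 becomes a record, with its value repeated in all fields
r2 = np.array(r0, dtype=[('x', float), ('y', float), ('z', float)])
print('r2.shape:', r2.shape)
print('r2.dtype:', r2.dtype)
# r3: a single record whose fields are (a1, a2) sub-arrays
r3 = np.array([(r0[0],r0[1],r0[2])], dtype=[('x', float, (a1,a2)), ('y', float, (a1,a2)), ('z', float, (a1,a2))])
print('r3.shape:', r3.shape)
print('r3.dtype:', r3.dtype)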

for r in [r0[0], r1['x'], r2['x'], r3['x']]:
    print('{:}:\t{:} bytes'.format(r.shape, r.nbytes))

Resulting shapes and dtypes:

r2.shape: (3, 2, 10) 
r2.dtype: [('x', '<f8'), ('y', '<f8'), ('z', '<f8')]
r3.shape: (1,) 
r3.dtype: [('x', '<f8', (2, 10)), ('y', '<f8', (2, 10)), ('z', '<f8', (2, 10))]
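
Since memory use was part of the question: building r3 with np.array() copies the data. If r0 is already a C-contiguous float64 array, the same field layout can probably be obtained as a view instead. The following is a sketch under that assumption, and r4 is a hypothetical name that does not appear in the answer above:

```python
import numpy as np

a0, a1, a2 = 3, 2, 10
# Assumption: r0 is float64 and C-contiguous, so its bytes can be reinterpreted as-is.
r0 = np.arange(a0 * a1 * a2, dtype=float).reshape(a0, a1, a2)

dt = np.dtype([('x', float, (a1, a2)), ('y', float, (a1, a2)), ('z', float, (a1, a2))])

# r4 (hypothetical name) is a view on r0's buffer, not a copy.
r4 = r0.reshape(-1).view(dt)

print(r4.shape)                            # a single record, like r3
print(np.shares_memory(r4, r0))            # same underlying buffer
print(np.array_equal(r4['y'][0], r0[1]))   # field 'y' lines up with r0[1]

r4['z'][0, 0, 0] = -1.0                    # writing through the view changes r0
print(r0[2, 0, 0])
```

The trade-off is that r4 and r0 are now tied together: a change made through one is visible through the other.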
