How to create a numpy matrix with differing column data types?

Let's say I have three vectors a , b , and c :

a = np.array([1,2,3])
b = np.array([1.2, 3.2, 4.5])
c = np.array([True, True, False])

What is the simplest way to turn this into a matrix d of differing data types and column labels, as such:

d = ([[1, 1.2, True],
     [2, 3.2, True], 
     [3, 4.5, False]], 
     dtype=[('aVals','i8'), ('bVals','f4'), ('cVals','bool')])

So that I can then save this matrix to a .npy file and access the data like this after loading it:

>>> d = np.load('dFile')
>>> d['aVals']
np.array([1,2,3], dtype = [('aVals', '<i8')])

I have used a simple column_stack to create the matrix, but I am getting a headache trying to figure out how to include the data types and column names. column_stack does not accept a dtype argument, and I can't see a way to add field names and data types after the column_stack is performed. It is worth mentioning that the vectors a , b , and c have no explicit data types declared upon their creation; they are as shown above.

d = np.empty(len(a), dtype=[('aVals',a.dtype), ('bVals',b.dtype), ('cVals',c.dtype)])
d['aVals'] = a
d['bVals'] = b
d['cVals'] = c
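
The question also asks about saving to a .npy file and reading fields back by name. The following is a minimal sketch of that round trip, assuming the arrays from the question; the temporary directory and the file name 'dFile.npy' are only illustrative choices.

```python
import os
import tempfile

import numpy as np

a = np.array([1, 2, 3])
b = np.array([1.2, 3.2, 4.5])
c = np.array([True, True, False])

# Build the structured array field by field, as in the answer above
d = np.empty(len(a), dtype=[('aVals', a.dtype), ('bVals', b.dtype), ('cVals', c.dtype)])
d['aVals'] = a
d['bVals'] = b
d['cVals'] = c

# 'dFile.npy' is a hypothetical file name; np.save appends '.npy' when it is missing,
# so np.load needs the name including the extension
path = os.path.join(tempfile.mkdtemp(), 'dFile.npy')
np.save(path, d)

loaded = np.load(path)
print(loaded.dtype.names)  # the field names survive the round trip
print(loaded['aVals'])     # access one column by its field name
```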

As a reusable function:

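def column_stack_overflow(**kwargs):
    # One (name, dtype) field per keyword argument
    dtype = [(name, val.dtype) for name, val in kwargs.items()]
    # In Python 3, kwargs.values() is a view and cannot be indexed,
    # so take the first array through an iterator to get the row count
    first = next(iter(kwargs.values()))
    arr = np.empty(len(first), dtype=dtype)
    # Copy each input array into its named field
    for name, val in kwargs.items():
        arr[name] = val
    return arr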

Then:

column_stack_overflow(aVals=a, bVals=b, cVals=c)

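But note that kwargs is a dict. Before Python 3.6 it was unordered, so you might not get the columns in the order you pass them; from Python 3.6 on (PEP 468), the order of keyword arguments is preserved.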
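
If the column order has to hold regardless of the Python version, one option is to pass an explicit list of (name, array) pairs instead of keyword arguments. This is a rough sketch only; stack_columns is a hypothetical helper name, not something NumPy provides.

```python
import numpy as np


def stack_columns(columns):
    """Build a structured array from a list of (name, array) pairs.

    Hypothetical helper: the field order follows the list order.
    """
    # Accept plain lists as well as arrays
    columns = [(name, np.asarray(val)) for name, val in columns]
    dtype = [(name, val.dtype) for name, val in columns]
    arr = np.empty(len(columns[0][1]), dtype=dtype)
    # Field-by-field copy, same idea as column_stack_overflow
    for name, val in columns:
        arr[name] = val
    return arr


d = stack_columns([
    ('aVals', [1, 2, 3]),
    ('bVals', [1.2, 3.2, 4.5]),
    ('cVals', [True, True, False]),
])
print(d.dtype.names)
```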

There's a little-known recarray function, np.rec.fromarrays, that constructs arrays like this. It was cited in a recent SO question:

Assigning field names to numpy array in Python 2.7.3

Allowing it to deduce everything from the input arrays:

In [19]: np.rec.fromarrays([a,b,c])
Out[19]: 
rec.array([(1, 1.2, True), (2, 3.2, True), (3, 4.5, False)], 
          dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '?')])

Specifying names:

In [26]: d=np.rec.fromarrays([a,b,c],names=['avals','bvals','cVals'])
In [27]: d
Out[27]: 
rec.array([(1, 1.2, True), 
           (2, 3.2, True), 
           (3, 4.5, False)], 
          dtype=[('avals', '<i4'), ('bvals', '<f8'), ('cVals', '?')])
In [28]: d['cVals']
Out[28]: array([ True,  True, False], dtype=bool)
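
The question asked for specific field types ('i8', 'f4', 'bool') rather than whatever the inputs happen to have. As a sketch, assuming the same three arrays, fromarrays can also be given a full dtype, and the values are cast when they are copied into each field:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1.2, 3.2, 4.5])
c = np.array([True, True, False])

# Field names and types taken from the dtype requested in the question
d = np.rec.fromarrays(
    [a, b, c],
    dtype=[('aVals', 'i8'), ('bVals', 'f4'), ('cVals', 'bool')],
)

print(d.dtype)
print(d['bVals'])  # stored as float32, so values are rounded to single precision
print(d.aVals)     # record arrays also allow attribute-style field access
```

Note that casting to 'f4' loses precision compared with the original float64 values in b.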

After creating a target array of the right size and dtype, fromarrays does a field-by-field copy. This is typical of the rec functions and recfunctions (even astype does this).

# populate the record array (makes a copy)
for i in range(len(arrayList)):
    _array[_names[i]] = arrayList[i]
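
Because the fields are filled by copying, the result should be independent of the input arrays. A small check of that, assuming the arrays from the question:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1.2, 3.2, 4.5])
c = np.array([True, True, False])

d = np.rec.fromarrays([a, b, c], names=['aVals', 'bVals', 'cVals'])

# Change the source array after the record array has been built
a[0] = 99

print(a)           # the source array now starts with 99
print(d['aVals'])  # the record array keeps its own copy, still starting with 1
```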

A 2011 reference: How to make a Structured Array from multiple simple array
