numpy: creating recarray fast with different column types

I am trying to create a recarray from a series of numpy arrays with column names and mixed variable types.

The following works but is slow:

    import numpy as np
    a = np.array([1,2,3,4], dtype=np.int64)
    b = np.array([6,6,6,6], dtype=np.int64)
    c = np.array([-1.,-2.,-1.,-1.], dtype=np.float32)
    # builds one Python tuple per row, then numpy parses the tuples into records
    d = np.array(list(zip(a,b,c)), dtype=[('a',np.int64),('b',np.int64),('c',np.float32)])
    d = d.view(np.recarray)

I think there should be a way to do this with np.stack((a,b,c), axis=-1), which is faster than the list(zip()) method. However, there does not seem to be a trivial way to do the stacking and preserve the column types. This link does seem to show how to do it, but it's pretty clunky and I hope there is a better way.

Thanks for the help!
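A quick check of why np.stack alone cannot do this (a sketch using the question's arrays, with c corrected to four elements and np.int64 assumed for the integer columns): stack has to produce a single array with one common dtype, so the int64 and float32 columns are promoted together and the per-column types are lost.

```python
import numpy as np

a = np.array([1, 2, 3, 4], dtype=np.int64)
b = np.array([6, 6, 6, 6], dtype=np.int64)
c = np.array([-1., -2., -1., -1.], dtype=np.float32)

# np.stack needs one common dtype, so int64 and float32 are promoted to float64
stacked = np.stack((a, b, c), axis=-1)
print(stacked.dtype)  # float64
print(stacked.shape)  # (4, 3): a plain 2-D array, not one record per row
```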

np.rec.fromarrays is probably what you want:

>>> np.rec.fromarrays([a, b, c], names=['a', 'b', 'c'])
rec.array([(1, 6, -1.), (2, 6, -2.), (3, 6, -1.), (4, 6, -1.)],
          dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<f4')])
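A short usage sketch, assuming the question's a, b, c with np.int64 integer columns: with names= each field takes its dtype from the matching input array, and because the result is a recarray the columns can also be read as attributes. Passing dtype= instead lets you choose the column types yourself; the int32/float64 choice below is only an illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 4], dtype=np.int64)
b = np.array([6, 6, 6, 6], dtype=np.int64)
c = np.array([-1., -2., -1., -1.], dtype=np.float32)

# names only: each field's dtype comes from the corresponding input array
r = np.rec.fromarrays([a, b, c], names=['a', 'b', 'c'])
print(r.dtype)
print(r.a)   # attribute access to the 'a' column
print(r[1])  # a single record

# explicit dtype: inputs are cast to the requested column types
dt = np.dtype([('a', np.int32), ('b', np.int32), ('c', np.float64)])
r2 = np.rec.fromarrays([a, b, c], dtype=dt)
print(r2.dtype)
```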

Here's the field-by-field approach that I commented on:

In [308]:     a = np.array([1,2,3,4], dtype=np.int64)
     ...:     b = np.array([6,6,6,6], dtype=np.int64)
     ...:     c = np.array([-1.,-2.,-1.,-1.], dtype=np.float32)
     ...:     dt = np.dtype([('a',np.int64),('b',np.int64),('c',np.float32)])
     ...: 
     ...: 

(I had to correct the c you copy-pasted: -2.-1. was missing a comma.)

In [309]: arr = np.zeros(a.shape, dtype=dt)
In [310]: for name, x in zip(dt.names, [a,b,c]):
     ...:     arr[name] = x
     ...:     
In [311]: arr
Out[311]: 
array([(1, 6, -1.), (2, 6, -2.), (3, 6, -1.), (4, 6, -1.)],
      dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<f4')])
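The loop above can be wrapped in a small reusable function that also returns a recarray, which is what the question asked for. This is a sketch: fill_fields is a hypothetical helper name, not a NumPy function, and it assumes one 1-D input array per field, all of the same length.

```python
import numpy as np

def fill_fields(arrays, dt):
    """Copy each input array into the matching field of a new record array.

    Hypothetical helper wrapping the field-by-field loop; not part of NumPy.
    """
    dt = np.dtype(dt)
    if len(arrays) != len(dt.names):
        raise ValueError("need exactly one array per field")
    # np.empty is enough because every field is overwritten below
    out = np.empty(np.shape(arrays[0]), dtype=dt)
    for name, x in zip(dt.names, arrays):
        out[name] = x  # one vectorized copy per field, not one per row
    return out.view(np.recarray)

a = np.array([1, 2, 3, 4], dtype=np.int64)
b = np.array([6, 6, 6, 6], dtype=np.int64)
c = np.array([-1., -2., -1., -1.], dtype=np.float32)
dt = np.dtype([('a', np.int64), ('b', np.int64), ('c', np.float32)])

rec = fill_fields([a, b, c], dt)
print(rec.b)      # attribute access works because of the recarray view
print(rec.dtype)
```

Using np.empty instead of np.zeros skips a redundant initialization, since the loop writes every field anyway.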

Since the array will typically have many more records (rows) than fields, this should be faster than the list-of-tuples approach: it does one vectorized copy per field instead of building one Python tuple per row. For an array this small, the two are probably comparable in speed.

In [312]: np.array(list(zip(a,b,c)), dtype=dt)
Out[312]: 
array([(1, 6, -1.), (2, 6, -2.), (3, 6, -1.), (4, 6, -1.)],
      dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<f4')])
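To check the speed claim on something closer to real data, here is a rough timing sketch. The size n = 100_000 and the random inputs are assumptions, and the actual numbers depend on the machine and NumPy version, so none are shown here.

```python
import timeit
import numpy as np

n = 100_000  # assumed size, large enough for per-row Python overhead to show
rng = np.random.default_rng(0)
a = rng.integers(0, 10, n)           # int64 by default
b = rng.integers(0, 10, n)
c = rng.random(n, dtype=np.float32)
dt = np.dtype([('a', np.int64), ('b', np.int64), ('c', np.float32)])

def via_tuples():
    # builds n Python tuples, then parses them back into records
    return np.array(list(zip(a, b, c)), dtype=dt)

def via_fields():
    # one vectorized copy per field
    arr = np.empty(n, dtype=dt)
    for name, x in zip(dt.names, (a, b, c)):
        arr[name] = x
    return arr

# both approaches must produce the same records
assert np.array_equal(via_tuples(), via_fields())
for f in (via_tuples, via_fields):
    print(f.__name__, timeit.timeit(f, number=3))
```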

After some setup to determine the dtype, rec.fromarrays does:

_array = recarray(shape, descr)
# populate the record array (makes a copy)
for i in range(len(arrayList)):
    _array[_names[i]] = arrayList[i]
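A runnable, simplified analogue of that quoted loop, to show that rec.fromarrays is essentially the same field-by-field fill. This is a sketch: fromarrays_sketch is a hypothetical name, and it only handles names=, whereas the real function also accepts dtype=, formats=, titles= and does more validation.

```python
import numpy as np

def fromarrays_sketch(array_list, names):
    """Hypothetical, simplified analogue of the rec.fromarrays loop quoted above."""
    array_list = [np.asarray(x) for x in array_list]
    # build the record dtype from each input's own dtype
    descr = np.dtype([(name, x.dtype) for name, x in zip(names, array_list)])
    shape = array_list[0].shape
    _array = np.recarray(shape, dtype=descr)
    # populate the record array field by field (makes a copy)
    for name, x in zip(names, array_list):
        _array[name] = x
    return _array

a = np.array([1, 2, 3, 4], dtype=np.int64)
b = np.array([6, 6, 6, 6], dtype=np.int64)
c = np.array([-1., -2., -1., -1.], dtype=np.float32)

out = fromarrays_sketch([a, b, c], ['a', 'b', 'c'])
ref = np.rec.fromarrays([a, b, c], names=['a', 'b', 'c'])
print(out.tolist() == ref.tolist())  # True
```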

The only way to use stack is to create the recarrays first (here, one 0-d recarray per row):

In [315]: [np.rec.fromarrays((i,j,k), dtype=dt) for i,j,k in zip(a,b,c)]
Out[315]: 
[rec.array((1, 6, -1.),
           dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<f4')]),
 rec.array((2, 6, -2.),
           dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<f4')]),
 rec.array((3, 6, -1.),
           dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<f4')]),
 rec.array((4, 6, -1.),
           dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<f4')])]
In [316]: np.stack(_)
Out[316]: 
array([(1, 6, -1.), (2, 6, -2.), (3, 6, -1.), (4, 6, -1.)],
      dtype=(numpy.record, [('a', '<i8'), ('b', '<i8'), ('c', '<f4')]))
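Because this route still calls rec.fromarrays once per row in Python, it is unlikely to be faster than list(zip()); its value is mainly in showing that stack preserves field types once the pieces are already structured. The stacked result above is shown as a plain array with a numpy.record dtype, so as a sketch (same a, b, c and dt as above) it can be viewed as a recarray to get attribute access back:

```python
import numpy as np

a = np.array([1, 2, 3, 4], dtype=np.int64)
b = np.array([6, 6, 6, 6], dtype=np.int64)
c = np.array([-1., -2., -1., -1.], dtype=np.float32)
dt = np.dtype([('a', np.int64), ('b', np.int64), ('c', np.float32)])

# one 0-d recarray per row: Python-level work grows with the number of rows
pieces = [np.rec.fromarrays((i, j, k), dtype=dt) for i, j, k in zip(a, b, c)]
stacked = np.stack(pieces)

# a view is cheap (no copy) and restores recarray attribute access
rec = stacked.view(np.recarray)
print(rec.a)
print(rec.dtype)
```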
