
python: numpy: concatenation of named arrays

Consider the following simple example:

import numpy
x = numpy.array([(1,2),(3,4)],dtype=[('a','<f4'),('b','<f4')])
y = numpy.array([(1,2),(3,4)],dtype=[('c','<f4'),('d','<f4')])
numpy.hstack((x,y))

This raises the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python33\lib\site-packages\numpy\core\shape_base.py", line 226, in vstack
    return _nx.concatenate(list(map(atleast_2d,tup)),0)
TypeError: invalid type promotion

If the arrays have no field names, it works:

x = numpy.array([(1,2),(3,4)],dtype='<f4')
y = numpy.array([(1,2),(3,4)],dtype='<f4')
numpy.hstack((x,y))

If I remove the names from x and y, it works too.

Question: how can I concatenate, vstack, or hstack numpy arrays that have named fields? Note: numpy.lib.recfunctions.stack_arrays doesn't work well either.

The problem is that the types are different. The "title" (the field name) is part of the dtype, and y uses different names from x, so the types are incompatible. If you use compatible types, everything works fine:

>>> x = numpy.array([(1, 2), (3, 4)], dtype=[('a', '<f4'), ('b', '<f4')])
>>> y = numpy.array([(5, 6), (7, 8)], dtype=[('a', '<f4'), ('b', '<f4')])
>>> numpy.vstack((x, y))
array([[(1.0, 2.0), (3.0, 4.0)],
       [(5.0, 6.0), (7.0, 8.0)]], 
      dtype=[('a', '<f4'), ('b', '<f4')])
>>> numpy.hstack((x, y))
array([(1.0, 2.0), (3.0, 4.0), (5.0, 6.0), (7.0, 8.0)], 
      dtype=[('a', '<f4'), ('b', '<f4')])
>>> numpy.dstack((x, y))
array([[[(1.0, 2.0), (5.0, 6.0)],
        [(3.0, 4.0), (7.0, 8.0)]]], 
      dtype=[('a', '<f4'), ('b', '<f4')])

Sometimes dstack and the other stacking functions are smart enough to coerce types in a sensible way, but numpy has no way to know how to combine record arrays with different user-defined field names.
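If the fields of y are really meant to line up with the fields of x, one option is to tell numpy so explicitly before stacking. The following is a minimal sketch, assuming both dtypes have exactly the same memory layout (here, two packed 32-bit floats per record); the names y_as_x and stacked are only illustrative:

```python
import numpy

x = numpy.array([(1, 2), (3, 4)], dtype=[('a', '<f4'), ('b', '<f4')])
y = numpy.array([(5, 6), (7, 8)], dtype=[('c', '<f4'), ('d', '<f4')])

# Reinterpret y's records under x's dtype. No data is copied; this is only
# safe because the two dtypes describe the same bytes in the same order.
y_as_x = y.view(x.dtype)

# Now both arrays share one dtype, so concatenation needs no type promotion.
stacked = numpy.concatenate((x, y_as_x))
print(stacked.dtype.names)  # ('a', 'b')
print(stacked.shape)        # (4,)
```

This appends the records of y to those of x; it does not add new fields. If the layouts differ (different field types or counts), a view like this is not appropriate.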

If you want to concatenate the datatypes, then you have to create a new datatype. Don't make the mistake of thinking that the sequence of names (x['a'], x['b'], ...) constitutes a true dimension of the array; x and y above are 1-d arrays of blocks of memory, each of which contains two 32-bit floats that can be accessed using the names 'a' and 'b'. If you access an individual item in the array, you don't get another array, as you would if the fields were truly a second dimension. You can see the difference here:

>>> x = numpy.array([(1, 2), (3, 4)], dtype=[('a', '<f4'), ('b', '<f4')])
>>> x[0]
(1.0, 2.0)
>>> type(x[0])
<type 'numpy.void'>

>>> z = numpy.array([(1, 2), (3, 4)])
>>> z[0]
array([1, 2])
>>> type(z[0])
<type 'numpy.ndarray'>

This is what allows record arrays to contain heterogeneous data; record arrays can contain both strings and ints, but the trade-off is that you don't get the full power of an ndarray at the level of individual records.
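The same distinction shows up in the array shapes. The sketch below is only an illustration: the people array and its field names are made up for the example and do not come from the question.

```python
import numpy

x = numpy.array([(1, 2), (3, 4)], dtype=[('a', '<f4'), ('b', '<f4')])
z = numpy.array([(1, 2), (3, 4)], dtype='<f4')

# The structured array is 1-d: two records. The plain array is 2-d.
print(x.shape, z.shape)  # (2,) (2, 2)

# Indexing by field name returns a regular 1-d array, one value per record.
col_a = x['a']
print(col_a.shape)  # (2,)

# A single record is a numpy.void scalar, not an ndarray.
print(isinstance(x[0], numpy.void))  # True

# Hypothetical heterogeneous records: a string field and an integer field.
people = numpy.array([('alice', 30), ('bob', 25)],
                     dtype=[('name', 'U10'), ('age', '<i4')])
print(people['age'].sum())  # 55
```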

The upshot is that to join individual blocks of memory, you actually have to modify the dtype of the array. There are a few ways to do this, but the simplest I could find involves the little-known numpy.lib.recfunctions library (which I see you've already found!):

>>> import numpy.lib.recfunctions
>>> numpy.lib.recfunctions.rec_append_fields(x, 
                                             y.dtype.names, 
                                             [y[n] for n in y.dtype.names])
rec.array([(1.0, 2.0, 1.0, 2.0), (3.0, 4.0, 3.0, 4.0)], 
      dtype=[('a', '<f4'), ('b', '<f4'), ('c', '<f4'), ('d', '<f4')])
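For comparison, the same result can be built by hand, which makes the "new dtype" step explicit. This is a rough sketch rather than what recfunctions does internally; it assumes simple packed dtypes with no field names in common, and the names combined_dtype and combined are illustrative:

```python
import numpy

x = numpy.array([(1, 2), (3, 4)], dtype=[('a', '<f4'), ('b', '<f4')])
y = numpy.array([(1, 2), (3, 4)], dtype=[('c', '<f4'), ('d', '<f4')])

# Build a dtype whose field list is x's fields followed by y's fields.
combined_dtype = numpy.dtype(x.dtype.descr + y.dtype.descr)

# Allocate one record per row, then copy each field across by name.
combined = numpy.empty(x.shape, dtype=combined_dtype)
for name in x.dtype.names:
    combined[name] = x[name]
for name in y.dtype.names:
    combined[name] = y[name]

print(combined.dtype.names)  # ('a', 'b', 'c', 'd')
```

The result is a plain structured ndarray rather than a rec.array, and x and y must have the same shape for the field-by-field copy to work.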
