
Numpy: Joining structured arrays?

Input

I have many numpy structured arrays in a list, as in this example:

import numpy

a1 = numpy.array([(1, 2), (3, 4), (5, 6)], dtype=[('x', int), ('y', int)])

a2 = numpy.array([(7,10), (8,11), (9,12)], dtype=[('z', int), ('w', float)])

arrays = [a1, a2]

Desired Output

What is the correct way to join them all together to create a unified structured array like the following?

desired_result = numpy.array([(1, 2, 7, 10), (3, 4, 8, 11), (5, 6, 9, 12)],
                             dtype=[('x', int), ('y', int), ('z', int), ('w', float)])

Current Approach

This is what I'm currently using, but it is very slow, so I suspect there must be a more efficient way.

from numpy.lib.recfunctions import append_fields

def join_struct_arrays(arrays):
    for array in arrays:
        try:
            result = append_fields(result, array.dtype.names, [array[name] for name in array.dtype.names], usemask=False)
        except NameError:
            result = array

    return result

You can also use the merge_arrays function from numpy.lib.recfunctions:

import numpy.lib.recfunctions as rfn
rfn.merge_arrays(arrays, flatten = True, usemask = False)

Out[52]: 
array([(1, 2, 7, 10.0), (3, 4, 8, 11.0), (5, 6, 9, 12.0)], 
     dtype=[('x', '<i4'), ('y', '<i4'), ('z', '<i4'), ('w', '<f8')])
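As a quick sanity check, the merge_arrays call above can be run against the arrays from the question. This is only a minimal sketch; the variable name merged is introduced here for illustration and is not part of the original answer.

```python
import numpy
import numpy.lib.recfunctions as rfn

# Input arrays taken from the question.
a1 = numpy.array([(1, 2), (3, 4), (5, 6)], dtype=[('x', int), ('y', int)])
a2 = numpy.array([(7, 10), (8, 11), (9, 12)], dtype=[('z', int), ('w', float)])

# flatten=True asks for a single flat list of fields;
# usemask=False asks for a plain ndarray rather than a masked array.
merged = rfn.merge_arrays([a1, a2], flatten=True, usemask=False)

print(merged.dtype.names)  # ('x', 'y', 'z', 'w')
print(merged['w'])         # the float column taken from a2
```

The exact integer width shown in the dtype ('<i4' or '<i8') depends on the platform.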

Here is an implementation that should be faster. It converts everything to arrays of numpy.uint8 and does not use any temporaries.

def join_struct_arrays(arrays):
    sizes = numpy.array([a.itemsize for a in arrays])
    offsets = numpy.r_[0, sizes.cumsum()]
    n = len(arrays[0])
    joint = numpy.empty((n, offsets[-1]), dtype=numpy.uint8)
    for a, size, offset in zip(arrays, sizes, offsets):
        joint[:,offset:offset+size] = a.view(numpy.uint8).reshape(n,size)
    dtype = sum((a.dtype.descr for a in arrays), [])
    return joint.ravel().view(dtype)

Edit: Simplified the code and avoided the unnecessary as_strided().
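The byte-level trick that the implementation above relies on can be seen in isolation with a small sketch. It assumes a contiguous structured array with fixed-width fields (numpy.int32 is chosen here only to make the sizes predictable); the names raw and back are illustrative.

```python
import numpy

# Two int32 fields give an itemsize of 8 bytes per record.
a = numpy.array([(1, 2), (3, 4)],
                dtype=[('x', numpy.int32), ('y', numpy.int32)])

# View the records as raw bytes: one row per record, one column per byte.
raw = a.view(numpy.uint8).reshape(len(a), a.itemsize)
print(raw.shape)  # (2, 8)

# Flattening the bytes and viewing them with the structured dtype
# recovers the original records.
back = raw.ravel().view(a.dtype)
print(back['y'])
```

Joining several arrays then amounts to copying each array's byte columns side by side into one wider byte matrix and viewing the result with the concatenated dtype.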

And yet another way, which I think is a little more readable and also a lot faster:

def join_struct_arrays(arrays):
    newdtype = []
    for a in arrays:
        descr = []
        for field in a.dtype.names:
            (typ, _) = a.dtype.fields[field]
            descr.append((field, typ))
        newdtype.extend(tuple(descr))
    newrecarray = numpy.zeros(len(arrays[0]), dtype = newdtype)
    for a in arrays:
        for name in a.dtype.names:
            newrecarray[name] = a[name]
    return newrecarray

EDIT: With Sven's suggestions it becomes (a little bit slower, but actually pretty readable):

def join_struct_arrays2(arrays):
    newdtype = sum((a.dtype.descr for a in arrays), [])
    newrecarray = numpy.empty(len(arrays[0]), dtype = newdtype)
    for a in arrays:
        for name in a.dtype.names:
            newrecarray[name] = a[name]
    return newrecarray
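The field-by-field approach silently assumes that all arrays have the same length and that no field name appears twice. A hedged variant with explicit checks might look like the sketch below; the function name join_struct_arrays_checked and the error messages are hypothetical and not part of the original answers.

```python
import numpy

def join_struct_arrays_checked(arrays):
    """Hypothetical variant of the field-copy approach with input checks."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same length")

    names = [name for a in arrays for name in a.dtype.names]
    if len(set(names)) != len(names):
        raise ValueError("field names must be unique across arrays")

    # Combined dtype: one (name, dtype) pair per field of every input array.
    newdtype = [(name, a.dtype[name]) for a in arrays for name in a.dtype.names]
    result = numpy.empty(n, dtype=newdtype)

    # Copy each field column into the matching field of the result.
    for a in arrays:
        for name in a.dtype.names:
            result[name] = a[name]
    return result

# Input arrays taken from the question.
a1 = numpy.array([(1, 2), (3, 4), (5, 6)], dtype=[('x', int), ('y', int)])
a2 = numpy.array([(7, 10), (8, 11), (9, 12)], dtype=[('z', int), ('w', float)])

joined = join_struct_arrays_checked([a1, a2])
print(joined.dtype.names)  # ('x', 'y', 'z', 'w')
```

The checks cost one pass over the field names, which is negligible next to copying the data.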
