Convert np.ndarray of tuples (dtype=object) into array with dtype=int

I need to convert short NumPy arrays of tuples (dtype=object) into NumPy arrays of ints.

The most obvious method doesn't work:

import numpy as np

# array_of_tuples is given; this is just an example:
array_of_tuples = np.zeros(2, dtype=object)
array_of_tuples[0] = 1, 2
array_of_tuples[1] = 2, 3

# Raises the ValueError shown below:
np.array(array_of_tuples, dtype=int)

ValueError: setting an array element with a sequence.

It looks like placing the tuples into a pre-allocated buffer of fixed size and dtype is the way to go. It seems to avoid a lot of the overhead associated with computing sizes, raggedness, and dtype.
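As a minimal sketch of the pre-allocated buffer idea, the tuples can be copied row by row into an int array whose shape is taken from the input. The helper name `tuples_to_int_array` is hypothetical, and the sketch assumes a non-empty input in which every tuple has the same length:

```python
import numpy as np

def tuples_to_int_array(array_of_tuples):
    """Copy equal-length tuples from a 1-D object array into a 2-D int array.

    Hypothetical helper: assumes the input is non-empty and not ragged.
    """
    n_rows = len(array_of_tuples)
    n_cols = len(array_of_tuples[0])
    # Fixed shape and dtype are decided up front, so NumPy does not have to
    # infer them from the nested Python objects.
    res = np.empty((n_rows, n_cols), dtype=int)
    for i, row in enumerate(array_of_tuples):
        res[i] = row  # assigns one tuple to one row of the buffer
    return res

# Same example input as in the question.
array_of_tuples = np.zeros(2, dtype=object)
array_of_tuples[0] = 1, 2
array_of_tuples[1] = 2, 3

result = tuples_to_int_array(array_of_tuples)
```

The benchmark further down writes the same loop more compactly by using `res[i]` directly as a loop target.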

Here are some slower alternatives, followed by a benchmark:

  • You can cheat and create a dtype with the requisite number of fields, since numpy supports conversion of tuples to custom dtypes:

     dt = np.dtype([('', int) for _ in range(len(array_of_tuples[0]))])
     res = np.empty((len(array_of_tuples), len(array_of_tuples[0])), int)
     res.view(dt).ravel()[:] = array_of_tuples

  • You can stack the array:

     np.stack(array_of_tuples, axis=0)

    Unfortunately, this is even slower than the other proposed methods.

  • Pre-allocation does not help much:

     res = np.empty((len(array_of_tuples), len(array_of_tuples[0])), int)
     np.stack(array_of_tuples, out=res, axis=0)

  • Trying to cheat using np.concatenate, which allows you to specify the output dtype, does not help much either:

     np.concatenate(array_of_tuples, dtype=int).reshape(len(array_of_tuples), len(array_of_tuples[0]))

  • And neither does pre-allocating the array:

     res = np.empty((len(array_of_tuples), len(array_of_tuples[0])), int)
     np.concatenate(array_of_tuples, out=res.ravel())

  • You can also try to do the concatenation in Python space, which is slow too:

     np.array(sum(array_of_tuples, start=()), dtype=int).reshape(len(array_of_tuples), len(array_of_tuples[0]))

    OR

     np.reshape(np.sum(array_of_tuples), (len(array_of_tuples), len(array_of_tuples[0])))
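
Before comparing speed, it can be worth confirming that the alternatives listed above produce the same array. The following is a rough sketch on the two-tuple example from the question; the names `expected`, `via_view`, `results`, and `all_equal` are introduced here for illustration only:

```python
import numpy as np

# Small example input, as in the question.
array_of_tuples = np.zeros(2, dtype=object)
array_of_tuples[0] = 1, 2
array_of_tuples[1] = 2, 3
shape = (len(array_of_tuples), len(array_of_tuples[0]))

# Reference result: fill a pre-allocated int buffer row by row.
expected = np.empty(shape, dtype=int)
for i, row in enumerate(array_of_tuples):
    expected[i] = row

# Structured-dtype view: one int field per tuple element.
dt = np.dtype([('', int) for _ in range(shape[1])])
via_view = np.empty(shape, dtype=int)
via_view.view(dt).ravel()[:] = array_of_tuples

results = {
    "structured view": via_view,
    "stack": np.stack(array_of_tuples, axis=0),
    # The dtype= keyword of np.concatenate needs NumPy 1.20 or newer.
    "concatenate": np.concatenate(array_of_tuples, dtype=int).reshape(shape),
    # The start= keyword of sum() needs Python 3.8 or newer.
    "python sum": np.array(sum(array_of_tuples, start=()), dtype=int).reshape(shape),
}

# True only if every alternative matches the reference array.
all_equal = all(np.array_equal(arr, expected) for arr in results.values())
print(all_equal)
```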

# Benchmark setup: 100 tuples of 100 ints each
array_of_tuples = np.empty(100, dtype=object)
for i in range(len(array_of_tuples)):
    array_of_tuples[i] = tuple(range(i, i + 100))

%%timeit
res = np.empty((len(array_of_tuples), len(array_of_tuples[0])), int)
for i, res[i] in enumerate(array_of_tuples):
    pass
305 µs ± 8.55 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

dt = np.dtype([('', 'int',) for _ in range(100)])
%%timeit
res = np.empty((100, 100), int)
res.view(dt).ravel()[:] = array_of_tuples
334 µs ± 5.59 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.array(array_of_tuples.tolist())
478 µs ± 12.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
res = np.empty((100, 100), int)
np.concatenate(array_of_tuples, out=res.ravel())
500 µs ± 2.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.concatenate(array_of_tuples, dtype=int).reshape(100, 100)
504 µs ± 7.72 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
res = np.empty((100, 100), int)
np.stack(array_of_tuples, out=res, axis=0)
557 µs ± 25.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.stack(array_of_tuples, axis=0)
577 µs ± 6.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.array(sum(array_of_tuples, start=()), dtype=int).reshape(100, 100)
1.06 ms ± 11.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.reshape(np.sum(array_of_tuples), (100, 100))
1.26 ms ± 24.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
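
The `%timeit` and `%%timeit` magics above only work in IPython or Jupyter. As a hedged sketch, a similar comparison can be run as a plain script with the standard-library `timeit` module. The function names, the `candidates` and `timings` dictionaries, and the repeat counts are choices made for this example, and only three of the methods are included; absolute numbers will differ by machine and NumPy version:

```python
import timeit

import numpy as np

# Same benchmark input as above: 100 tuples of 100 ints each.
array_of_tuples = np.empty(100, dtype=object)
for i in range(len(array_of_tuples)):
    array_of_tuples[i] = tuple(range(i, i + 100))

def preallocated_loop():
    res = np.empty((100, 100), dtype=int)
    for i, row in enumerate(array_of_tuples):
        res[i] = row
    return res

def via_tolist():
    return np.array(array_of_tuples.tolist())

def via_stack():
    return np.stack(array_of_tuples, axis=0)

candidates = {
    "pre-allocated loop": preallocated_loop,
    "tolist": via_tolist,
    "stack": via_stack,
}

timings = {}
for name, func in candidates.items():
    # Best of 3 repeats, 100 calls each, reported as seconds per call.
    best = min(timeit.repeat(func, number=100, repeat=3)) / 100
    timings[name] = best
    print(f"{name}: {best * 1e6:.1f} µs per call")
```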
