How to extend an array in-place in NumPy?

Currently, I have some code like this:

import numpy as np
ret = np.array([])
for i in range(100000):
  tmp =  get_input(i)
  ret = np.append(ret, np.zeros(len(tmp)))
  ret = np.append(ret, np.ones(fixed_length))

I think this code is inefficient, because np.append has to return a copy of the array instead of modifying ret in place.

I was wondering whether there is something like extend for a numpy array that I could use like this:

import numpy as np
from somewhere import np_extend
ret = np.array([])
for i in range(100000):
  tmp =  get_input(i)
  np_extend(ret, np.zeros(len(tmp)))
  np_extend(ret, np.ones(fixed_length))

That way the extend would be much more efficient. Does anyone have ideas about this? Thanks!

Imagine a numpy array as occupying one contiguous block of memory. Now imagine other objects, say other numpy arrays, occupying the memory just to the left and right of our numpy array. There would be no room to append to or extend our numpy array. The underlying data in a numpy array always occupies a contiguous block of memory.

So any request to append to or extend our numpy array can only be satisfied by allocating a whole new, larger block of memory, copying the old data into the new block, and then appending or extending.

So:

  1. It will not occur in-place.
  2. It will not be efficient.
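
A minimal sketch that makes the copy visible (the array values are arbitrary, chosen only for illustration): np.append hands back a new array and leaves the original untouched, which np.shares_memory confirms.

```python
import numpy as np

a = np.arange(3)           # original array: [0, 1, 2]
b = np.append(a, [3, 4])   # np.append builds and returns a new array

print(a.shape)                 # the original keeps its length
print(b.shape)                 # the result is longer
print(np.shares_memory(a, b))  # False: b lives in a separate block of memory
```

Because each call copies everything accumulated so far, calling np.append in a loop does work proportional to the square of the final length.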

You can use the .resize() method of ndarrays. It requires that the array's memory is not referred to by any other array or variable.

import numpy as np
ret = np.array([])
for i in range(100):
    tmp = np.random.rand(np.random.randint(1, 100))
    ret.resize(len(ret) + len(tmp)) # <- ret is not referred to by anything else,
                                    #    so this works
    ret[-len(tmp):] = tmp

The efficiency can be improved by using one of the usual array memory overallocation schemes.
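
One possible overallocation scheme, as a rough sketch rather than a tuned implementation: keep a buffer that is larger than the data and double it whenever it fills up, so reallocation happens only occasionally. GrowableArray, extend and to_array are hypothetical names made up for this example. refcheck=False is a real parameter of ndarray.resize; using it here assumes that no view of the internal buffer ever escapes the class.

```python
import numpy as np


class GrowableArray:
    """Hypothetical helper: a 1-D float buffer that overallocates."""

    def __init__(self, capacity=16):
        self._buf = np.empty(capacity, dtype=float)
        self._size = 0  # number of valid elements in self._buf

    def extend(self, values):
        values = np.asarray(values, dtype=float)
        needed = self._size + len(values)
        if needed > len(self._buf):
            # Grow geometrically so that reallocations stay rare.
            new_capacity = max(needed, 2 * len(self._buf))
            # refcheck=False skips NumPy's reference check. This is an
            # assumption-laden shortcut: it is only safe because no view
            # of self._buf is ever handed out.
            self._buf.resize(new_capacity, refcheck=False)
        self._buf[self._size:needed] = values
        self._size = needed

    def to_array(self):
        # Return a copy so callers never hold a view of the buffer.
        return self._buf[:self._size].copy()


buf = GrowableArray()
for i in range(100):
    buf.extend(np.zeros(i % 7 + 1))  # stand-in for variable-length input
    buf.extend(np.ones(3))           # stand-in for the fixed-length part
ret = buf.to_array()
```

The resize may still copy the data when the buffer grows, but with doubling that happens only a logarithmic number of times instead of on every append.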

The usual way to handle this is something like this:

import numpy as np
ret = []
for i in range(100000):
  tmp =  get_input(i)
  ret.append(np.zeros(len(tmp)))
  ret.append(np.ones(fixed_length))
ret = np.concatenate(ret)

For the reasons other answers have already covered, it is in general impossible to extend an array without copying the data.
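
Here is a self-contained version of that pattern that can be run as-is. The get_input stub and the value of fixed_length are hypothetical stand-ins, since the question does not define them.

```python
import numpy as np

fixed_length = 3  # hypothetical value; the question does not define it


def get_input(i):
    # Hypothetical stand-in for the question's get_input:
    # returns something whose length varies with i.
    return range(i % 5 + 1)


chunks = []  # a plain Python list; appending to it is cheap
for i in range(1000):
    tmp = get_input(i)
    chunks.append(np.zeros(len(tmp)))
    chunks.append(np.ones(fixed_length))

# A single concatenate at the end copies each chunk exactly once.
ret = np.concatenate(chunks)
print(ret.shape)
```

The data is still copied, but only once, rather than once per loop iteration as with repeated np.append calls.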

I came across this question while researching in-place numpy insertion methods.

While reading the answers given here, an alternative occurred to me (maybe a naive one, but still an idea): why not convert the numpy array back to a list, append whatever you want to it, and then convert it back to an array?

If you have many insertions to make, you could create a kind of "list cache" where you collect all the insertions and then add them to the list in one step.

Of course, if you are trying to avoid a conversion to a list and back to a numpy array at all costs, this is not an option.
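
A small sketch of the idea, with arbitrary example values; pending is a hypothetical name for the "list cache". Note that tolist() and np.array() each copy the data, so this is not in-place either.

```python
import numpy as np

arr = np.array([1.0, 2.0, 3.0])

# The "list cache": collect everything that should be added later.
pending = []
pending.extend([4.0, 5.0])
pending.append(6.0)

# Convert once, add all cached values in one step, convert back.
as_list = arr.tolist()
as_list.extend(pending)
arr = np.array(as_list)

print(arr)
```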
