
How to append a list of primitives to a numpy array of objects

EDIT: I've gotten a lot of useful feedback on how not to do this and how to find alternatives, but making that useful depends on idiosyncrasies of my use case that would make this question less useful to others. At this point, I'm not looking for alternatives to using data structured like this. I'm looking for why it seems to be impossible to do this in numpy (or how to do it if it's not impossible).

I have a numpy array, which looks like

a = array([list([1]), list([4, 5])], dtype=object)

I want to append a list like

b = [2, 3, 4]

To get a result like

array([list([1]), list([4, 5]), list([2, 3, 4])], dtype=object)

However, every method I've tried has produced:

array([list([1]), list([4, 5]), 2, 3, 4], dtype=object)

I've tried vstack, concatenate, and append, as well as wrapping things in lists or ndarrays.

Why am I doing this? Basically, I have a lot of data in an ndarray that's going to get fed into sklearn. I want to have a 3d ndarray (data sets x data points x features), but the incoming data is messy and certain things have different lengths, so the innermost dimension has to be lists. I'm trying to append a derived feature, which keeps failing. I've managed to reorder the operations to avoid needing to do this appending, but I still want to know how to do it. This seems like an odd failure for numpy.

Edit: In short, the outer array must be an ndarray because it's actually 2d and complex slicing is used frequently, while the append operation occurs very few times.

Appending to an array in the first place is an expensive and generally smelly operation. The thing is that the contents of the array may be mutable, but the address of the underlying buffer is not. Every time you append an element, the whole thing gets reallocated and copied. As far as I'm aware, there isn't even an attempt at amortization, as there is with list.
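To make the reallocation point concrete, here is a minimal sketch (the variable names are only illustrative). It checks that np.append hands back a newly allocated array rather than growing the original in place; for an object array, only the references to the elements are copied.

```python
import numpy as np

# Build a small 1-D object array holding two lists.
a = np.empty(2, dtype=object)
a[0] = [1]
a[1] = [4, 5]

# np.append does not grow `a` in place; it returns a newly allocated array.
grown = np.append(a, 0)

print(grown is a)                  # False
print(np.shares_memory(a, grown))  # False
print(a.shape, grown.shape)        # (2,) (3,)
print(grown[0] is a[0])            # True: the list objects themselves are shared
```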

If you are up for a slightly different approach, I would recommend maintaining your data in a list, as you have now. You just transform your list into an array whenever you actually need the array. Remember that this is cheaper than reallocating to a new array every time, and you probably won't have to do it often compared to the number of appends:

import numpy as np

stack = [[1], [4, 5]]
a = np.array(stack, dtype=object)  # np.object was removed in NumPy 1.24; use the builtin object
# do stuff to the array

...

stack.append([2, 3, 4])
a = np.array(stack, dtype=object)
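One caveat with converting the list through np.array: as far as I can tell, it only yields a 1-D array of lists when the sublists have different lengths. If they all happen to have the same length, numpy builds a 2-D array instead. The sketch below uses a hypothetical helper, to_object_array (not part of numpy), that fills a pre-allocated object array slot by slot so each list is always stored as a single element.

```python
import numpy as np

def to_object_array(items):
    """Hypothetical helper: one array slot per item, never unpacking the items."""
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        out[i] = item  # integer-index assignment stores the list itself
    return out

same_length = [[1, 2], [3, 4]]
print(np.array(same_length, dtype=object).shape)  # (2, 2)
print(to_object_array(same_length).shape)         # (2,)
```

The explicit loop is slower than a single np.array call, but it does not depend on the shape of the data.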

Update Now That I Understand Your Question

If your goal is just to figure out how to append an element to an object array without the fact that it is a list getting in your way, you first have to create an empty array or element. Rather than trying to coerce the type with fake elements, as some of the comments suggest, I recommend just creating empty elements and setting them to your list explicitly. You can wrap the operation in a function if you want a clean interface.

Here is an example:

b = [2, 3, 4]
c = np.empty(1, dtype=object)
c[0] = b
a = np.concatenate((a, c))

OR

a = np.append(a, c)

Of course this is not as clean as np.array([b], dtype=object), but that's just an artifact of how numpy processes arrays. The reason you pretty much have to do it like this is that numpy treats anything that is a list or tuple as a sequence to be converted into an array at the outer level, not as a single element to store.
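A minimal sketch of that behaviour, with the empty-slot approach wrapped in a function. The name append_object is hypothetical, chosen here for illustration, and the sketch assumes a 1-D object array.

```python
import numpy as np

def append_object(arr, item):
    """Hypothetical helper: append `item` as ONE element of a 1-D object array."""
    slot = np.empty(1, dtype=object)
    slot[0] = item  # stores the list itself; no sequence unpacking happens here
    return np.concatenate((arr, slot))

a = np.empty(2, dtype=object)
a[0], a[1] = [1], [4, 5]
b = [2, 3, 4]

print(np.array([b], dtype=object).shape)  # (1, 3): the list became a row
print(len(np.append(a, [b])))             # 5: the list was flattened into 2, 3, 4
result = append_object(a, b)
print(len(result), result[-1])            # 3 [2, 3, 4]
```

Like any append on an ndarray, the helper still allocates a new array on every call, so it suits the case where appends are rare.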

Time has passed, but maybe someone will make use of this (Python 3.9, NumPy 1.23).

I've had the same problem. The easiest solution I've found is to append one element to the ndarray (as a placeholder, in other words as an array extender), then assign the list to the last element of the extended array.

a_list = [1, 2, 3]
an_array = np.ones(10, dtype=object)
an_array = np.append(an_array, 0)
an_array[-1] = a_list

I think it has the smallest performance impact because no separate temporary array has to be created for the new element.

EDIT: I saw that JE_Muc's solution is almost the same as mine.

If you really must have an np.ndarray with dtype=object, you can do this:

a = np.array([list([1]), list([4, 5])], dtype=object)
b = [2, 3, 4]
a = np.hstack((a, np.empty(1)))
a[-1] = b

(Or of course remove np. in your case where you fully imported numpy.)
But I recommend not using np.ndarrays of dtype=object. Instead use lists with:

a = [[1], [4, 5]]
b = [2, 3, 4]
a.append(b)

Now if you really want to have a as an np.ndarray, you can then do the following:

a = np.array(a, dtype=object)  # dtype=object is required for ragged lists in NumPy >= 1.24
