
Concatenate two big numpy 2D arrays

I have two big numpy 2D arrays. One, X1, has shape (1877055, 1299); the other, X2, has shape (1877055, 1445). I then use

X = np.hstack((X1, X2))

to concatenate the two arrays into one bigger array. However, the program exits with code -9 without completing, and it doesn't show any error message.

What is the problem? How can I concatenate two such big numpy 2D arrays?

Unless there's something wrong with your NumPy build or your OS (both of which are unlikely), this is almost certainly a memory error.

For example, let's say all these values are float64. You've already allocated at least 18GB and 20GB for these two arrays, and now you're trying to allocate another 38GB for the concatenated array. But you only have, say, 64GB of RAM plus 2GB of swap, so there's not enough room for another 38GB. On some platforms, the allocation will just fail, which hopefully NumPy would catch and raise as a MemoryError. On other platforms, the allocation may succeed, but as soon as you try to actually touch all of that memory you'll segfault (see overcommit handling in Linux for an example). On still other platforms, the system will try to auto-expand swap, but if you're out of disk space it'll segfault.
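A quick back-of-the-envelope check of those figures, assuming float64 elements as above (the shapes are taken from the question):

```python
import numpy as np

# Memory required for the arrays in the question, assuming float64 (8 bytes).
rows = 1877055
itemsize = np.dtype(np.float64).itemsize
gib = 1024 ** 3

x1_bytes = rows * 1299 * itemsize
x2_bytes = rows * 1445 * itemsize
x_bytes = x1_bytes + x2_bytes  # hstack copies both inputs into a new array

print(f"X1: {x1_bytes / gib:.1f} GiB")  # ~18.2 GiB
print(f"X2: {x2_bytes / gib:.1f} GiB")  # ~20.2 GiB
print(f"X : {x_bytes / gib:.1f} GiB")   # ~38.4 GiB
```

Since np.hstack builds a brand-new array, all three allocations must coexist, roughly 77 GiB in total.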

Whatever the reason, if you can't fit X1, X2, and X into memory at the same time, what can you do instead?

  • Just build X in the first place, and create X1 and X2 as sliced views of X, filling those views instead of separate arrays.
  • Write X1 and X2 out to disk, concatenate on disk, and read the result back in.
  • Send X1 and X2 to a subprocess that reads them iteratively, builds X, and then continues the work.
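A minimal sketch of the first option: allocate X up front and make X1 and X2 views into it, so no separate copies ever exist. The shapes here are shrunk to illustrative values so the example runs anywhere:

```python
import numpy as np

# Illustrative (reduced) shapes; in the question these would be
# 1877055 rows and 1299 + 1445 columns.
n_rows, n_cols1, n_cols2 = 1000, 13, 14

# Allocate the combined array once.
X = np.empty((n_rows, n_cols1 + n_cols2), dtype=np.float64)

# X1 and X2 are views into X: no extra memory is allocated.
X1 = X[:, :n_cols1]
X2 = X[:, n_cols1:]

# Fill the views exactly as you would have filled the original arrays.
X1[:] = 1.0
X2[:] = 2.0

# Both share X's memory; no concatenation step is ever needed.
assert X1.base is X and X2.base is X
```

For the second option, np.memmap (or np.save/np.load with mmap_mode) lets you build the combined array on disk column-block by column-block without ever holding all three arrays in RAM.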

Not an expert in numpy, but why not use numpy.concatenate()?

http://docs.scipy.org/doc/numpy/reference/generated/numpy.concatenate.html

For example:

>>> a = np.array([[1, 2], [3, 4]])
>>> b = np.array([[5, 6]])
>>> np.concatenate((a, b), axis=0)
array([[1, 2],
       [3, 4],
       [5, 6]])
>>> np.concatenate((a, b.T), axis=1)
array([[1, 2, 5],
       [3, 4, 6]])
