Why does Python crash when I try to sum this numpy array?

Question

I'm working on Ubuntu 14.04 with Python 3.4 (Numpy 1.9.2 and PIL.Image 1.1.7). Here's what I do:

>>> from PIL import Image
>>> import numpy as np

>>> img = Image.open("./tifs/18015.pdf_001.tif")
>>> arr = np.asarray(img)
>>> np.shape(arr)
(5847, 4133)

>>> arr.dtype
dtype('bool')

# all of the following four cases where I incrementally increase
# the number of rows to 700 are done instantly
>>> v = arr[1:100,1:100].sum(axis=0)
>>> v = arr[1:500,1:100].sum(axis=0)
>>> v = arr[1:600,1:100].sum(axis=0)
>>> v = arr[1:700,1:100].sum(axis=0)

# but suddenly this line makes Python crash
>>> v = arr[1:800,1:100].sum(axis=0)

fish: Job 1, “python3” terminated by signal SIGSEGV (Address boundary error)

It seems to me that Python suddenly runs out of memory. If that is the case, how can I allocate more memory to Python? As I can see from htop, my 32GB of memory is not even remotely depleted.

You may download the TIFF image here.

If I create an empty boolean array, set the pixels explicitly, and then apply the summation, it works:

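>>> w, h = img.size  # PIL reports (width, height)
>>> arr = np.empty((h, w), dtype=bool)
>>> arr.setflags(write=True)

>>> # copy pixel by pixel; getpixel takes (x, y), the array index is (row, column)
>>> for r in range(h):
...     for c in range(w):
...         arr.itemset((r, c), img.getpixel((c, r)))
...
>>> v = arr.sum(axis=0)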

>>> v.mean()
5726.8618436970719

>>> arr.shape
(5847, 4133)

But this "workaround" is not very satisfactory, as copying every pixel takes far too long. Maybe there is a faster method?

Answer 1

I can reproduce your segfault using numpy v1.8.2/PIL v1.1.7 installed from the Ubuntu repositories.

If I install numpy 1.8.2 in a virtualenv using pip (still using PIL v1.1.7 from the Ubuntu repos) then I no longer see the segfault.
If I do the opposite (installing PIL v1.1.7 using pip, and using numpy v1.8.2 from the Ubuntu repos), I still get the segfault.

This leads me to believe that it's caused by an old bug in numpy. I haven't been able to find a good candidate in numpy's issue tracker, but I suspect that updating numpy (e.g. from the current source or via pip) would probably resolve the issue.
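Because the behaviour differs between the distribution package and the pip build, it may help to confirm which numpy the interpreter actually imports. The following is a minimal sketch; `describe_numpy` is a hypothetical helper name, not part of numpy.

```python
import numpy as np


def describe_numpy():
    """Return the version string and file location of the imported numpy."""
    return np.__version__, np.__file__


version, location = describe_numpy()
print("numpy version:", version)
print("loaded from:  ", location)
```

On Ubuntu, a path under /usr/lib/python3/dist-packages usually points to the repository package, while a path inside the virtualenv points to the pip-installed build.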

One workaround is to convert the image mode to "P" (unsigned 8-bit ints) before creating the array, then convert it back to boolean:

arr2 = np.asarray(img.convert("P")).astype(bool)  # plain bool; the np.bool alias was removed in newer numpy
v = arr2[1:800,1:100].sum(axis=0)
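To illustrate what the conversion step does, here is a numpy-only sketch. It does not load the TIFF: `arr8` is a hypothetical stand-in for `np.asarray(img.convert("P"))`, under the assumption that the converted image holds 0 for black pixels and a non-zero value (255 here) for white ones.

```python
import numpy as np

# Hypothetical stand-in for np.asarray(img.convert("P")):
# an unsigned 8-bit array whose pixels are either 0 or 255.
rng = np.random.default_rng(0)
arr8 = (rng.random((800, 100)) < 0.5).astype(np.uint8) * 255

# Casting to bool maps 0 -> False and any non-zero value -> True.
arr2 = arr8.astype(bool)

# Summing a boolean array along axis 0 counts the True pixels per column.
col_sums = arr2.sum(axis=0)

# The same counts can be taken from the 8-bit data without the bool copy.
col_counts = np.count_nonzero(arr8, axis=0)

print(np.array_equal(col_sums, col_counts))
```

If only the per-column counts are needed, `np.count_nonzero` on the 8-bit array gives the same result without creating the intermediate boolean array.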

Answer 1 (score 3, accepted, 2015-03-17 20:20:31)

当我尝试对这个numpy数组求和时，为什么Python会崩溃？

问题描述

1 个解决方案

解决方案1 3 已采纳 2015-03-17 20:20:31

解决方案1
3 已采纳 2015-03-17 20:20:31