Unable to allocate array with shape and data type
I'm facing an issue with allocating huge arrays in numpy on Ubuntu 18, while not facing the same issue on MacOS.
I am trying to allocate memory for a numpy array with shape (156816, 36, 53806) with
np.zeros((156816, 36, 53806), dtype='uint8')
and I'm getting an error on Ubuntu:
>>> import numpy as np
>>> np.zeros((156816, 36, 53806), dtype='uint8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
numpy.core._exceptions.MemoryError: Unable to allocate array with shape (156816, 36, 53806) and data type uint8
but I'm not getting it on MacOS:
>>> import numpy as np
>>> np.zeros((156816, 36, 53806), dtype='uint8')
array([[[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]],
[[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]],
[[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]],
...,
[[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]],
[[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]],
[[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]]], dtype=uint8)
I've read somewhere that np.zeros shouldn't really be allocating the whole memory needed for the array, but only the memory for the non-zero elements. And the Ubuntu machine has 64 GB of memory, while my MacBook Pro has only 16 GB.
Versions:
Ubuntu
os -> ubuntu mate 18
python -> 3.6.8
numpy -> 1.17.0
mac
os -> 10.14.6
python -> 3.6.4
numpy -> 1.17.0
PS: it also failed on Google Colab.
This is likely due to your system's overcommit handling mode.
In the default mode, 0,
Heuristic overcommit handling. Obvious overcommits of address space are refused. Used for a typical system. It ensures a seriously wild allocation fails while allowing overcommit to reduce swap usage. root is allowed to allocate slightly more memory in this mode. This is the default.
The exact heuristic used is not well explained here, but this is discussed more in Linux over-commit heuristic and on this page.
You can check your current overcommit mode by running:
$ cat /proc/sys/vm/overcommit_memory
0
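If you'd rather do the same check from within Python (a minimal sketch, Linux only, since it reads from procfs):

```python
# Python equivalent of `cat /proc/sys/vm/overcommit_memory` (Linux only).
with open('/proc/sys/vm/overcommit_memory') as f:
    mode = f.read().strip()
print(mode)  # '0', '1', or '2'
```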
In this case, you're allocating:
>>> 156816 * 36 * 53806 / 1024.0**3
282.8939827680588
~282 GB, and the kernel is saying "well, obviously there's no way I'm going to be able to commit that many physical pages to this," so it refuses the allocation.
If (as root) you run:
$ echo 1 > /proc/sys/vm/overcommit_memory
this will enable the "always overcommit" mode, and you'll find that indeed the system will allow you to make the allocation no matter how large it is (within 64-bit memory addressing, at least).
I tested this myself on a machine with 32 GB of RAM. With overcommit mode 0 I also got a MemoryError, but after changing it back to 1 it works:
>>> import numpy as np
>>> a = np.zeros((156816, 36, 53806), dtype='uint8')
>>> a.nbytes
303755101056
You can then go ahead and write to any location within the array, and the system will only allocate physical pages when you explicitly write to that page. So you can use this, with care, for sparse arrays.
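The lazy-paging behaviour can be seen even without overcommit tweaks if the virtual allocation is modest. A scaled-down sketch (the question's shape needs ~283 GB, so a smaller shape is used here so it runs anywhere):

```python
import numpy as np

# np.zeros hands back zero-filled virtual pages; physical pages are only
# committed when a page is first written.
a = np.zeros((1000, 1000, 100), dtype='uint8')  # ~100 MB of virtual memory
a[123, 45, 6] = 255    # touching one element commits only that page
print(a.nbytes)        # 100000000
print(int(a[123, 45, 6]))  # 255
```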
I had this same problem on Windows and came across this solution. So if someone comes across this problem in Windows, the solution for me was to increase the pagefile size, as it was a memory overcommitment problem for me too.
Windows 8
Windows 10
Note: I did not have enough memory on my system for the ~282 GB in this example, but for my particular case this worked.
EDIT
From here, the suggested recommendations for pagefile size:
There is a formula for calculating the correct pagefile size. Initial size is one and a half (1.5) x the amount of total system memory. Maximum size is three (3) x the initial size. So let's say you have 4 GB (1 GB = 1,024 MB x 4 = 4,096 MB) of memory. The initial size would be 1.5 x 4,096 = 6,144 MB and the maximum size would be 3 x 6,144 = 18,432 MB.
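The quoted arithmetic checks out; as a quick sanity check for a hypothetical 4 GB machine:

```python
# Verifying the quoted pagefile formula for a machine with 4 GB of RAM.
ram_mb = 4 * 1024            # 4,096 MB of total system memory
initial_mb = 1.5 * ram_mb    # 6,144.0 MB initial pagefile size
maximum_mb = 3 * initial_mb  # 18,432.0 MB maximum pagefile size
print(initial_mb, maximum_mb)
```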
Some things to keep in mind from here:
However, this does not take into consideration other important factors and system settings that may be unique to your computer. Again, let Windows choose what to use instead of relying on some arbitrary formula that worked on a different computer.
Also:
Increasing page file size may help prevent instabilities and crashing in Windows. However, hard drive read/write times are much slower than what they would be if the data were in your computer memory. Having a larger page file is going to add extra work for your hard drive, causing everything else to run slower. Page file size should only be increased when encountering out-of-memory errors, and only as a temporary fix. A better solution is to add more memory to the computer.
I came across this problem on Windows too. The solution for me was to switch from a 32-bit to a 64-bit version of Python. Indeed, 32-bit software, like a 32-bit CPU, can address a maximum of 4 GB of RAM (2^32 bytes). So if you have more than 4 GB of RAM, a 32-bit version cannot take advantage of it.
With a 64-bit version of Python (the one labeled x86-64 on the download page), the issue disappears.
You can check which version you have by entering the interpreter. I, with a 64-bit version, now have:
Python 3.7.5rc1 (tags/v3.7.5rc1:4082f600a5, Oct 1 2019, 20:28:14) [MSC v.1916 64 bit (AMD64)]
where [MSC v.1916 64 bit (AMD64)] means "64-bit Python".
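You can also check the interpreter's bitness programmatically rather than reading the banner; a small sketch:

```python
import struct
import sys

# Two common ways to tell whether the interpreter is a 64-bit build:
print(struct.calcsize('P') * 8)  # pointer size in bits: 64 or 32
print(sys.maxsize > 2**32)       # True on a 64-bit build
```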
Sources:
In my case, adding a dtype attribute changed the dtype of the array to a smaller type (from float64 to uint8), decreasing the array size enough to not throw a MemoryError on Windows (64-bit).
from
mask = np.zeros(edges.shape)
to
mask = np.zeros(edges.shape, dtype='uint8')
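The saving is a straight factor of 8, since float64 costs 8 bytes per element and uint8 costs 1. A minimal sketch (the `edges` array here is just a stand-in shape for the poster's edge map):

```python
import numpy as np

edges = np.zeros((480, 640))                    # stand-in for the edge map
mask_f64 = np.zeros(edges.shape)                # defaults to float64
mask_u8 = np.zeros(edges.shape, dtype='uint8')
print(mask_f64.nbytes // mask_u8.nbytes)        # 8
```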
Sometimes this error pops up because the kernel has reached its limit. Try restarting the kernel and redoing the necessary steps.
Changing the data type to another one that uses less memory works. For me, I changed the data type to numpy.uint8:
data['label'] = data['label'].astype(np.uint8)
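In a pandas setting the downcast gives the same 8x saving as for bare numpy arrays. A sketch with a hypothetical frame standing in for the poster's data (small integer labels stored as int64):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the poster's data: labels in [0, 10) as int64.
data = pd.DataFrame({'label': np.random.randint(0, 10, size=1_000_000,
                                                dtype=np.int64)})
before = data['label'].memory_usage(index=False)
data['label'] = data['label'].astype(np.uint8)
after = data['label'].memory_usage(index=False)
print(before // after)  # 8  (int64 -> uint8)
```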
Not the most comprehensive answer, but I'll share it anyway.
I had the same error when using Jupyter Notebook running locally with the Chrome browser. I closed and quit the Jupyter Notebook, closed Chrome, and restarted the Jupyter Notebook using Firefox, and that solved my issue.
I have had the same problem on 64-bit Windows 10 with Python and PyCharm. The solution was very simple: my hard drive was full, so there was no disk space left for memory paging. I removed a few files and the problem was solved.
I faced the same issue running pandas in a Docker container on EC2. I tried the above solution of allowing overcommit memory allocation via
sysctl -w vm.overcommit_memory=1
(more info on this here), but this still didn't solve the issue.
Rather than digging deeper into the memory allocation internals of Ubuntu/EC2, I started looking at options to parallelise the DataFrame, and discovered that using dask worked in my case:
import dask.dataframe as dd
df = dd.read_csv('path_to_large_file.csv')
...
Your mileage may vary, and note that while the dask API is very similar, it is not a complete like-for-like match for pandas/numpy (e.g. you may need to make some code changes in places, depending on what you're doing with the data).
I was having this issue with numpy when trying to use image sizes of 600x600 (~360K pixels). I decided to reduce to 224x224 (~50K), a reduction in memory usage by a factor of ~7.
X_set = np.array(X_set).reshape(-1, 600 * 600 * 3)
is now
X_set = np.array(X_set).reshape(-1, 224 * 224 * 3)
Hope this helps.
from pandas_profiling import ProfileReport
prof = ProfileReport(df, minimal=True)
prof.to_file(output_file='output.html')
Worked for me.
I had the same problem, but the reason was simply a wrong decimal delimiter setup: "," instead of ".". Enjoy.