Unable to allocate array with shape and data type
I'm facing an issue with allocating huge arrays in numpy on Ubuntu 18, while not facing the same issue on MacOS.
I am trying to allocate memory for a numpy array with shape (156816, 36, 53806) with
np.zeros((156816, 36, 53806), dtype='uint8')
and I'm getting an error on Ubuntu:
>>> import numpy as np
>>> np.zeros((156816, 36, 53806), dtype='uint8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
numpy.core._exceptions.MemoryError: Unable to allocate array with shape (156816, 36, 53806) and data type uint8
but I'm not getting it on MacOS:
>>> import numpy as np
>>> np.zeros((156816, 36, 53806), dtype='uint8')
array([[[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]],
[[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]],
[[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]],
...,
[[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]],
[[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]],
[[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]]], dtype=uint8)
I've read somewhere that np.zeros shouldn't really be allocating the whole memory needed for the array, but only the memory for the non-zero elements. And the Ubuntu machine has 64 GB of memory, while my MacBook Pro has only 16 GB.
Versions:
Ubuntu
os -> ubuntu mate 18
python -> 3.6.8
numpy -> 1.17.0
mac
os -> 10.14.6
python -> 3.6.4
numpy -> 1.17.0
PS: it also failed on Google Colab.
This is likely due to your system's overcommit handling mode.
In the default mode, 0,
Heuristic overcommit handling. Obvious overcommits of address space are refused. Used for a typical system. It ensures a seriously wild allocation fails while allowing overcommit to reduce swap usage. root is allowed to allocate slightly more memory in this mode. This is the default.
The exact heuristic used is not well explained here, but this is discussed more in Linux over-commit heuristic and on this page.
You can check your current overcommit mode by running:
$ cat /proc/sys/vm/overcommit_memory
0
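If you'd rather do the same check from within Python (a minimal sketch, Linux only, since it reads from procfs):

```python
# Python equivalent of `cat /proc/sys/vm/overcommit_memory` (Linux only).
with open('/proc/sys/vm/overcommit_memory') as f:
    mode = f.read().strip()
print(mode)  # '0', '1', or '2'
```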
In this case, you're allocating:
>>> 156816 * 36 * 53806 / 1024.0**3
282.8939827680588
~282 GB, and the kernel is saying "well, obviously there's no way I'm going to be able to commit that many physical pages to this," so it refuses the allocation.
If (as root) you run:
$ echo 1 > /proc/sys/vm/overcommit_memory
this will enable the "always overcommit" mode, and you'll find that indeed the system will allow you to make the allocation no matter how large it is (within 64-bit memory addressing, at least).
I tested this myself on a machine with 32 GB of RAM. With overcommit mode 0 I also got a MemoryError, but after changing it back to 1 it works:
>>> import numpy as np
>>> a = np.zeros((156816, 36, 53806), dtype='uint8')
>>> a.nbytes
303755101056
You can then go ahead and write to any location within the array, and the system will only allocate physical pages when you explicitly write to that page. So you can use this, with care, for sparse arrays.
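The lazy-paging behaviour can be seen even without overcommit tweaks if the virtual allocation is modest. A scaled-down sketch (the question's shape needs ~283 GB, so a smaller shape is used here so it runs anywhere):

```python
import numpy as np

# np.zeros hands back zero-filled virtual pages; physical pages are only
# committed when a page is first written.
a = np.zeros((1000, 1000, 100), dtype='uint8')  # ~100 MB of virtual memory
a[123, 45, 6] = 255    # touching one element commits only that page
print(a.nbytes)        # 100000000
print(int(a[123, 45, 6]))  # 255
```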
I had this same problem on Windows and came across this solution. So if someone comes across this problem in Windows, the solution for me was to increase the pagefile size, as it was a memory overcommitment problem for me too.
Windows 8
Windows 10
Note: I did not have enough memory on my system for the ~282 GB in this example, but for my particular case this worked.
EDIT
From here, the suggested recommendations for pagefile size:
There is a formula for calculating the correct pagefile size. Initial size is one and a half (1.5) x the amount of total system memory. Maximum size is three (3) x the initial size. So let's say you have 4 GB (1 GB = 1,024 MB x 4 = 4,096 MB) of memory. The initial size would be 1.5 x 4,096 = 6,144 MB and the maximum size would be 3 x 6,144 = 18,432 MB.
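The quoted arithmetic checks out; as a quick sanity check for a hypothetical 4 GB machine:

```python
# Verifying the quoted pagefile formula for a machine with 4 GB of RAM.
ram_mb = 4 * 1024            # 4,096 MB of total system memory
initial_mb = 1.5 * ram_mb    # 6,144.0 MB initial pagefile size
maximum_mb = 3 * initial_mb  # 18,432.0 MB maximum pagefile size
print(initial_mb, maximum_mb)
```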
Some things to keep in mind from here:
However, this does not take into consideration other important factors and system settings that may be unique to your computer. Again, let Windows choose what to use instead of relying on some arbitrary formula that worked on a different computer.
Also:
Increasing page file size may help prevent instabilities and crashing in Windows. However, hard drive read/write times are much slower than what they would be if the data were in your computer memory. Having a larger page file is going to add extra work for your hard drive, causing everything else to run slower. Page file size should only be increased when encountering out-of-memory errors, and only as a temporary fix. A better solution is to add more memory to the computer.
I came across this problem on Windows too. The solution for me was to switch from a 32-bit to a 64-bit version of Python. Indeed, 32-bit software, like a 32-bit CPU, can address a maximum of 4 GB of RAM (2^32 bytes). So if you have more than 4 GB of RAM, a 32-bit version cannot take advantage of it.
With a 64-bit version of Python (the one labeled x86-64 on the download page), the issue disappears.
You can check which version you have by entering the interpreter. I, with a 64-bit version, now have:
Python 3.7.5rc1 (tags/v3.7.5rc1:4082f600a5, Oct 1 2019, 20:28:14) [MSC v.1916 64 bit (AMD64)]
where [MSC v.1916 64 bit (AMD64)] means "64-bit Python".
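You can also check the interpreter's bitness programmatically rather than reading the banner; a small sketch:

```python
import struct
import sys

# Two common ways to tell whether the interpreter is a 64-bit build:
print(struct.calcsize('P') * 8)  # pointer size in bits: 64 or 32
print(sys.maxsize > 2**32)       # True on a 64-bit build
```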
Sources:
In my case, adding a dtype attribute changed the dtype of the array to a smaller type (from float64 to uint8), decreasing the array size enough to not throw a MemoryError on Windows (64-bit).
from
mask = np.zeros(edges.shape)
to
mask = np.zeros(edges.shape, dtype='uint8')
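The saving is a straight factor of 8, since float64 costs 8 bytes per element and uint8 costs 1. A minimal sketch (the `edges` array here is just a stand-in shape for the poster's edge map):

```python
import numpy as np

edges = np.zeros((480, 640))                    # stand-in for the edge map
mask_f64 = np.zeros(edges.shape)                # defaults to float64
mask_u8 = np.zeros(edges.shape, dtype='uint8')
print(mask_f64.nbytes // mask_u8.nbytes)        # 8
```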
Sometimes this error pops up because the kernel has reached its limit. Try restarting the kernel and redoing the necessary steps.
Changing the data type to another one that uses less memory works. For me, I changed the data type to numpy.uint8:
data['label'] = data['label'].astype(np.uint8)
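In a pandas setting the downcast gives the same 8x saving as for bare numpy arrays. A sketch with a hypothetical frame standing in for the poster's data (small integer labels stored as int64):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the poster's data: labels in [0, 10) as int64.
data = pd.DataFrame({'label': np.random.randint(0, 10, size=1_000_000,
                                                dtype=np.int64)})
before = data['label'].memory_usage(index=False)
data['label'] = data['label'].astype(np.uint8)
after = data['label'].memory_usage(index=False)
print(before // after)  # 8  (int64 -> uint8)
```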
Not the most comprehensive answer, but I'll share it anyway.
I had the same error when using Jupyter Notebook running locally with the Chrome browser. I closed and quit the Jupyter Notebook, closed Chrome, and restarted the Jupyter Notebook using Firefox, and that solved my issue.
I have had the same problem on 64-bit Windows 10 with Python and PyCharm. The solution was very simple: my hard drive was full, so there was no disk space left for memory paging. I removed a few files and the problem was solved.
I faced the same issue running pandas in a Docker container on EC2. I tried the above solution of allowing overcommit memory allocation via
sysctl -w vm.overcommit_memory=1
(more info on this here), but this still didn't solve the issue.
Rather than digging deeper into the memory allocation internals of Ubuntu/EC2, I started looking at options to parallelise the DataFrame, and discovered that using dask worked in my case:
import dask.dataframe as dd
df = dd.read_csv('path_to_large_file.csv')
...
Your mileage may vary, and note that while the dask API is very similar, it is not a complete like-for-like match for pandas/numpy (e.g. you may need to make some code changes in places, depending on what you're doing with the data).
I was having this issue with numpy when trying to use image sizes of 600x600 (~360K pixels). I decided to reduce to 224x224 (~50K), a reduction in memory usage by a factor of ~7.
X_set = np.array(X_set).reshape(-1, 600 * 600 * 3)
is now
X_set = np.array(X_set).reshape(-1, 224 * 224 * 3)
Hope this helps.
from pandas_profiling import ProfileReport
prof = ProfileReport(df, minimal=True)
prof.to_file(output_file='output.html')
Worked for me.
I had the same problem, but the reason was simply a wrong decimal delimiter setup: "," instead of ".". Enjoy.