Why can a 352GB NumPy ndarray be used on an 8GB memory macOS computer?

import numpy as np

array = np.zeros((210000, 210000)) # default numpy.float64
array.nbytes

When I run the above code on my 8GB-memory MacBook running macOS, no error occurs. But running the same code on a 16GB-memory PC with Windows 10, a 12GB-memory Ubuntu laptop, or even a 128GB-memory Linux supercomputer, the Python interpreter raises a MemoryError. All the test environments have 64-bit Python 3.6 or 3.7 installed.
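For reference, the 352GB in the title is just the element count times 8 bytes per float64 element; a quick sketch of the arithmetic (no array is actually allocated here):

import numpy as np

# 210000 x 210000 float64 elements, 8 bytes each.
n = 210000
print(n * n * 8)          # 352800000000 bytes
print(n * n * 8 / 1e9)    # ~352.8 GB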

@Martijn Pieters' answer is on the right track, but not quite right: this has nothing to do with memory compression, but instead it has to do with virtual memory.

For example, try running the following code on your machine:

import numpy as np

arrays = [np.zeros((21000, 21000)) for _ in range(0, 10000)]

This code allocates 32 TiB of memory, but you won't get an error (at least I didn't, on Linux). If I check htop, I see the following:

  PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
31362 user       20   0 32.1T 69216 12712 S  0.0  0.4  0:00.22 python

This is because the OS is perfectly willing to overcommit on virtual memory. It won't actually assign pages to physical memory until it needs to. The way it works is roughly as follows (a short sketch after the list demonstrates the effect):

  • calloc asks the OS for some memory to use
  • the OS looks in the process's page tables and finds a chunk of memory that it's willing to assign. This is a fast operation: the OS just stores the memory address range in an internal data structure.
  • the program writes to one of the addresses.
  • the OS receives a page fault, at which point it actually assigns the page to physical memory. A page is usually a few KiB in size.
  • the OS passes control back to the program, which proceeds without noticing the interruption.
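A minimal sketch of the effect (Linux assumed for the ru_maxrss unit; the array shape and the region touched are just illustrative):

import resource
import numpy as np

def peak_rss_mib():
    # Peak resident set size; ru_maxrss is in kilobytes on Linux (bytes on macOS).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

print(f"start:          {peak_rss_mib():.0f} MiB")

a = np.zeros((21000, 21000))     # ~3.3 GiB of virtual address space, nothing resident yet
print(f"after np.zeros: {peak_rss_mib():.0f} MiB")   # barely changes

a[:1000, :1000] = 1.0            # touch a small corner -> only those pages are faulted in
print(f"after writing:  {peak_rss_mib():.0f} MiB")   # grows by only a few MiB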

Creating a single huge array doesn't work on Linux because, by default, a "heuristic algorithm is applied to figure out if enough memory is available". (thanks @Martijn Pieters!) Some experiments on my system show that for me, the kernel is unwilling to provide more than 0x3BAFFFFFF bytes. However, if I run echo 1 | sudo tee /proc/sys/vm/overcommit_memory and then try the program in the OP again, it works fine.
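A tiny sketch (Linux only) to check the current policy from Python; the three documented values are 0 = heuristic (the default), 1 = always overcommit, 2 = strict accounting:

# Read the kernel's overcommit policy (Linux only).
with open("/proc/sys/vm/overcommit_memory") as f:
    print("vm.overcommit_memory =", f.read().strip())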

For fun, try running arrays = [np.ones((21000, 21000)) for _ in range(0, 10000)]. You'll definitely get an out-of-memory error, even on macOS or Linux with swap compression. Yes, certain OSes can compress RAM, but they can't compress it to the level where you wouldn't run out of memory.
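To see the difference between zeros and ones on a single array, a small sketch (Linux only, reading VmRSS from /proc/self/status; the np.ones call needs roughly 3.3 GiB of free RAM):

import numpy as np

def vm_rss_kib():
    # Current resident set size, read from /proc/self/status (Linux only).
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])

z = np.zeros((21000, 21000))   # zeroed pages are never touched, so almost nothing becomes resident
print("after np.zeros:", vm_rss_kib(), "KiB")

o = np.ones((21000, 21000))    # np.ones writes 1.0 to every element, so ~3.3 GiB becomes resident
print("after np.ones: ", vm_rss_kib(), "KiB")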
