
CUDA and pinned (page locked) memory not page locked at all?

I am trying to figure out whether CUDA (or the OpenCL implementation) tells the truth when I request pinned (page-locked) memory.

I tried cudaMallocHost and looked at the /proc/meminfo values Mlocked and Unevictable; both stay at 0 and never go up (/proc/<pid>/status reports VmLck as 0 as well). When I use mlock to page-lock memory instead, the values go up as expected.

So two possible reasons for this behavior might be:

  1. I don't actually get page-locked memory from the CUDA API, and the cudaSuccess return value is misleading
  2. CUDA bypasses the OS counters for page-locked memory because it does some magic with the Linux kernel

So the actual question is: why can't I see page-locked memory in the OS counters when I use CUDA to allocate it?

Additionally: where can I get the right values, if not from /proc/meminfo or /proc/<pid>/status?

Thanks!

System: Ubuntu 14.04.01 LTS; CUDA 6.5; Nvidia driver 340.29; Nvidia Tesla K20c

It would seem that the pinned allocator in CUDA 6.5 is, under the hood, using mmap() with MAP_FIXED. Although I am not an OS expert, this apparently has the effect of "pinning" memory, i.e. ensuring that its address never changes. However, this is not a complete explanation; refer to the answer by @Jeff, which points out what is almost certainly the "missing piece".

Let's consider a short test program:

#include <stdio.h>
#include <stdlib.h>                 // for system()
#include <cuda_runtime.h>

#define DSIZE (1048576*1024)        // 1 GB

int main(){

  int *data;
  cudaFree(0);                      // force CUDA context creation first
  system("cat /proc/meminfo > out1.txt");
  printf("*$*before alloc\n");
  cudaHostAlloc(&data, DSIZE, cudaHostAllocDefault);  // pinned allocation
  printf("*$*after alloc\n");
  system("cat /proc/meminfo > out2.txt");
  cudaFreeHost(data);
  system("cat /proc/meminfo > out3.txt");
  return 0;
}

If we run this program under strace and excerpt the output between the printf statements, we have:

write(1, "*$*before alloc\n", 16*$*before alloc)       = 16
mmap(0x204500000, 1073741824, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED|MAP_ANONYMOUS, 0, 0) = 0x204500000
ioctl(11, 0xc0304627, 0x7fffcf72cce0)   = 0
ioctl(3, 0xc0384657, 0x7fffcf72cd70)    = 0
write(1, "*$*after alloc\n", 15*$*after alloc)        = 15

(note that 1073741824 is exactly one gigabyte, i.e. the same as the requested 1048576*1024)

Reviewing the description of mmap, we have:

address gives a preferred starting address for the mapping. NULL expresses no preference. Any previous mapping at that address is automatically removed. The address you give may still be changed, unless you use the MAP_FIXED flag.

Therefore, assuming the mmap call succeeds, the requested virtual address will be fixed, which is probably useful, but not the whole story.

As I mentioned, I am not an OS expert, and it's not obvious to me what exactly about this system call would create a "pinned" mapping/allocation. It may be that the combination of MAP_SHARED|MAP_FIXED|MAP_ANONYMOUS somehow creates a pinned underlying allocation, but I've not found any evidence to support that.

Based on this article, it seems that even mlock()-ed pages would not meet the needs of DMA activity, which is one of the key goals of pinned host pages in CUDA. Therefore, it seems that something else is providing the actual "pinning": guaranteeing that the underlying physical pages are always memory-resident, and that their virtual-to-physical mapping doesn't change. The latter part is possibly accomplished by MAP_FIXED, along with whatever mechanism guarantees that the underlying physical pages don't move in any way.

This mechanism apparently does not use mlock(), and so the mlock counters don't change before and after. However, we would expect a change in the mapping statistics, and if we diff the out1.txt and out2.txt files produced by the above program, we see (excerpted):

< Mapped:            87488 kB
---
> Mapped:          1135904 kB

The difference is approximately one gigabyte, the amount of "pinned" memory requested.

Page-locked can mean different things. For user-space applications it usually means keeping the page in memory to avoid a page fault:

"A page that has been locked into memory with a call like mlock() is required to always be physically present in the system's RAM. At a superficial level, locked pages should thus never cause a page fault when accessed by an application. But there is nothing that requires a locked page to always be present in the same place; the kernel is free to move a locked page if the need arises." [1]

Note that these locked pages can still be moved around and aren't suitable for I/O device access.

Instead, another notion of page-locked is called pinning. A pinned page keeps the same physical mapping. Drivers that need this typically do it rather directly and bypass locked-page accounting. cudaMallocHost almost certainly uses the CUDA driver to pin the pages in this fashion, which is why the Mlocked and VmLck counters never move.

More info at [1] below.

[1] https://lwn.net/Articles/600502/
