ReadProcessMemory faster than memcpy on SharedMemory

I'm trying to improve my multiprocess application by using shared memory to communicate. I was doing some profiling with simple tests and something strange came out: when I copy the data stored in the shared memory, it is faster with ReadProcessMemory than with memcpy.

I know I'm not supposed to use shared memory that way (it's better to read directly from the shared memory), but I'm still wondering why this is happening. Investigating further, I noticed another thing: if I do two consecutive memcpy calls on the same shared memory area (in fact, the same mapped region), the second copy is about twice as fast as the first.

Here is sample code showing the problem. In this example there is only one process, but the problem is still there: doing a memcpy from the shared memory region is slower than doing a ReadProcessMemory of that same area on my own process!

#include <tchar.h>
#include <basetsd.h>
#include <cstdlib>   // malloc, free
#include <cstring>   // memcpy, std::memset
#include <iostream>
#include <time.h>    // clock, CLOCKS_PER_SEC

#include <boost/interprocess/mapped_region.hpp>
#include <boost/interprocess/windows_shared_memory.hpp>
#include <boost/asio.hpp>

namespace bip = boost::interprocess;

 bip::windows_shared_memory* AllocateSharedMemory(UINT32 a_UI32_Size)
{
    bip::windows_shared_memory* l_pShm = new bip::windows_shared_memory (bip::create_only, "Global\\testSharedMemory", bip::read_write, a_UI32_Size);
    bip::mapped_region l_region(*l_pShm, bip::read_write);
    std::memset(l_region.get_address(), 1, l_region.get_size());
    return l_pShm;
}

//Copy the shared memory with memcpy
void CopySharedMemory(UINT32 a_UI32_Size)
{
    bip::windows_shared_memory m_shm(bip::open_only, "Global\\testSharedMemory", bip::read_only);
    bip::mapped_region l_region(m_shm, bip::read_only);
    void* l_pData = malloc(a_UI32_Size);
    memcpy(l_pData, l_region.get_address(), a_UI32_Size);
    free(l_pData);
}

//Copy the shared memory with ReadProcessMemory
void ProcessCopySharedMemory(UINT32 a_UI32_Size)
{
    bip::windows_shared_memory m_shm(bip::open_only, "Global\\testSharedMemory", bip::read_only);
    bip::mapped_region l_region(m_shm, bip::read_only);
    void* l_pData = malloc(a_UI32_Size);
    // Open a real handle to this process; it is closed again below.
    HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, GetCurrentProcessId());
    SIZE_T l_szt_CurRemote_Readsize = 0;
    ReadProcessMemory(hProcess,
                      l_region.get_address(),
                      l_pData,
                      a_UI32_Size,
                      &l_szt_CurRemote_Readsize);
    CloseHandle(hProcess); // release the handle opened above
    free(l_pData);
}

// do 2 memcpy on the same shared memory
void CopySharedMemory2(UINT32 a_UI32_Size)
{
    bip::windows_shared_memory m_shm(bip::open_only, "Global\\testSharedMemory", bip::read_only);
    bip::mapped_region l_region(m_shm, bip::read_only);
    clock_t begin = clock();
    void* l_pData = malloc(a_UI32_Size);
    memcpy(l_pData, l_region.get_address(), a_UI32_Size);
    clock_t end = clock();
    std::cout << "FirstCopy: " << (end - begin) * 1000 / CLOCKS_PER_SEC << " ms" << std::endl; 
    free(l_pData);

    begin = clock();
    l_pData = malloc(a_UI32_Size);
    memcpy(l_pData, l_region.get_address(), a_UI32_Size);
    end = clock();
    std::cout << "SecondCopy: " << (end - begin) * 1000 / CLOCKS_PER_SEC << " ms" << std::endl; 
    free(l_pData);
}

int _tmain(int argc, _TCHAR* argv[])
{
    UINT32 l_UI32_Size = 1048576000;
    bip::windows_shared_memory* l_pShm = AllocateSharedMemory(l_UI32_Size);
    clock_t begin = clock();
    for (int i=0; i<10 ; i++)
        CopySharedMemory(l_UI32_Size);
    clock_t end = clock();
    std::cout << "MemCopy: " << (end - begin) * 1000 / CLOCKS_PER_SEC << " ms" << std::endl; 
    begin = clock();
    for (int i=0; i<10 ; i++)
        ProcessCopySharedMemory(l_UI32_Size);
    end = clock();
    std::cout << "ReadProcessMemory: " << (end - begin) * 1000 / CLOCKS_PER_SEC << " ms" << std::endl; 

    for (int i=0; i<10 ; i++)
        CopySharedMemory2(l_UI32_Size);

    delete l_pShm;
    return 0;
}

And here is the output:

MemCopy: 8891 ms
ReadProcessMemory: 6068 ms

FirstCopy: 796 ms
SecondCopy: 327 ms
FirstCopy: 795 ms
SecondCopy: 328 ms
FirstCopy: 780 ms
SecondCopy: 344 ms
FirstCopy: 780 ms
SecondCopy: 343 ms
FirstCopy: 780 ms
SecondCopy: 327 ms
FirstCopy: 795 ms
SecondCopy: 343 ms
FirstCopy: 780 ms
SecondCopy: 344 ms
FirstCopy: 796 ms
SecondCopy: 343 ms
FirstCopy: 796 ms
SecondCopy: 327 ms
FirstCopy: 780 ms
SecondCopy: 328 ms
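
The first-copy versus second-copy measurement can be isolated in a smaller, portable harness. The sketch below is not the code from the question: it assumes ordinary heap buffers rather than a shared memory mapping, uses `std::chrono::steady_clock` rather than `clock()`, and keeps allocation outside the timed region so that only `memcpy` itself is measured. `TimeMs`, `TwoCopyTimes` and `TimeTwoCopies` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstring>
#include <vector>

// Hypothetical helper: runs fn once and returns the elapsed time in milliseconds.
template <typename Fn>
double TimeMs(Fn&& fn)
{
    const auto begin = std::chrono::steady_clock::now();
    fn();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - begin).count();
}

// Durations of two back-to-back copies of the same source buffer.
struct TwoCopyTimes
{
    double first_ms;
    double second_ms;
};

// Copies src into dst twice and times each copy separately.
// Both buffers are allocated by the caller, so no allocation is timed.
inline TwoCopyTimes TimeTwoCopies(const std::vector<unsigned char>& src,
                                  std::vector<unsigned char>& dst)
{
    assert(dst.size() >= src.size());
    TwoCopyTimes t{};
    t.first_ms = TimeMs([&] { std::memcpy(dst.data(), src.data(), src.size()); });
    t.second_ms = TimeMs([&] { std::memcpy(dst.data(), src.data(), src.size()); });
    return t;
}
```

Calling `TimeTwoCopies(src, dst)` with two equally sized, already initialized vectors returns both durations. Since both buffers have been touched before timing starts, the two values would be expected to be much closer than in the output above; a remaining gap on the shared memory path would then point at the mapping rather than at `memcpy` itself.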

If anybody has an idea why the memcpy is so slow, and whether there is a solution to this problem, I'm all ears.

Thanks.

My comment as an answer, for reference.

Using 'memcpy' across a big chunk of memory requires the OS to sift through its process/memory tables for each new page that is copied. 'ReadProcessMemory', in turn, tells the OS directly which pages should be copied from which process to which other process.
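
To put a rough number on this per-page cost, the figures from the question can be plugged into a small calculation. This is only a sketch: the 4 KiB page size is an assumption (the usual default on x86/x64 Windows, not stated in the question), and `PagesSpanned` and `ExtraNsPerPage` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Number of pages needed to cover `bytes`, rounding up.
inline std::uint64_t PagesSpanned(std::uint64_t bytes, std::uint64_t page_size)
{
    assert(page_size != 0);
    return (bytes + page_size - 1) / page_size;
}

// Extra time per page in nanoseconds, given a slow and a fast copy time in ms.
inline double ExtraNsPerPage(double slow_ms, double fast_ms, std::uint64_t pages)
{
    assert(pages != 0);
    return (slow_ms - fast_ms) * 1.0e6 / static_cast<double>(pages);
}

// Values taken from the question; the page size is an assumption.
const std::uint64_t kBufferBytes = 1048576000; // l_UI32_Size in the question
const std::uint64_t kAssumedPageSize = 4096;   // assumed 4 KiB pages
```

With these inputs, `PagesSpanned(kBufferBytes, kAssumedPageSize)` gives 256000 pages, and `ExtraNsPerPage(796.0, 327.0, 256000)` gives roughly 1830 ns, so the gap between a first copy and a second copy in the question's output corresponds to about 1.8 µs of extra work per page under that assumption.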

This difference went away when you benchmarked with a single page, which confirms some of this.

My guess is that 'memcpy' is faster in the 'small' scenario because 'ReadProcessMemory' has an extra switch from user mode to kernel mode to make. 'memcpy', on the other hand, more or less offloads the task to the underlying memory management system, which always runs in parallel with your process and is to some extent supported natively by the hardware.
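
To compare the two copy paths discussed here on equal footing, both can be wrapped behind the same signature. This is a minimal sketch, assuming a Windows build: `CopyWithMemcpy` and `CopyWithReadProcessMemory` are hypothetical names, `GetCurrentProcess()` is used in place of the question's `OpenProcess` call because its pseudo-handle does not need `CloseHandle`, and the non-Windows branch is only a stand-in so the sketch compiles elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#ifdef _WIN32
#include <windows.h>
#endif

// User-mode path: a plain memcpy, with no explicit system call.
inline bool CopyWithMemcpy(void* dst, const void* src, std::size_t size)
{
    assert(dst != nullptr && src != nullptr);
    std::memcpy(dst, src, size);
    return true;
}

// Kernel-assisted path: asks the OS to copy from this process's own address space.
inline bool CopyWithReadProcessMemory(void* dst, const void* src, std::size_t size)
{
#ifdef _WIN32
    SIZE_T bytes_read = 0;
    const BOOL ok = ReadProcessMemory(GetCurrentProcess(), src, dst, size, &bytes_read);
    // Success means the call returned nonzero and copied the full range.
    return ok != FALSE && bytes_read == size;
#else
    // Stand-in only: ReadProcessMemory does not exist outside Windows.
    return CopyWithMemcpy(dst, src, size);
#endif
}
```

Timing both functions over a range of sizes, from a single page up to the full buffer, is one way to check where the crossover between the 'small' and the 'big' scenario lies on a given machine.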
