
Accessing >2,3,4GB Files in 32bit Process on 64bit (or 32bit) Windows

Original question posted 2013-04-03 20:22:46. Tags: c++, windows, memory-management, out-of-memory, memory-mapped-files

Disclaimer: I apologize for the verbosity of this question (I think it's an interesting problem, though!), yet I cannot figure out how to word it more concisely.

I have done hours of research into the apparently myriad ways of solving the problem of accessing multi-GB files in a 32bit process on 64bit Windows 7, ranging from /LARGEADDRESSAWARE to VirtualAllocEx AWE. I am somewhat comfortable writing a multi-view memory-mapped system in Windows (CreateFileMapping, MapViewOfFile, etc.), yet can't quite escape the feeling that there is a more elegant solution to this problem. Also, I'm quite aware of Boost's interprocess and iostream templates, although they appear to be rather lightweight, requiring a similar amount of effort to writing a system utilizing only Windows API calls (not to mention the fact that I already have a memory-mapped architecture semi-implemented using Windows API calls).

I'm attempting to process large datasets. The program depends on pre-compiled 32bit libraries, which is why, for the moment, the program itself is also running in a 32bit process, even though the system is 64bit, with a 64bit OS. I know there are ways in which I could add wrapper libraries around this, yet, seeing as it's part of a larger codebase, it would indeed be a bit of an undertaking. I set the binary headers to allow for /LARGEADDRESSAWARE (at the expense of decreasing my kernel space?), such that I get up to around 2-3 GB of addressable memory per process, give or take (depending on heap fragmentation, etc.).

Here's the issue: the datasets are 4+GB, and have DSP algorithms run upon them that require essentially random access across the file. A pointer to the object generated from the file is handled in C#, yet the file itself is loaded into memory (with this partial memory-mapped system) in C++ (it's P/Invoked). Thus, I believe the solution is unfortunately not as simple as adjusting the windowing to access the portion of the file I need, as essentially I still want to have the entire file abstracted into a single pointer, from which I can call methods to access data almost anywhere in the file.
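To make the windowing idea concrete, here is a minimal sketch of the offset arithmetic a sliding view needs. `MapViewOfFile` requires the file offset of a view to be a multiple of the system allocation granularity, so a request for an arbitrary byte range has to be rounded down to an aligned view start plus a delta. The names `ViewWindow` and `compute_window` are hypothetical, and the 64 KiB default is an assumption standing in for the value that `GetSystemInfo` reports in `dwAllocationGranularity`. The sketch does only the arithmetic and performs no mapping itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type: describes the view that covers a requested range.
struct ViewWindow {
    std::uint64_t view_offset; // aligned file offset at which the view starts
    std::size_t   view_size;   // number of bytes the view should map
    std::size_t   delta;       // requested data begins at (view base + delta)
};

// Computes the aligned view needed to reach `length` bytes at file offset
// `offset`. `granularity` is an assumption standing in for the system
// allocation granularity; the view is clamped to the end of the file.
inline ViewWindow compute_window(std::uint64_t offset, std::size_t length,
                                 std::uint64_t file_size,
                                 std::uint64_t granularity = 65536)
{
    // Round the requested offset down to the nearest granularity boundary.
    const std::uint64_t view_offset = offset - (offset % granularity);
    const std::size_t delta = static_cast<std::size_t>(offset - view_offset);

    // Never ask for bytes past the end of the file.
    const std::uint64_t available =
        file_size > view_offset ? file_size - view_offset : 0;
    const std::uint64_t wanted = static_cast<std::uint64_t>(delta) + length;
    const std::size_t view_size =
        static_cast<std::size_t>(std::min(wanted, available));

    return {view_offset, view_size, delta};
}
```

A wrapper object could hide this behind a single handle: each access computes the window, remaps if the current view does not cover it, and returns the view base plus `delta`. That keeps a "one pointer to the whole file" interface for callers while only one window is mapped at a time.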

Apparently, most memory-mapped architectures rely upon splitting the single process into multiple processes, so, for example, I'd access a 6 GB file with 3 processes, each holding a 2 GB window to the file. I would then need to add a significant amount of logic to pull and recombine data from across these different windows/processes. VirtualAllocEx apparently provides a method of increasing the virtual address space, but I'm still not entirely sure if this is the best way of going about it.

But, let's say I want this program to function just as "easily" as a single 64bit process on a 64bit system. Assume that I don't care about thrashing; I just want to be able to manipulate a large file on the system, even if only, say, 500 MB were loaded into physical RAM at any one time. Is there any way to obtain this functionality without having to write a somewhat ridiculous memory system by hand? Or, is there some better way than what I have found thus far through combing SO and the internet?

This lends itself to a secondary question: is there a way of limiting how much physical RAM would be used by this process? For example, what if I wanted to limit the process to only having 500 MB loaded into physical RAM at any one time (whilst keeping the multi-GB file paged on disk)?

I'm sorry for the long question, but I feel as though it's a decent summary of what appear to be many questions (with only partial answers) that I've found on SO and the net at large. I'm hoping that this can be an area wherein a definitive answer (or at least some pros/cons) can be fleshed out, and we can all learn something valuable in the process!

1 Answer

You could write an accessor class to which you give a base address and a length. It returns data, or throws an exception (or reports the error however else you prefer) if an error condition arises (out of bounds, etc.).

Then, any time you need to read from the file, the accessor object can use SetFilePointerEx() before calling ReadFile(). You can then pass the accessor class to the constructor of whatever objects you create when you read the file. The objects then use the accessor class to read the data from the file. The accessor returns the data to the object's constructor, which parses it into object data.
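A minimal sketch of such an accessor might look like the following. To keep it portable and self-contained, it uses `std::ifstream` with `seekg` and `read` as stand-ins for `SetFilePointerEx()` and `ReadFile()`; a Windows-specific version would swap those two calls in. The class name `FileAccessor` and its interface are hypothetical, and the sketch assumes `std::streamoff` is 64 bits wide on the target toolchain so that offsets beyond 4 GB are representable.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical accessor over the region [base, base + length) of a file.
// Offsets passed to read() are relative to `base`.
class FileAccessor {
public:
    FileAccessor(const std::string& path, std::uint64_t base,
                 std::uint64_t length)
        : file_(path, std::ios::binary), base_(base), length_(length)
    {
        if (!file_) throw std::runtime_error("cannot open " + path);
    }

    // Returns `count` bytes starting at `offset`, or throws on error.
    std::vector<char> read(std::uint64_t offset, std::size_t count)
    {
        // Bounds check, written to avoid overflow in offset + count.
        if (offset > length_ || count > length_ - offset)
            throw std::out_of_range("read outside accessor region");

        std::vector<char> buffer(count);
        file_.clear(); // reset any error state left by an earlier read
        // Stand-in for SetFilePointerEx(): seek to the absolute position.
        file_.seekg(static_cast<std::streamoff>(base_ + offset),
                    std::ios::beg);
        // Stand-in for ReadFile(): read the requested bytes.
        file_.read(buffer.data(), static_cast<std::streamsize>(count));
        if (static_cast<std::size_t>(file_.gcount()) != count)
            throw std::runtime_error("short read");
        return buffer;
    }

private:
    std::ifstream file_;
    std::uint64_t base_;
    std::uint64_t length_;
};
```

An object's constructor would then take a `FileAccessor&`, call `read()` for the bytes it needs, and parse them, so no part of the file stays resident beyond what the object copies out.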

If, later down the line, you're able to compile to 64-bit, you can just change (or extend) the accessor class to read from memory instead.

As for limiting the amount of RAM used by the process, that's mostly a matter of making sure that A) you don't have memory leaks (especially obscene ones) and B) you destroy objects you don't need at that very moment. Even if you will need an object later down the line, if its data won't change, just destroy it. Then recreate it later when you do need it, allowing it to re-read the data from the file.
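One way to make the "destroy it and re-read it later" advice systematic is a small cache that keeps at most a fixed number of file blocks alive and evicts the least recently used one when full. The sketch below is an illustration under assumptions, not something the answer prescribes: `BlockCache` and its `Loader` callback are hypothetical names, and the loader is expected to fetch one block from the file (for example through an accessor class). With 1 MiB blocks, a limit of 500 blocks would approximate the 500 MB budget from the question. Note that this bounds only the memory the cache itself holds; it is not an OS-level working-set limit.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical bounded cache of file blocks with least-recently-used eviction.
class BlockCache {
public:
    using Block  = std::vector<char>;
    using Loader = std::function<Block(std::uint64_t)>; // loads block by index

    BlockCache(std::size_t max_blocks, Loader loader)
        : max_blocks_(max_blocks ? max_blocks : 1), loader_(std::move(loader)) {}

    // Returns the block with the given index, loading it on a miss.
    // The reference stays valid only until the next call to get().
    const Block& get(std::uint64_t index)
    {
        auto hit = lookup_.find(index);
        if (hit != lookup_.end()) {
            // Cache hit: mark as most recently used by moving it to the front.
            lru_.splice(lru_.begin(), lru_, hit->second);
            return hit->second->second;
        }
        if (lru_.size() >= max_blocks_) {
            // Cache full: destroy the least recently used block.
            lookup_.erase(lru_.back().first);
            lru_.pop_back();
        }
        // Cache miss: re-read the block from its source.
        lru_.emplace_front(index, loader_(index));
        lookup_[index] = lru_.begin();
        return lru_.front().second;
    }

    std::size_t resident_blocks() const { return lru_.size(); }

private:
    using Entry = std::pair<std::uint64_t, Block>;
    std::size_t max_blocks_;
    Loader loader_;
    std::list<Entry> lru_; // front = most recently used
    std::unordered_map<std::uint64_t, std::list<Entry>::iterator> lookup_;
};
```

The trade-off is the one the answer already accepts: blocks that fall out of the cache cost a disk read when they are needed again, in exchange for a predictable upper bound on resident data.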