Reading binary files without buffering the whole file into memory in C++

To build a binary comparer, I'm trying to read the binary contents of two files using the CreateFileW function. However, that causes the whole file to be buffered into memory, which becomes a problem for large (500 MB) files.

I've looked around for other functions that let me buffer only part of the file instead, but I haven't found any documentation that specifically says how the buffering works for those functions (I'm a bit new at this, so maybe I'm missing the obvious).

So far the best match I've found is ReadFile. It seems to have a definable buffer, but I'm not completely sure there won't be another buffer implemented behind the scenes, like there is with CreateFileW.

Do you have any input on what would be a good function to use?

You could use memory-mapped files to do this. Open the file with CreateFile, create a mapping object with CreateFileMapping, then call MapViewOfFile to get a pointer to the data.

Not sure what you mean by CreateFile buffering. CreateFile won't read in the entire contents of the file, and besides, you need to call CreateFile before you can call ReadFile.

ReadFile will do what you want. The OS may read ahead to opportunistically cache data, but it will not read the entire 500 MB file in.

If you really want no buffering, pass FILE_FLAG_NO_BUFFERING to CreateFile, and make sure your file offsets and read sizes are multiples of the volume sector size (the read buffer must also be sector-aligned). I strongly suggest you don't do this: the system file cache exists for a reason and helps performance. Caching files in memory should have no effect on the overall system's memory usage, because under memory pressure the system file cache will shrink.

As others have mentioned, you can use memory-mapped files as well. The difference between memory-mapped files and ReadFile is mainly the interface: ultimately the file manager satisfies the requests in a similar manner, including some buffering. The mapped interface is arguably more intuitive, but be aware that an I/O error while touching a mapped page raises a structured exception (EXCEPTION_IN_PAGE_ERROR) instead of returning an error code; it must be caught, otherwise it will crash your program.

Calling CreateFile() does not itself buffer or otherwise read the contents of the target file. After calling CreateFile(), you must call ReadFile() to obtain whichever parts of the file you want, for example to read the first kilobyte of a file:

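DWORD cbRead = 0;
BYTE buffer[1024];
HANDLE hFile = ::CreateFile(filename,
                            GENERIC_READ,
                            FILE_SHARE_READ,
                            NULL,
                            OPEN_EXISTING,
                            FILE_ATTRIBUTE_NORMAL,
                            NULL);
if (hFile != INVALID_HANDLE_VALUE)
{
    // Pass the buffer, then its size; cbRead receives the number of bytes read.
    ::ReadFile(hFile, buffer, sizeof(buffer), &cbRead, NULL);
    ::CloseHandle(hFile);
}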

In addition, if you want to read an arbitrary portion of the file, you can call SetFilePointer() before calling ReadFile(), for example to read one kilobyte starting one megabyte into the file:

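DWORD cbRead = 0;
BYTE buffer[1024];
HANDLE hFile = ::CreateFile(filename,
                            GENERIC_READ,
                            FILE_SHARE_READ,
                            NULL,
                            OPEN_EXISTING,
                            FILE_ATTRIBUTE_NORMAL,
                            NULL);
if (hFile != INVALID_HANDLE_VALUE)
{
    // Move the file pointer 1 MB (1024 * 1024 bytes) from the start of the file.
    if (::SetFilePointer(hFile, 1024 * 1024, NULL, FILE_BEGIN) != INVALID_SET_FILE_POINTER)
    {
        ::ReadFile(hFile, buffer, sizeof(buffer), &cbRead, NULL);
    }
    ::CloseHandle(hFile);
}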

You may, of course, call SetFilePointer() and ReadFile() as many times as you wish while the file is open. Each call to ReadFile() also advances the file pointer to the byte immediately following the last byte it read.

Additionally, you should read the documentation for the File Management Functions you use, and check their return values to trap any errors that might occur.

Windows may, at its discretion, use available system memory to cache the contents of open files, but data cached this way is discarded if a running program needs the memory (after all, the cached data can simply be re-read from disk if it is needed again).

I believe you want MapViewOfFile.
