Half of read buffer is corrupt when using ReadFile

Half of the buffer used with ReadFile is corrupt. Regardless of the size of the buffer, half of it is filled with the same corrupted character. I have looked for anything that could be causing the read to stop early, etc. If I increase the size of the buffer, I see more of the file, so it is not failing on a particular part of the file.

Visual Studio 2019. Windows 10.

#define MAXBUFFERSIZE 1024
DWORD bufferSize = MAXBUFFERSIZE;
_int64 fileRemaining;

HANDLE hFile;
DWORD  dwBytesRead = 0;
//OVERLAPPED ol = { 0 };
LARGE_INTEGER dwPosition;

TCHAR* buffer;

hFile = CreateFile(
    inputFilePath,         // file to open
    GENERIC_READ,          // open for reading
    FILE_SHARE_READ,       // share for reading
    NULL,                  // default security
    OPEN_EXISTING,         // existing file only
    FILE_ATTRIBUTE_NORMAL, // normal file    | FILE_FLAG_OVERLAPPED
    NULL);                 // no attr. template

if (hFile == INVALID_HANDLE_VALUE)
{
    DisplayErrorBox((LPWSTR)L"CreateFile");
    return 0;
}

LARGE_INTEGER size;
GetFileSizeEx(hFile, &size);

_int64 fileSize = (__int64)size.QuadPart;
double gigabytes = fileSize * 9.3132e-10;
sendToReportWindow(L"file size: %lld bytes (%.1f gigabytes)\n", fileSize, gigabytes);

if(fileSize > MAXBUFFERSIZE)
{
    buffer = new TCHAR[MAXBUFFERSIZE];
}
else
{
    buffer = new TCHAR[fileSize];
}
fileRemaining = fileSize;

sendToReportWindow(L"file remaining: %lld bytes\n", fileRemaining);

while (fileRemaining)                                       // outer loop. while file remaining, read file chunk to buffer
{
    sendToReportWindow(L"fileRemaining:%lld\n", fileRemaining);

    if (bufferSize > fileRemaining)                         // as fileremaining gets smaller as file is processed, it eventually is smaller than the buffer
        bufferSize = fileRemaining;

    if (FALSE == ReadFile(hFile, buffer, bufferSize, &dwBytesRead, NULL))
    {
        sendToReportWindow(L"file read failed\n");
        CloseHandle(hFile);
        return 0;
    }

    fileRemaining -= bufferSize;

 // bunch of commented out code (verified that it does not cause the corruption)
}
delete [] buffer;

[Figure: debugger HTML view, 512-byte buffer]

[Figure: debugger HTML view, 1024-byte buffer] This shows that the file is probably not the source of the corruption.

Misc notes: I have been told that memory-mapping the file provides no advantage, since I am processing the file sequentially. Another advantage of this method is that when I detect particular recurring tags in the WARC file, I can skip ahead ~500 bytes and resume processing. This improves speed.

The reason is that buffer is an array of type TCHAR, and in a Unicode build the size of TCHAR is 2 bytes. The third argument to ReadFile is a count of bytes, not of elements, so each call writes only bufferSize bytes into the start of the array, packing two bytes of file data into every TCHAR element.

But the actual size of the buffer is sizeof(TCHAR) * MAXBUFFERSIZE bytes (or sizeof(TCHAR) * fileSize for a small file), which is twice what ReadFile fills. The second half is never written, so the half of the buffer array that you see as "corrupted" is simply uninitialized memory.
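
The mismatch described above can be reproduced without any Windows API. The following is a minimal portable sketch in standard C++, not the asker's code: char16_t stands in for a 2-byte TCHAR, std::memcpy and std::istream::read stand in for ReadFile, and the helper names bytesLeftUntouched and readInChunks are hypothetical. It shows that passing an element count as a byte count leaves the tail of a wide buffer unwritten, and that a byte-typed buffer combined with the number of bytes actually read avoids the problem.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Allocates `elementCount` elements of type Elem, pre-fills them with a
// filler byte, then copies `elementCount` BYTES of data in (the way a
// byte-oriented read API counts). Returns how many bytes kept the filler.
template <typename Elem>
std::size_t bytesLeftUntouched(std::size_t elementCount) {
    assert(elementCount > 0);
    const unsigned char filler = 0xCD;  // arbitrary marker for "never written"
    std::vector<Elem> buffer(elementCount);
    const std::size_t totalBytes = buffer.size() * sizeof(Elem);
    std::memset(buffer.data(), filler, totalBytes);

    std::vector<unsigned char> source(elementCount, 'A');
    std::memcpy(buffer.data(), source.data(), elementCount);  // count is in bytes

    const auto* raw = reinterpret_cast<const unsigned char*>(buffer.data());
    std::size_t untouched = 0;
    for (std::size_t i = 0; i < totalBytes; ++i)
        if (raw[i] == filler) ++untouched;
    return untouched;
}

// Reads a stream in fixed-size chunks into a byte buffer and returns the
// total number of bytes read. Progress is tracked with the count the read
// actually returned, not with the requested size.
std::uint64_t readInChunks(std::istream& in, std::size_t chunkSize) {
    assert(chunkSize > 0);
    std::vector<unsigned char> buffer(chunkSize);  // one element == one byte
    std::uint64_t total = 0;
    while (in) {
        in.read(reinterpret_cast<char*>(buffer.data()),
                static_cast<std::streamsize>(buffer.size()));
        const std::streamsize got = in.gcount();   // bytes actually read
        if (got <= 0) break;
        // Only buffer[0 .. got) holds valid data for this chunk.
        total += static_cast<std::uint64_t>(got);
    }
    return total;
}
```

On a platform where char16_t is 2 bytes, bytesLeftUntouched<char16_t>(1024) reports that half of the 2048-byte buffer was never written, while bytesLeftUntouched<unsigned char>(1024) reports none. In the Win32 code from the question, the analogous change would presumably be to declare the buffer as a byte type such as char or BYTE and to subtract dwBytesRead rather than bufferSize from fileRemaining.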
