Garbage after reading a large text file using (FILE*)

I have a sample program written in C/C++ that reads a text file into memory. While trying to parse this file (the parsing is not part of this sample), I came across a lot of garbage near the end of the file. Investigating, I found that the problem occurs when reading large files into memory; it does not happen with small text files. Here is my code:

#include <conio.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

using namespace std;

char* readFile_(char* fname)
{
    char* rv=NULL;
    int bytes=0;
    FILE* pfile = NULL;
    pfile = fopen( fname, "r" );
    if ( pfile )
    {
        fseek(pfile, 0, SEEK_END);
        bytes = ftell(pfile);
        fseek(pfile, 0, SEEK_SET);
        rv = new char[bytes+1];
        memset(rv,0,bytes+1);
        fread( rv, bytes, 1, pfile );
        fclose(pfile);
    }
    return rv;
}

int main(int argc, char **argv)
{
    char* filebuffer = NULL;
    filebuffer = readFile_( "mv2.txt" );

    FILE* pfile = fopen("op.txt", "w");
    int len = strlen(filebuffer);
    fwrite( filebuffer, len, 1, pfile );
    fclose(pfile);

    delete[] filebuffer;
    return 0;
}

For reference, the files are hosted here:

mv2.txt file: https://gist.github.com/anonymous/bb101393729d3ada944f
op.txt file: https://gist.github.com/anonymous/93595c83ad62e40d0f0a

Can anyone point out what the problem seems to be?

Edit: I am using Windows (Windows 7).

Edit 2: Thanks, everyone, for helping me find the problem. Here is the updated code, based on some of your feedback, which seems to solve my problem even for some very large text files:

#include <conio.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

using namespace std;

char* readFile_(char* fname)
{
    char* rv=NULL;
    long bytes=0;
    FILE* pfile = NULL;
    pfile = fopen( fname, "rb" );
    if ( pfile )
    {
        fseek(pfile, 0, SEEK_END);
        bytes = ftell(pfile);
        fseek(pfile, 0, SEEK_SET);
        rv = new char[bytes+1];
        memset(rv,0,bytes+1);
        fread( rv, bytes, 1, pfile );
        fclose(pfile);
    }
    return rv;
}

int main(int argc, char **argv)
{
    char* filebuffer = NULL;
    filebuffer = readFile_( "mv2.txt" );

    FILE* pfile = fopen("op.txt", "wb");
    int len = strlen(filebuffer);
    fwrite( filebuffer, len, 1, pfile );
    fclose(pfile);

    delete[] filebuffer;
    return 0;
}

Since you include conio.h, I assume you are on Windows. On Windows, a file opened in text mode has each CRLF pair translated to a single LF as it is read, so the number of characters fread delivers can be smaller than the byte count ftell reported. That mismatch is the kind of problem you run into without binary mode reads. I would try opening the file with "rb" as the mode:

FILE* pfile = NULL;
pfile = fopen( fname, "rb" );
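
Beyond switching to "rb", it may also help to size the data by what fread actually returned instead of trusting ftell or strlen. The following is only a minimal sketch of that idea, not the code from the post: the helper names readWholeFile and writeWholeFile are hypothetical, and it assumes the whole file fits in memory and that std::string is acceptable as the buffer.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not from the original post): reads a whole file in
// binary mode and keeps only the bytes that fread actually delivered.
bool readWholeFile(const char* fname, std::string& out)
{
    std::FILE* pfile = std::fopen(fname, "rb");   // "rb": no CRLF translation
    if (!pfile)
        return false;

    bool ok = false;
    if (std::fseek(pfile, 0, SEEK_END) == 0) {
        long bytes = std::ftell(pfile);           // size in bytes, or -1 on error
        if (bytes >= 0 && std::fseek(pfile, 0, SEEK_SET) == 0) {
            out.assign(static_cast<std::size_t>(bytes), '\0');
            // Element size 1, so the return value is a byte count.
            std::size_t got = bytes > 0
                ? std::fread(&out[0], 1, out.size(), pfile)
                : 0;
            out.resize(got);                      // drop anything not read
            ok = !std::ferror(pfile);
        }
    }
    std::fclose(pfile);
    return ok;
}

// Hypothetical helper: writes exactly data.size() bytes in binary mode.
bool writeWholeFile(const char* fname, const std::string& data)
{
    std::FILE* pfile = std::fopen(fname, "wb");
    if (!pfile)
        return false;

    std::size_t put = data.empty()
        ? 0
        : std::fwrite(data.data(), 1, data.size(), pfile);
    bool ok = (put == data.size());
    if (std::fclose(pfile) != 0)
        ok = false;
    return ok;
}
```

With these helpers, main would call readWholeFile("mv2.txt", buffer) and then writeWholeFile("op.txt", buffer). Passing the stored length to fwrite rather than strlen(filebuffer) also means the output is not cut short if the file happens to contain a zero byte.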
