
Slow file reading and copying into memory - C++

I am reading a file and saving the data into a vector. I cannot use arrays because the data size is not fixed. The file size is about 300 KB and could go up to 600 KB. Currently, this takes about 5 - 8 seconds to read/save.

I would like to know what is slowing down my read/copy method and how it might be improved.

Sample data:

0000:4000 94 45 30 39 36 39 74 00 00 00 00 50 00 00 00 27 some other info here

int SomeClass::Open () 
{

    vector <unsigned int> memory; // where the data will be stored
    file.open("c:\\file.txt",ios::in);
    regex addressPattern("0000:(\\d|[a-z]){4}"); // used to extract the address from a string
    regex dataPattern("( (\\d|[a-z]){2}){16}"); // used to extract the data from a string
    smatch match;
    string str; // where each line will be stored
    string data; // where the data found in each line will be stored
    int firstAddress = -1; // -1 = address not been found
    unsigned int sector = 0;
    unsigned int address = 0;
    while(getline(file,str)){

         if(regex_search(str,match,addressPattern) && firstAddress == -1){ 
             sector = std::stoul(match.str().substr(0,3),nullptr,16);
             address = std::stoul(match.str().substr(5),nullptr,16);
             firstAddress = address;
         }
         if(regex_search(str,match,dataPattern)){
            std::istringstream stream(str);
            string data; // used to store individual byte from dataString
            while(stream >> data){
                unsigned int c = std::stoul(data,nullptr,16); // conversion from hex text to an integer
                memory.insert(memory.end(),c);
            }
         }
    }

    return 0;

}

This seems as expected. Use Boost::Progress or ctime to time each step and isolate the costly instructions.
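As a rough illustration of isolating the costly step, here is a minimal sketch that uses std::chrono::steady_clock from the standard library in place of Boost::Progress or ctime. The helper name timeMs is hypothetical and not part of the original code.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs `step` once and returns the elapsed
// wall-clock time in milliseconds.
template <typename Step>
double timeMs(Step&& step)
{
    const auto start = std::chrono::steady_clock::now();
    step();
    const auto stop = std::chrono::steady_clock::now();
    assert(stop >= start); // steady_clock never goes backwards
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Wrapping the regex construction, the getline loop, and the vector insertion in separate timeMs calls (each as a lambda) should show which of them accounts for most of the 5 - 8 seconds.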

Vectors are implemented with contiguous memory in the manner of arrays, so you shouldn't see much (if any) slowdown there. File IO time is probably minimal on a 600 KB file; I'd imagine that it's cached to memory on open. You can also read the entire file into memory yourself in one go (for example after opening it with the ios::binary mode flag for file.open), but then you will have to deserialize each line on your own, which is the work the getline abstraction otherwise does for you.
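A minimal sketch of that idea, assuming the file fits comfortably in memory (true for 600 KB). The helper names slurpFile and splitLines are hypothetical; whether this is faster than plain getline on the file stream should be measured rather than assumed.

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads the whole file at `path` into one string.
// Returns an empty string if the file cannot be opened.
std::string slurpFile(const std::string& path)
{
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in) return std::string();
    std::ostringstream buffer;
    buffer << in.rdbuf(); // one bulk copy instead of many small reads
    return buffer.str();
}

// Hypothetical helper: splits an in-memory buffer into lines.
// In binary mode "\r\n" is not translated, so a trailing '\r' is dropped here.
std::vector<std::string> splitLines(const std::string& text)
{
    std::vector<std::string> lines;
    std::istringstream stream(text);
    std::string line;
    while (std::getline(stream, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        lines.push_back(line);
    }
    return lines;
}
```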

All that said, the compiler is pretty good at optimizing IO and vectors. The bottleneck is probably the construction of the regexes (and perhaps even the regex matching), which is necessary but complex. A deterministic finite-state automaton must be generated for each regex; see "What's the Time Complexity of Average Regex algorithms?".
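If construction does turn out to be the expensive part, one option is to make sure each regex is built only once. The sketch below assumes the line layout from the sample data. The function name looksLikeDataLine and the combined pattern are illustrative and not taken from the question.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: checks whether a line starts with "0000:AAAA"
// followed by 16 space-separated hex bytes.
bool looksLikeDataLine(const std::string& line)
{
    // Function-local static: the regex is constructed on the first call only
    // and reused on every later call.
    static const std::regex linePattern(
        "^0000:[0-9a-fA-F]{4}( [0-9a-fA-F]{2}){16}");
    return std::regex_search(line, linePattern);
}
```

This only removes repeated construction. The per-line matching cost remains, so timing both parts separately is still worthwhile.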

Regexes are very powerful, but complex and slow.

Since your format is fully static (fixed number of digits and fixed separators in between), you could implement the conversion yourself, reading char by char. This will not be very complex.

For instance, to read all hex numbers, and check the spaces and the colon:

while(getline(file,str))
{
    if(str.size()>=57)
    {
        int sector = hexToInt(str.data(), 4);
        int address = hexToInt(str.data()+5, 4);

        bool ok = (sector==0) && (address>=0); // hexToInt returns -1 on invalid input

        ok = ok && str[4] == ':';

        int bytes[16];
        for(int i=0;i<16;++i)
        {
            bytes[i] = hexToInt(str.data()+10+3*i, 2);
            ok = ok && (str[9+3*i]==' ') && (bytes[i]>=0);
        }
    }

    //Etc...
}

Function for checking and converting a hex digit:

int hexCharToDigit(char c)
{
    if(c>='0' && c<='9')
    {
        //Decimal digit
        return (int)(c-'0');
    }
    else if (c>='a' && c<='f')
    {
        //Hexadecimal lower case letter
        return (int)(c-'a')+10;
    }
    else if (c>='A' && c<='F')
    {
        //Hexadecimal upper case letter
        return (int)(c-'A')+10;
    }
    else
    {
        //Char is not a hex digit
        return -1;
    }  
}

Function for checking and converting an n-digit hex number to int:

int hexToInt(const char * chr, int size)
{
    assert(size<8); // needs #include <cassert>; at most 7 hex digits keeps the result a non-negative int

    int result= 0;
    for(int i=0;i<size;++i)
    {
        int hexDigit = hexCharToDigit(chr[i]);
        if(hexDigit>=0)
        {
            //Valid hexadecimal digit
            result = result << 4;
            result += hexDigit;
        }
        else
        {
            //Char is not a hex digit as expected
            return -1;
        }   
    }

    return result;
}
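Putting the pieces together, the following is a self-contained sketch of how one line could be parsed and appended to the vector. It repeats compact versions of the two helpers so that it compiles on its own. The function parseLine and its signature are hypothetical and not part of the original answer. It assumes every data line starts with sector 0000, as in the sample data.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the value of one hex digit, or -1 if `c` is not a hex digit.
int hexCharToDigit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Converts `size` hex digits starting at `chr`, or returns -1 on a bad character.
int hexToInt(const char* chr, int size)
{
    assert(size < 8); // keeps the result inside a non-negative int
    int result = 0;
    for (int i = 0; i < size; ++i) {
        const int digit = hexCharToDigit(chr[i]);
        if (digit < 0) return -1;
        result = (result << 4) + digit;
    }
    return result;
}

// Hypothetical helper: parses "0000:AAAA b0 b1 ... b15 ..." and appends the
// 16 bytes to `memory`. On failure it returns false and leaves both
// `memory` and `address` unchanged.
bool parseLine(const std::string& str, std::vector<unsigned int>& memory, int& address)
{
    if (str.size() < 57) return false; // 9 chars of "SSSS:AAAA" + 16 * 3
    const int sector = hexToInt(str.data(), 4);
    const int addr = hexToInt(str.data() + 5, 4);
    if (sector != 0 || addr < 0 || str[4] != ':') return false;

    unsigned int bytes[16];
    for (int i = 0; i < 16; ++i) {
        const int value = hexToInt(str.data() + 10 + 3 * i, 2);
        if (str[9 + 3 * i] != ' ' || value < 0) return false;
        bytes[i] = static_cast<unsigned int>(value);
    }
    memory.insert(memory.end(), bytes, bytes + 16);
    address = addr;
    return true;
}
```

Inside the getline loop, each line would be passed to parseLine, and the address of the first successful call kept as firstAddress.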
