Slow file reading and copying into memory - C++

I am reading a file and saving the data into a vector. I cannot use arrays because the data size is not fixed. The file size is about 300 KB and could go up to 600 KB. Currently, this takes about 5 to 8 seconds to read and save.

What is slowing down my read/copy method, and how might it be improved?

Sample data:

0000:4000 94 45 30 39 36 39 74 00 00 00 00 50 00 00 00 27 some other info here

int SomeClass::Open () 
{

    vector <unsigned int> memory; // where the data will be stored
    file.open("c:\\file.txt",ios::in);
    regex addressPattern("0000:(\\d|[a-z]){4}"); // used to extract the address from a string
    regex dataPattern("( (\\d|[a-z]){2}){16}"); // used to extract the data from a string
    smatch match;
    string str; // where each line will be stored
    string data; // where the data found in each line will be stored
    int firstAddress = -1; // -1 = address not been found
    unsigned int sector = 0;
    unsigned int address = 0;
    while(getline(file,str)){

         if(regex_search(str,match,addressPattern) && firstAddress == -1){ 
             sector = std::stoul(match.str().substr(0,4),nullptr,16);
             address = std::stoul(match.str().substr(5),nullptr,16);
             firstAddress = address;
         }
         if(regex_search(str,match,dataPattern)){
            std::istringstream stream(str);
            string data; // used to store individual byte from dataString
            while(stream >> data){
                unsigned int c = std::stoul(data,nullptr,16); // conversion from hex string to integer
                memory.insert(memory.end(),c);
            }
         }
    }

    return 0;

}

This seems as expected. Use Boost::Progress or ctime to isolate the costly instructions.
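One way to isolate the costly step is sketched below. It uses std::chrono instead of Boost or ctime, and the helper names `elapsedMs` and `countDataMatches` are made up for this illustration. It assumes the data pattern from the question and applies it repeatedly to a single line.

```cpp
#include <chrono>
#include <regex>
#include <string>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in milliseconds.
template <typename Work>
double elapsedMs(Work&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical probe: applies the question's data pattern to the same line
// `repeats` times and returns how many of the searches matched.
int countDataMatches(const std::string& line, int repeats)
{
    const std::regex dataPattern("( (\\d|[a-z]){2}){16}");
    std::smatch match;
    int hits = 0;
    for (int i = 0; i < repeats; ++i)
    {
        if (std::regex_search(line, match, dataPattern))
            ++hits;
    }
    return hits;
}
```

Wrapping a call such as `countDataMatches(line, 1000)` in `elapsedMs`, and timing a bare `getline` loop the same way, should show which of the two dominates on your machine.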

Vectors are implemented with contiguous memory in the manner of arrays, so you shouldn't see much (if any) slowdown there. File IO time is probably minimal on a 600 KB file; I'd imagine it's cached in memory on open. You can also read the entire file into memory in one go, for example by opening it with the ios::binary mode flag and reading it into a buffer, but then you have to split and deserialize each line yourself, which is the work the getline abstraction does for you.
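The idea of pulling the whole file into memory before parsing can be sketched as follows. The names `readWholeFile` and `splitLines` are hypothetical, and the sketch assumes the file is small enough to fit comfortably in one string, which holds for a few hundred kilobytes.

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads the whole file into one string.
// Returns an empty string if the file cannot be opened.
std::string readWholeFile(const std::string& path)
{
    std::ifstream file(path, std::ios::in | std::ios::binary);
    if (!file)
        return std::string();

    std::ostringstream buffer;
    buffer << file.rdbuf(); // copies the entire stream content in one go
    return buffer.str();
}

// Hypothetical helper: splits in-memory content into lines.
std::vector<std::string> splitLines(const std::string& content)
{
    std::vector<std::string> lines;
    std::istringstream stream(content);
    std::string line;
    while (std::getline(stream, line))
    {
        // Binary mode keeps the '\r' of Windows line endings, so drop it here
        if (!line.empty() && line.back() == '\r')
            line.pop_back();
        lines.push_back(line);
    }
    return lines;
}
```

Whether this is faster than calling getline on the file directly is something to measure rather than assume, since the file stream already buffers its reads.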

All that said, the compiler is pretty good at optimizing IO and vectors. The bottleneck is probably the construction of the regexes (and perhaps even the regex matching), which is necessarily complex: a finite-state automaton must be generated for each regex. See: What's the Time Complexity of Average Regex algorithms?

Regexes are very powerful, but complex and slow.

Since your format is fully static (a fixed number of digits with fixed separators in between), you could implement the conversion yourself, reading char by char. This will not be very complex.

For instance, to read all the hex numbers and check the spaces and the colon:

while(getline(file,str))
{
    if(str.size()>=57)
    {
        int sector = hexToInt(str.data(), 4);
        int address = hexToInt(str.data()+5, 4);

        bool ok = (sector==0) && (address>=0);

        ok = ok && str[4] == ':';

        int bytes[16];
        for(int i=0;i<16;++i)
        {
            bytes[i] = hexToInt(str.data()+10+3*i, 2);
            ok = ok && (str[9+3*i]==' ') && (bytes[i]>=0);
        }
    }

    //Etc...
}

Function for checking and converting a hex digit:

int hexCharToDigit(char c)
{
    if(c>='0' && c<='9')
    {
        //Decimal digit
        return (int)(c-'0');
    }
    else if (c>='a' && c<='f')
    {
        //Hexadecimal lower case letter
        return (int)(c-'a')+10;
    }
    else if (c>='A' && c<='F')
    {
        //Hexadecimal upper case letter
        return (int)(c-'A')+10;
    }
    else
    {
        //Char is not a hex digit
        return -1;
    }  
}

Function for checking and converting an n-digit hex number to int:

int hexToInt(const char * chr, int size)
{
    //Needs #include <cassert>; at most 7 digits so the result stays a non-negative 32-bit int
    assert(size<8);

    int result= 0;
    for(int i=0;i<size;++i)
    {
        int hexDigit = hexCharToDigit(chr[i]);
        if(hexDigit>=0)
        {
            //Valid hexadecimal digit
            result = result << 4;
            result += hexDigit;
        }
        else
        {
            //Char is not a hex digit as expected
            return -1;
        }   
    }

    return result;
}
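Putting the pieces together, here is a minimal self-contained sketch that parses one line and appends its bytes to a vector like the `memory` vector in the question. The names `hexDigitValue` and `parseLine` are made up for this illustration. The layout is an assumption taken from the sample line: "0000:", a four-digit address, then 16 fields of a space followed by two hex digits.

```cpp
#include <string>
#include <vector>

// Hypothetical compact variant of hexCharToDigit:
// value of one hex digit, or -1 if the char is not a hex digit.
static int hexDigitValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: parses one line laid out like the sample data and
// appends its 16 bytes to `memory`. Returns false, leaving `address` and
// `memory` unchanged, if the line is too short or a character is unexpected.
bool parseLine(const std::string& str, unsigned int& address,
               std::vector<unsigned int>& memory)
{
    // 9 chars of "0000:aaaa" plus 16 fields of " xx" (assumed layout)
    if (str.size() < 57 || str.compare(0, 5, "0000:") != 0)
        return false;

    unsigned int addr = 0;
    for (int i = 5; i < 9; ++i)
    {
        const int digit = hexDigitValue(str[i]);
        if (digit < 0)
            return false;
        addr = (addr << 4) | static_cast<unsigned int>(digit);
    }

    unsigned int bytes[16];
    for (int i = 0; i < 16; ++i)
    {
        const int hi = hexDigitValue(str[10 + 3 * i]);
        const int lo = hexDigitValue(str[11 + 3 * i]);
        if (str[9 + 3 * i] != ' ' || hi < 0 || lo < 0)
            return false;
        bytes[i] = static_cast<unsigned int>(hi * 16 + lo);
    }

    address = addr;
    memory.insert(memory.end(), bytes, bytes + 16);
    return true;
}
```

Called once per getline result, this replaces both regex searches and the istringstream pass. Anything after the 16th byte, such as the trailing text in the sample line, is ignored.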
