I am reading a file and saving the data into a vector. I cannot use arrays because the data size is not fixed. The file size is about 300 KB and could go up to 600 KB. Currently, this takes about 5 - 8 seconds to read/save.
I would like to know what is slowing down my read/copy method and how it might be improved.
Sample data:
0000:4000 94 45 30 39 36 39 74 00 00 00 00 50 00 00 00 27 some other info here
int SomeClass::Open ()
{
vector <unsigned int> memory; // where the data will be stored
file.open("c:\\file.txt",ios::in);
regex addressPattern("0000:(\\d|[a-z]){4}"); // used to extract the address from a string
regex dataPattern("( (\\d|[a-z]){2}){16}"); // used to extract the data from a string
smatch match;
string str; // where each line will be stored
string data; // where the data found in each line will be stored
int firstAddress = -1; // -1 = address not been found
unsigned int sector = 0;
unsigned int address = 0;
while(getline(file,str)){
if(regex_search(str,match,addressPattern) && firstAddress == -1){
sector = std::stoul(match.str().substr(0,4),nullptr,16); // the sector field is 4 hex digits wide
address = std::stoul(match.str().substr(5),nullptr,16);
firstAddress = address;
}
if(regex_search(str,match,dataPattern)){
std::istringstream stream(str);
string data; // used to store individual byte from dataString
while(stream >> data){
unsigned int c = std::stoul(data,nullptr,16); // conversion from hex to dec
memory.insert(memory.end(),c);
}
}
}
return 0;
}
This seems as expected. Use Boost::Progress or ctime to isolate the costly instructions.
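As a rough illustration of the ctime approach, the sketch below wraps a piece of work between two std::clock() calls. The helper name secondsTaken is made up for this example, and note that std::clock() reports processor time, which can differ from wall-clock time.

```cpp
#include <ctime>

// Hypothetical helper (not from the question): runs `work` once and
// returns the processor time it consumed, in seconds.
template <typename Work>
double secondsTaken(Work work)
{
    const std::clock_t start = std::clock();
    work();
    const std::clock_t stop = std::clock();
    return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}
```

Timing the regex_search calls and the stoul/insert loop separately, each wrapped in a lambda passed to this helper, should show which part dominates.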
Vectors are implemented with contiguous memory in the manner of arrays, so you shouldn't see much (if any) slowdown there. File I/O time is probably minimal on a 600 KB file; I'd imagine that it's cached in memory on open. You can also read the entire file into memory yourself (for example after opening it with the ios::binary mode flag for file.open), but you will then have to split it into lines yourself, which is the work the getline abstraction does for you.
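A minimal sketch of that idea, assuming the whole file fits comfortably in memory (it does at 600 KB). The function names readAll and splitLines are invented for this example; they take any std::istream, so an std::ifstream opened with ios::binary would work as the input.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Reads everything that remains in `in` into a single string.
std::string readAll(std::istream& in)
{
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Splits `text` at '\n'. A trailing '\r' is dropped from each line, since
// Windows line endings are not translated when reading in binary mode.
std::vector<std::string> splitLines(const std::string& text)
{
    std::vector<std::string> lines;
    std::size_t begin = 0;
    while (begin < text.size())
    {
        std::size_t end = text.find('\n', begin);
        if (end == std::string::npos)
        {
            end = text.size();
        }
        std::size_t length = end - begin;
        if (length > 0 && text[begin + length - 1] == '\r')
        {
            --length;
        }
        lines.push_back(text.substr(begin, length));
        begin = end + 1;
    }
    return lines;
}
```

Whether this beats plain getline is something to measure rather than assume; for a file this small the difference is likely minor.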
All that said, the compiler is pretty good at optimizing I/O and vectors. The bottleneck is probably the construction of the regexes (and perhaps even the regex matching), which are necessary but complex. A finite state automaton must be generated for each regex; see: What's the Time Complexity of Average Regex algorithms?
Regexes are very powerful, but complex and slow.
Since your format is fully static (a fixed number of digits with fixed separators in between), you could implement the conversion yourself, reading char by char. This will not be very complex.
For instance, to read all the hex numbers and check the spaces and the colon:
while(getline(file,str))
{
if(str.size()>=57)
{
int sector = hexToInt(str.data(), 4);
int address = hexToInt(str.data()+5, 4);
bool ok = (sector==0) && (address>=0); // hexToInt returns -1 on invalid input
ok = ok && str[4] == ':';
int bytes[16];
for(int i=0;i<16;++i)
{
bytes[i] = hexToInt(str.data()+10+3*i, 2);
ok = ok && (str[9+3*i]==' ') && (bytes[i]>=0);
}
}
//Etc...
}
Function for checking and converting a hex digit:
int hexCharToDigit(char c)
{
if(c>='0' && c<='9')
{
//Decimal digit
return (int)(c-'0');
}
else if (c>='a' && c<='f')
{
//Hexadecimal lower case letter
return (int)(c-'a')+10;
}
else if (c>='A' && c<='F')
{
//Hexadecimal upper case letter
return (int)(c-'A')+10;
}
else
{
//Char is not a hex digit
return -1;
}
}
Function for checking and converting an n-digit hex number to int:
int hexToInt(const char * chr, int size)
{
assert(size<8);
int result= 0;
for(int i=0;i<size;++i)
{
int hexDigit = hexCharToDigit(chr[i]);
if(hexDigit>=0)
{
//Valid hexadecimal digit
result = result << 4;
result += hexDigit;
}
else
{
//Char is not a hex digit as expected
return -1;
}
}
return result;
}
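To show how these pieces could fit together, here is a self-contained sketch that parses one line of the sample format and appends its 16 bytes to the vector. The helper appendLine is hypothetical (it is not part of the code above), the two conversion functions are restated in compact form only so the example compiles on its own, and the fixed column offsets assume every line looks like the sample line in the question.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Compact restatement of hexCharToDigit: returns 0..15, or -1 if c is not a hex digit.
int hexCharToDigit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Compact restatement of hexToInt: converts `size` hex digits, or returns -1 on a bad digit.
int hexToInt(const char* chr, int size)
{
    assert(size < 8);
    int result = 0;
    for (int i = 0; i < size; ++i)
    {
        const int digit = hexCharToDigit(chr[i]);
        if (digit < 0) return -1;
        result = (result << 4) + digit;
    }
    return result;
}

// Hypothetical helper: parses "ssss:aaaa b0 b1 ... b15 ..." and appends the
// 16 bytes to `memory`. Returns false and leaves `memory` untouched if the
// line is too short or any field is malformed.
bool appendLine(const std::string& str, std::vector<unsigned int>& memory)
{
    if (str.size() < 57) return false;
    bool ok = hexToInt(str.data(), 4) >= 0
           && str[4] == ':'
           && hexToInt(str.data() + 5, 4) >= 0;
    unsigned int bytes[16];
    for (int i = 0; ok && i < 16; ++i)
    {
        const int value = hexToInt(str.data() + 10 + 3 * i, 2);
        ok = (str[9 + 3 * i] == ' ') && (value >= 0);
        if (ok) bytes[i] = static_cast<unsigned int>(value);
    }
    if (ok) memory.insert(memory.end(), bytes, bytes + 16);
    return ok;
}
```

Calling appendLine(str, memory) inside the while(getline(file,str)) loop would replace both regex searches and the istringstream; unlike the sector==0 check above, this sketch accepts any sector value.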