I have to read a text file into an array of structures. I have already written a program, but it takes too much time because there are about 13 lakh (1.3 million) structures in the file. Please suggest the best and fastest way to do this in C++.
Here is my code:
std::ifstream input_counter("D:\\cont.txt");
std::string line;
/**********************************************************/
int counter = 0;
while (std::getline(input_counter, line))
{
    ReadCont(line, &contract[counter]); // function to read data into the structure (contract is a pre-allocated array declared elsewhere)
    counter++;
    line.clear(); // redundant: getline() already overwrites line
}
input_counter.close();
I would use Qt entirely in this case.
struct MyStruct {
int Col1;
int Col2;
int Col3;
int Col4;
    // ... more fields
};
QByteArray Data;
QFile f("D:\\cont.txt");
if (f.open(QIODevice::ReadOnly)) {
Data = f.readAll();
f.close();
}
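// Only meaningful if the file is a raw binary dump of MyStruct records
MyStruct* DataPointer = reinterpret_cast<MyStruct*>(Data.data());
const int Count = Data.size() / sizeof(MyStruct); // number of complete records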
// Accessing data
DataPointer[0] = ...
DataPointer[1] = ...
Now you have your data and you can access it as an array.
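The cast above only works if the file really is a raw binary dump of MyStruct records written by the same compiler and platform. A minimal standard-C++ sketch of the same idea, assuming such a file exists (WriteRecords and ReadRecords are hypothetical helper names, and the four-int MyStruct simply mirrors the struct above), reads all records into a std::vector with a single read() call, which also avoids casting a byte buffer to a struct pointer:

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical record layout; must match the writer's layout exactly.
struct MyStruct {
    int Col1;
    int Col2;
    int Col3;
    int Col4;
};

// Writes records as raw bytes (only portable between identical builds).
bool WriteRecords(const std::string& path, const std::vector<MyStruct>& recs) {
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(recs.data()),
              static_cast<std::streamsize>(recs.size() * sizeof(MyStruct)));
    return static_cast<bool>(out);
}

// Reads the whole file back into a vector with one read() call.
std::vector<MyStruct> ReadRecords(const std::string& path) {
    std::vector<MyStruct> recs;
    std::ifstream in(path, std::ios::binary | std::ios::ate); // open positioned at the end
    if (!in) return recs;
    const std::streamoff bytes = in.tellg();                  // file size in bytes
    in.seekg(0, std::ios::beg);
    // A trailing partial record (if any) is ignored.
    recs.resize(static_cast<std::size_t>(bytes) / sizeof(MyStruct));
    if (!recs.empty()) {
        const std::streamsize want =
            static_cast<std::streamsize>(recs.size() * sizeof(MyStruct));
        in.read(reinterpret_cast<char*>(recs.data()), want);
        assert(in.gcount() == want);
    }
    return recs;
}
```

With 1.3 million small records this is one allocation and one bulk read, but the file format is tied to the exact struct layout, so any change to MyStruct invalidates existing files.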
If your data is not binary and has to be parsed first, you will need a conversion routine. For example, to read a delimited text file with 4 columns:
QVector<MyStruct> MyArray;
QString StringData(Data);
QStringList Lines = StringData.split("\n"); // or whatever the newline character is
MyArray.reserve(Lines.count()); // avoid repeated reallocations while appending
for (int i = 0; i < Lines.count(); i++) {
    QString Line = Lines.at(i);
    QStringList Parts = Line.split("\t"); // or whatever the separator character is
    if (Parts.count() >= 4) {
        MyStruct t;
        t.Col1 = Parts.at(0).toInt();
        t.Col2 = Parts.at(1).toInt();
        t.Col3 = Parts.at(2).toInt();
        t.Col4 = Parts.at(3).toInt();
        MyArray.append(t);
    } else {
        // Malformed input, do something
    }
}
Now your data is parsed and stored in the MyArray vector.
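Without Qt, the same split-and-convert loop can be written with the standard library alone. A minimal sketch, assuming tab-separated integer columns as in the Qt code above (ParseDelimited is a hypothetical name); it converts fields with std::strtol directly from each line's buffer, so it allocates one std::string per line rather than one string per field:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

struct MyStruct {
    int Col1, Col2, Col3, Col4;
};

// Parses "a\tb\tc\td\n" lines; lines with fewer than 4 numeric fields are skipped.
std::vector<MyStruct> ParseDelimited(const std::string& data) {
    std::vector<MyStruct> rows;
    std::size_t start = 0;
    while (start < data.size()) {
        std::size_t end = data.find('\n', start);
        if (end == std::string::npos) end = data.size();
        const std::string line = data.substr(start, end - start);
        start = end + 1;

        int cols[4] = {0, 0, 0, 0};
        int found = 0;
        const char* p = line.c_str();
        while (found < 4) {
            char* next = nullptr;
            const long v = std::strtol(p, &next, 10);
            if (next == p) break;           // no number here: malformed line
            cols[found++] = static_cast<int>(v);
            p = next;
            if (*p != '\t') break;          // end of line (or unexpected separator)
            ++p;                            // skip the tab
        }
        if (found == 4) {
            rows.push_back({cols[0], cols[1], cols[2], cols[3]});
        }
        // else: malformed input, do something (here the line is skipped)
    }
    return rows;
}
```

A trailing '\r' from Windows line endings simply ends the last field, so CRLF files parse without extra handling.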
Keep your parsing as simple as possible: where you know a field's format, exploit that knowledge. For instance,
ReadCont("|PE|1|0|0|0|0|1|1||2|0||2|0||3|0|....", ...)
should use a fast char-to-integer conversion, something like:
void ReadCont(const char *line, Contract &c) {   // needs <cassert>
    if (line[1] == 'P' && line[2] == 'E' && line[3] == '|') {
        line += 4;                                 // skip "|PE|"
        for (int field = 0; field < K_FIELDS_PE; ++field) {
            c.int_field[field] = *line++ - '0';    // single-digit field
            assert(*line == '|');
            ++line;                                // skip the separator
        }
    }
}
Beware of the details (for example, the empty fields "||" in the sample line above), but you get the idea.
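A slightly more robust sketch of the same hand-rolled approach: it accepts multi-digit non-negative integers and treats an empty field ("||") as 0, still without strtol or stringstream. The Contract layout, kMaxFields and ReadPipeFields are hypothetical names chosen for illustration, not part of the original code, and there is no overflow check:

```cpp
#include <cassert>

constexpr int kMaxFields = 32;  // hypothetical upper bound on fields per record

struct Contract {
    char tag[3];                // record type, e.g. "PE"
    int int_field[kMaxFields];
    int n_fields;
};

// Parses "|XX|d|d||d|..." where each field is a (possibly empty) run of digits.
// Returns false if the line does not start with a "|XX|" tag.
bool ReadPipeFields(const char* line, Contract& c) {
    if (line[0] != '|' || line[1] == '\0' || line[2] == '\0' || line[3] != '|')
        return false;
    c.tag[0] = line[1];
    c.tag[1] = line[2];
    c.tag[2] = '\0';
    line += 4;                                   // skip "|XX|"
    c.n_fields = 0;
    while (*line != '\0' && *line != '\n' && *line != '\r' && c.n_fields < kMaxFields) {
        int value = 0;
        while (*line >= '0' && *line <= '9')     // accumulate decimal digits
            value = value * 10 + (*line++ - '0');
        c.int_field[c.n_fields++] = value;       // empty field -> 0
        if (*line != '|') break;                 // end of record or unexpected char
        ++line;                                  // skip the separator
    }
    return true;
}
```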
As user2617519 says, this can be made faster by multithreading. I see that you are reading each line and parsing it. Put these lines in a queue. Then let different threads pop them off the queue and parse the data into structures.
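A minimal sketch of that producer/consumer layout with std::thread, assuming each line can be parsed independently. LineQueue, ParseLine and ParseParallel are hypothetical names, and ParseLine just converts the line to an int as a stand-in for ReadCont. Each queued item carries its line index, so every worker writes to its own slot of the result vector without further locking:

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// A simple blocking queue of (line index, line text) pairs.
class LineQueue {
public:
    void push(std::size_t idx, std::string line) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.emplace(idx, std::move(line));
        }
        cv_.notify_one();
    }
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }
    // Blocks until an item is available; returns false once closed and drained.
    bool pop(std::pair<std::size_t, std::string>& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::pair<std::size_t, std::string>> q_;
    bool closed_ = false;
};

// Stand-in for ReadCont: parses the line as a single integer.
int ParseLine(const std::string& line) { return std::stoi(line); }

// The calling thread produces lines; `workers` threads pop and parse them.
std::vector<int> ParseParallel(const std::vector<std::string>& lines, unsigned workers) {
    assert(workers > 0);
    std::vector<int> results(lines.size());  // sized up front, never resized
    LineQueue queue;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&] {
            std::pair<std::size_t, std::string> item;
            while (queue.pop(item))
                results[item.first] = ParseLine(item.second);  // distinct slots: no data race
        });
    }
    for (std::size_t i = 0; i < lines.size(); ++i)
        queue.push(i, lines[i]);  // in real code these would come from getline()
    queue.close();
    for (auto& t : pool) t.join();
    return results;
}
```

Whether this pays off depends on how expensive parsing is compared with reading; if ReadCont is cheap, the single reading thread may remain the bottleneck.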
An easier way to do this (without the complication of multithreading) is to split the input data file into multiple files and run an equal number of processes to parse them. The data can then be merged later.
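The only subtle part of splitting is choosing cut points that never fall in the middle of a line. A minimal sketch, assuming the data is available as a std::string; SplitAtLines is a hypothetical helper that returns n + 1 offsets so that chunk k is the byte range [b[k], b[k+1]):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns boundaries b[0..n] with b[0] = 0 and b[n] = data.size(); every
// interior boundary sits just after a '\n', so no line is split across chunks.
std::vector<std::size_t> SplitAtLines(const std::string& data, std::size_t n) {
    assert(n > 0);
    std::vector<std::size_t> bounds{0};
    for (std::size_t k = 1; k < n; ++k) {
        std::size_t target = data.size() * k / n;  // ideal equal-size cut
        if (target < bounds.back()) target = bounds.back();
        const std::size_t nl = data.find('\n', target);
        bounds.push_back(nl == std::string::npos ? data.size() : nl + 1);
    }
    bounds.push_back(data.size());
    return bounds;
}
```

Each chunk can then be written to its own file (or its offsets handed to a separate process), and the parsed results concatenated in chunk order. A chunk may come out empty when lines are long relative to the chunk size.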
QFile::readAll() may cause memory problems with very large files, because it loads the whole file into memory at once, and std::getline() is slow (as is ::fgets()).
I faced a similar problem where I needed to parse and display very large delimited text files in a QTableView. Using a custom model, I parsed the file to find the offset of the start of each line. Then, when data is needed for display in the table, I read that line and parse it on demand. This results in a lot of parsing, but it is fast enough that there is no noticeable lag in scrolling or update speed.
It also has the added benefit of low memory usage, as I do not read the file contents into memory. With this strategy, files of nearly any size can be handled.
Parsing code:
m_fp = ::fopen(path.c_str(), "rb"); // binary mode: no CRLF translation, so the byte offsets counted below match fseek positions
if (m_fp != NULL)
{
    // read the file to get the row pointers
    char buf[BUF_SIZE+1];
    long pos = 0;
    m_data.push_back(RowData(pos));
    int nr = 0;
    while ((nr = ::fread(buf, 1, BUF_SIZE, m_fp)))
    {
        buf[nr] = 0; // null-terminate the buffer so strchr stops at the end of the data read
        // find new lines in the buffer
        char *c = buf;
        while ((c = ::strchr(c, '\n')) != NULL)
        {
            m_data.push_back(RowData(pos + c-buf+1)); // next row starts after the '\n'
            c++;
        }
        pos += nr;
    }
    // squeeze any extra memory not needed in the collection
    m_data.squeeze();
}
RowData and m_data are specific to my implementation, but they are simply used to cache information about a row in the file (such as the file position and number of columns).
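Without Qt, the same index-then-read-on-demand strategy can be sketched with <cstdio> and std::vector, where a plain std::vector<long> of line offsets takes the place of the RowData/m_data collection. BuildLineIndex, ReadLineAt and kBufSize are hypothetical names. Unlike the strchr loop above, this version uses memchr, so embedded NUL bytes do not stop the scan, and it does not record an extra empty row after a final newline:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

constexpr std::size_t kBufSize = 1 << 16;  // hypothetical read-chunk size

// Returns the byte offset at which each line starts.
std::vector<long> BuildLineIndex(std::FILE* fp) {
    std::vector<long> offsets;
    std::vector<char> buf(kBufSize);
    long pos = 0;                // file offset of buf[0]
    bool at_line_start = true;   // next byte read begins a new line
    std::size_t nr;
    std::rewind(fp);
    while ((nr = std::fread(buf.data(), 1, buf.size(), fp)) > 0) {
        const char* p = buf.data();
        const char* end = p + nr;
        while (p < end) {
            if (at_line_start) {
                offsets.push_back(pos + static_cast<long>(p - buf.data()));
                at_line_start = false;
            }
            const void* nl = std::memchr(p, '\n', static_cast<std::size_t>(end - p));
            if (nl == nullptr) break;  // line continues in the next chunk
            p = static_cast<const char*>(nl) + 1;
            at_line_start = true;
        }
        pos += static_cast<long>(nr);
    }
    return offsets;
}

// Reads the line starting at `offset`, without its trailing '\n'.
std::string ReadLineAt(std::FILE* fp, long offset) {
    std::string line;
    char buf[4096];
    std::fseek(fp, offset, SEEK_SET);
    std::size_t nr;
    while ((nr = std::fread(buf, 1, sizeof buf, fp)) > 0) {
        const void* nl = std::memchr(buf, '\n', nr);
        if (nl != nullptr) {
            line.append(buf, static_cast<const char*>(nl) - buf);
            break;
        }
        line.append(buf, nr);
    }
    return line;
}
```

Only the offsets stay in memory (8 bytes per line on typical 64-bit platforms), so the index for 1.3 million lines is on the order of 10 MB regardless of line length.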
The other performance strategy I employed was to use QByteArray to parse each line, instead of QString. Unless you need Unicode data, this will save time and memory:
// optimized line reading procedure
QByteArray str;
char buf[BUF_SIZE+1];
::fseek(m_fp, rd.offset, SEEK_SET);
int nr = 0;
while ((nr = ::fread(buf, 1, BUF_SIZE, m_fp)))
{
    buf[nr] = 0; // null-terminate the string
    // find new lines in the buffer
    char *c = ::strchr(buf, '\n');
    if (c != NULL)
    {
        *c = 0;
        str += buf;
        break;
    }
    str += buf;
}
return str.split(',');
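The same "avoid the heavier string type" idea carries over to standard C++: splitting a line into std::string_view pieces (C++17) avoids copying each field. A minimal sketch; SplitView is a hypothetical helper, and the returned views are only valid while the buffer holding the line is alive:

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Splits `line` on `sep`; empty fields are kept. No characters are copied.
std::vector<std::string_view> SplitView(std::string_view line, char sep) {
    std::vector<std::string_view> parts;
    std::size_t start = 0;
    while (true) {
        const std::size_t pos = line.find(sep, start);
        if (pos == std::string_view::npos) {
            parts.push_back(line.substr(start));  // last field (may be empty)
            break;
        }
        parts.push_back(line.substr(start, pos - start));
        start = pos + 1;
    }
    return parts;
}
```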
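If you need to split each line on any of several delimiter characters, rather than a single one, use ::strtok(). Note that strtok() treats its second argument as a set of individual characters, not as a multi-character separator.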
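To make the difference concrete: ::strtok() splits on any character in its delimiter set, merges adjacent delimiters (so empty fields disappear), writes '\0' into the buffer, and keeps hidden state between calls (so it is not thread-safe). Splitting on a genuine multi-character separator needs a substring search such as std::string::find. A minimal sketch of both; SplitAnyOf and SplitOnString are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Splits on ANY character in `delims`; empty fields are dropped (strtok semantics).
std::vector<std::string> SplitAnyOf(std::string line, const char* delims) {
    std::vector<std::string> out;
    // strtok writes '\0' into the buffer, so it works on this local copy.
    for (char* tok = std::strtok(&line[0], delims); tok != nullptr;
         tok = std::strtok(nullptr, delims))
        out.emplace_back(tok);
    return out;
}

// Splits on the whole string `sep` (e.g. "||"); empty fields are kept.
std::vector<std::string> SplitOnString(const std::string& line, const std::string& sep) {
    assert(!sep.empty());
    std::vector<std::string> out;
    std::size_t start = 0;
    std::size_t pos;
    while ((pos = line.find(sep, start)) != std::string::npos) {
        out.push_back(line.substr(start, pos - start));
        start = pos + sep.size();
    }
    out.push_back(line.substr(start));
    return out;
}
```

For example, SplitAnyOf("a||b|c", "|") yields three fields a, b, c, while SplitOnString("a||b|c", "||") yields two: a and b|c.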