
Reading from a large text file into a structure array in Qt?

I have to read a text file into an array of structures. I have already written a program, but it takes too long because the file contains about 1.3 million records. Please suggest the best and fastest way to do this in C++.

Here is my code:

std::ifstream input_counter("D:\\cont.txt");

/**********************************************************/
int counter = 0;
while( getline(input_counter,line) )
{
    ReadCont( line,&contract[counter]); // function to read data to structure
    counter++;
    line.clear();
}
input_counter.close();

I would use Qt entirely in this case.

struct MyStruct {
    int Col1;
    int Col2;
    int Col3;
    int Col4;
    // blabla ...
};

QByteArray Data;
QFile f("D:\\cont.txt");
if (f.open(QIODevice::ReadOnly)) {
    Data = f.readAll();
    f.close();
}

MyStruct* DataPointer = reinterpret_cast<MyStruct*>(Data.data());
// Accessing data
DataPointer[0] = ...
DataPointer[1] = ...

Now you have your data and you can access it as an array. Note that this only works if the file really contains raw binary records whose layout matches MyStruct.

In case your data is not binary, you will need a conversion routine to parse it first. For example, if you read a CSV file with 4 columns:

QVector<MyStruct> MyArray;
QString StringData(Data);
QStringList Lines = StringData.split("\n"); // or whatever new line character is
for (int i = 0; i < Lines.count(); i++) {
    QString Line = Lines.at(i);
    QStringList Parts = Line.split("\t"); // or whatever separator character is
    if (Parts.count() >= 4) {
        MyStruct t;
        t.Col1 = Parts.at(0).toInt();
        t.Col2 = Parts.at(1).toInt();
        t.Col3 = Parts.at(2).toInt();
        t.Col4 = Parts.at(3).toInt();
        MyArray.append(t);
    } else { 
        // Malformed input, do something
    }
}

Now your data is parsed and sitting in the MyArray vector.

Keep your parsing as simple as possible: where you know the field format, apply that knowledge. For instance,

ReadCont("|PE|1|0|0|0|0|1|1||2|0||2|0||3|0|....", ...)

should apply fast char-to-integer conversion, something like:

void ReadCont(const char *line, Contract &c) {
    if (line[0] == '|' && line[1] == 'P' && line[2] == 'E' && line[3] == '|') {
        line += 4;
        for (int field = 0; field < K_FIELDS_PE; ++field) {
            c.int_field[field] = *line++ - '0';
            assert(*line == '|');
            ++line;
        }
    }
}

Well, beware of the details, but you get the idea...

As user2617519 says, this can be made faster with multithreading. I see that you are reading each line and parsing it. Put those lines in a queue, then let several threads pop them off the queue and parse the data into structures.
An easier way to do this (without the complication of multithreading) is to split the input file into multiple files and run an equal number of processes to parse them, then merge the results afterwards.

QFile::readAll() may cause memory problems, and std::getline() is slow (as is ::fgets()).

I faced a similar problem where I needed to display very large delimited text files in a QTableView. Using a custom model, I scanned the file once to record the offset to the start of each line. Then, when data is needed for display in the table, I read and parse the line on demand. This results in a lot of parsing, but it is fast enough that there is no noticeable lag in scrolling or update speed.

It also has the added benefit of low memory usage, since I never read the whole file into memory. With this strategy a file of nearly any size is manageable.

Parsing code:

m_fp = ::fopen(path.c_str(), "rb"); // open in binary mode for faster parsing
if (m_fp != NULL)
{
  // read the file to get the row pointers
  char buf[BUF_SIZE+1];

  long pos = 0;
  m_data.push_back(RowData(pos));
  size_t nr = 0;
  while ((nr = ::fread(buf, 1, BUF_SIZE, m_fp)) > 0)
  {
    buf[nr] = 0; // null-terminate the last line of data
    // find new lines in the buffer
    char *c = buf;
    while ((c = ::strchr(c, '\n')) != NULL)
    {
      m_data.push_back(RowData(pos + c-buf+1));
      c++;
    }
    pos += nr;
  }

  // squeeze any extra memory not needed in the collection
  m_data.squeeze();
}

RowData and m_data are specific to my implementation; they simply cache information about a row in the file (such as its file offset and number of columns).

The other performance strategy I employed was to parse each line with QByteArray instead of QString. Unless you need Unicode data, this saves time and memory:

// optimized line reading procedure
QByteArray str;
char buf[BUF_SIZE+1];
::fseek(m_fp, rd.offset, SEEK_SET);
int nr = 0;
while ((nr = ::fread(buf, 1, BUF_SIZE, m_fp)))
{
  buf[nr] = 0; // null-terminate the string
  // find new lines in the buffer
  char *c = ::strchr(buf, '\n');
  if (c != NULL)
  {
    *c = 0;
    str += buf;
    break;
  }
  str += buf;
}

return str.split(',');

If you need to split each line on any of several delimiter characters rather than a single one, use ::strtok() (note that it treats each character of its delimiter argument as a separate delimiter, and that it modifies the buffer in place).
