
How to multithread reading, processing and writing data in Qt C++?

I am relatively new to Qt and C++ and completely self-taught. I'm trying to multi-thread a task which I currently have working on a single thread.

My task is as follows:

  1. Read in multiple CSV files line by line and store each column of data from each file in a separate vector.
  2. Run the values at each index of those vectors through various mathematical equations.
  3. Once each index of data is processed, write the results of the equations to an output file.

Example file being read in:

Col1,   Col2,   Col3,   Col4,  . . .  ColN
1,      A,      B,      C,     . . .  X
2,      D,      E,      F,     . . .  Y
3,      G,      H,      J,     . . .  Z
.,      .,      .,      .,     . . .  .
.,      .,      .,      .,     . . .  .
N,      .,      .,      .,     . . .  .
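Step 1 of the task could look roughly like the sketch below. It assumes the first line is a header, fields are separated by commas, and every data field is numeric; `readColumns` is a hypothetical helper name, not part of the original code.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads comma-separated numeric data into one vector per column.
// Assumes a single header line and that every remaining field parses as a double.
std::vector<std::vector<double>> readColumns(std::istream& in)
{
    std::vector<std::vector<double>> columns;
    std::string line;
    std::getline(in, line);  // skip the header row

    while (std::getline(in, line)) {
        if (line.empty()) {
            continue;  // ignore blank lines
        }
        std::istringstream fields(line);
        std::string field;
        std::size_t col = 0;
        while (std::getline(fields, field, ',')) {
            if (col == columns.size()) {
                columns.emplace_back();  // first time this column is seen
            }
            // std::stod skips leading whitespace, so "1,   2.5" is accepted
            columns[col].push_back(std::stod(field));
            ++col;
        }
    }
    return columns;
}
```

Passing an opened `std::ifstream` works the same way as the string stream used for testing, since both are `std::istream` objects.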

And here is some pseudocode showing the principle:

for (int i = 0; i < N; i = i + 1)
{
    // there are multiple nested for loops, but only one shown here

    // calculate multiple variables. Here are two examples:
    calculatedVariable = Col2[i] + Col3[i] / Col4[i];
    calculatedVariable2 = (Col2[i] * 0.98) / (Col2[i] + Col3[i] + Col4[i]) + (Col2[i] + Col3[i]);
    
    // then write the calculated variables to an output text file
    output << calculatedVariable << "," << calculatedVariable2 << std::endl;
}

This works well because the code writes to the output text file at the end of each loop iteration, so it doesn't fill up RAM (i.e. it avoids doing all the computations, storing the results in vectors and then writing everything out in one go).

My problem is that these files can have hundreds of thousands of lines and processing can take a couple of hours. If I can multi-thread, such that the processing is carried out for multiple indices of data simultaneously, while maintaining the order of data in the output file, it would drastically reduce computation time. I don't need to multi-thread the reading of data at this stage.

I am currently struggling with the conceptual side of tackling this and can't find any similar examples online. I've looked at QtConcurrent as an option but am not quite sure how to apply it.

If anyone can point me in the right direction that would be appreciated. Thank you.

EDIT 1: Thanks for the responses. The bottleneck is the actual processing of the data through some long iterative calculations, not the IO operations. Let's say I read 2 files, each with 1000 lines. If I want to run some calculations for each line in file 1 against each line in file 2, that's 1,000,000 cases. If there were some way to split those calculations across, let's say, 10 threads, that would cut processing time massively.
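The split described in the edit can be sketched by giving each thread a contiguous block of file 1 rows and a pre-sized results vector. This is an illustration in standard C++, not a definitive implementation: `calculatePair` and `processAllPairs` are hypothetical names, and the formula is a placeholder for the real calculation. Because every (i, j) pair has its own fixed slot, no mutex is needed and the results come out in the same order as the single-threaded nested loops.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Placeholder for the real calculation on one (file 1 row, file 2 row) pair.
double calculatePair(double a, double b)
{
    return a * 10.0 + b;
}

std::vector<double> processAllPairs(const std::vector<double>& file1,
                                    const std::vector<double>& file2,
                                    unsigned int nthreads)
{
    const std::size_t rows = file1.size();
    const std::size_t cols = file2.size();
    std::vector<double> results(rows * cols);  // pair (i, j) is stored at i * cols + j

    nthreads = std::max(1u, nthreads);
    const std::size_t chunk = (rows + nthreads - 1) / nthreads;  // ceiling division

    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < rows; begin += chunk) {
        const std::size_t end = std::min(rows, begin + chunk);
        // Each worker writes only to the slots of its own rows.
        workers.emplace_back([&file1, &file2, &results, cols, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                for (std::size_t j = 0; j < cols; ++j) {
                    results[i * cols + j] = calculatePair(file1[i], file2[j]);
                }
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return results;
}
```

Once all workers have joined, the main thread can write `results` to the output file front to back. For 1,000 x 1,000 cases that is one million doubles, roughly 8 MB, so holding them in memory should not be a problem at that scale.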

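Basically, you want a pool of worker threads that each pull the next file from a shared, mutex-protected queue, as in the code below. Feel free to replace the std:: mechanisms with their Qt equivalents (QString vs std::string, etc.).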

#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

struct Job
{
   std::string inputFileName;
   std::string outputFileName;
};

// The code below is meant to live inside a function (e.g. main),
// so that jobs and m are locals that the lambda can capture by reference.

std::queue<Job> jobs;

// not shown - populate jobs with the input/output names of files you want to manage

std::mutex m;

unsigned int nthreads = std::thread::hardware_concurrency();
if (nthreads == 0) {
    nthreads = 1;  // hardware_concurrency() may return 0 if the count is unknown
}
std::vector<std::thread> threads;
for (unsigned int i = 0; i < nthreads; i++) {
    // std::thread's constructor is explicit, so pass the lambda to it directly
    std::thread t([&m, &jobs] {

        while (true) {
            Job job;
            {
                std::lock_guard<std::mutex> lck(m); // acquire the lock that protects jobs
                if (jobs.empty()) {
                    return;  // the queue is empty, this thread can exit
                }
                job = jobs.front();
                jobs.pop();
            }   // the lock is released here, before the slow work starts

            // YOUR CODE GOES HERE
            // Open the file job.inputFileName and read in the contents
            // Open the output file job.outputFileName
            // then do your processing and write to the output file handle
            // close your files

            // all done for this file - loop back to the top of this lambda function and get the next file
        }
    });

    threads.push_back(std::move(t));
}

// wait for all threads to finish
for (auto& t : threads) {
    t.join();
}


 