
Why is transposing this std::vector<std::vector<std::string>> so slow?

I have a 1000-line file of about 400MB containing numeric data stored as strings. I want to transpose the data so that there are only 1000 strings per line (so that I can open it and plot it quickly with pandas).

I imported the whole file into a vector of vectors of strings, which I want to transpose (and eventually write back to a file).

I use two nested loops to go through the 2D structure and write it into a std::ofstream. It takes very long. I then tried to focus on the transposition alone and wrote the following code:

//Read 400MB file, 90K strings per line and 1K lines, and store it into
std::vector<std::vector<std::string>> mData;

// ... 
// IO the file and populate mData with raw data 
// ...

//All rows have same number of string
size_t nbRows = mData.size();
size_t nbCols = mData[0].size();

std::vector<std::vector<std::string> > transposedData(nbCols);
for(size_t i = 0 ; i < nbCols  ; ++i)
{
    transposedData[i].resize(nbRows);
    for(size_t j = 0 ; j < nbRows ; ++j)
    {
        transposedData[i][j] = mData[j][i];
    }
}

I thought a few seconds would be sufficient, but it takes several minutes. I also tried different input dimensions (only 3 lines and many more strings per line, for the same 400MB file size) and it was much faster.

EDIT 1

On people's advice, I profiled the code using callgrind. I got this message during the process: ... brk segment overflow in thread #1: can't grow to...

I analysed the result and summarized it here:
25% is spent in operator= of basic_string
21% is spent in the construction of basic_string (with only 3% of the time in new)
14% is spent in operator[] on the outer vector
11% is spent in operator[] on the inner vector

Thank you for your suggestions.

First of all, before making any claim about why a piece of code is slow, you should really measure its performance on your machine and then, with that data at hand, deduce the reason.

That said, I am quite confident in this case that the problem resides in the fact that you are allocating 90k vectors of strings, each of size 1k. As you know, memory allocation is costly, and that may explain your performance penalty.

The following is how you could implement the same thing using only a single 1D array allocated upfront.

size_t nbRows = mData.size();
size_t nbCols = mData[0].size();

// Flat index of element (row, col) in a row-major matrix with ncols columns
auto get_idx = [](const size_t row, const size_t ncols, const size_t col)
{
    return row * ncols + col;
};

// Single upfront allocation holding all nbRows*nbCols strings,
// kept in the original row-major layout
std::vector<std::string> transposedData(nbCols * nbRows);
for (size_t i = 0; i < nbCols; ++i)
{
    for (size_t j = 0; j < nbRows; ++j)
    {
        const size_t idx = get_idx(j, nbCols, i);
        transposedData[idx] = std::move(mData[j][i]);
    }
}

// Reading the flat buffer column by column yields the transposed output
for (size_t i = 0; i < nbCols; ++i)
{
    for (size_t j = 0; j < nbRows; ++j)
    {
        const size_t idx = get_idx(j, nbCols, i);
        std::cout << transposedData[idx] << " ";
    }
    std::cout << std::endl;
}
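Note that because the strings are moved rather than copied, their character buffers are simply handed over; the single upfront vector allocation replaces the 90k separate allocations of the inner vectors, which is where the saving comes from.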

I would like to stress it again: profile your code. Try tools like valgrind --tool=callgrind or gprof, which let you profile and visualize performance data about your application.

The program has redundancies on several levels.

The obvious one is that you don't need to transpose the vector in order to transpose the file.

std::vector<std::vector<std::string>> originalData;
// read the file into originalData

for (size_t i = 0; i < nbCols; ++i)
{
    for (size_t j = 0; j < nbRows; ++j)
    {
        std::cout << originalData[j][i] << " ";
    }
    std::cout << std::endl;
}
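Since the stated goal is to write the transposed data back to a file, the same loop can stream straight into a std::ofstream instead of cout. A minimal sketch, assuming <fstream> is included and reusing nbRows, nbCols and originalData from above (the output file name is just an example); '\n' is used instead of endl because endl would force a flush on each of the 90,000 output lines:

std::ofstream out("transposed.txt");
for (size_t i = 0; i < nbCols; ++i)
{
    for (size_t j = 0; j < nbRows; ++j)
    {
        out << originalData[j][i] << " ";
    }
    out << '\n';  // avoid std::endl here: flushing 90k times is costly
}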

Assuming you do need to produce a transposed vector for some reason, one way to write the transpose loop would be

std::vector<std::vector<std::string>> transposedData(nbCols);
for (size_t j = 0; j < nbCols; ++j)
{
    transposedData[j].reserve(nbRows);
    for (size_t i = 0; i < nbRows; ++i)
    {
        transposedData[j].emplace_back(originalData[i][j]);
        // if keeping the original vector is not needed ...
        // transposedData[j].emplace_back(std::move(originalData[i][j]));
    }
}

On my (fairly beefy) machine it takes about 6-7 seconds to transpose a 1000x90000 matrix of 3-character strings. This is not particularly impressive, but if you don't need to transpose multi-million-element matrices 24 hours a day, it does what you need without too much overhead.
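If you want to reproduce such a measurement, a self-contained sketch using std::chrono could look like the following; the dimensions and the "1.0" fill value are made up to match the question's shape, and note that it needs several GB of RAM:

#include <chrono>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

int main()
{
    const std::size_t nbRows = 1000, nbCols = 90000;
    // Dummy input: 1000 rows of 90000 three-character strings
    std::vector<std::vector<std::string>> originalData(nbRows,
        std::vector<std::string>(nbCols, "1.0"));

    const auto start = std::chrono::steady_clock::now();

    // Transpose loop as above: reserve once, then emplace_back
    std::vector<std::vector<std::string>> transposedData(nbCols);
    for (std::size_t j = 0; j < nbCols; ++j)
    {
        transposedData[j].reserve(nbRows);
        for (std::size_t i = 0; i < nbRows; ++i)
            transposedData[j].emplace_back(originalData[i][j]);
    }

    const auto stop = std::chrono::steady_clock::now();
    std::cout << "transpose took "
              << std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count()
              << " ms\n";
}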

The penalty may come from the fact that you're overusing resize in your for loop.

As per the reference:

Complexity

Linear in the difference between the current size and count. Additional complexity possible due to reallocation if capacity is less than count.

Memory allocation costs a lot, so you may want to avoid overdoing it.

As pointed out by others, upfront allocation is an interesting approach to avoid recreating (resizing) your vectors every time.
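For instance, with the question's mData, nbRows and nbCols, the whole transposed structure can be sized before the copy loop by constructing the outer vector with nbCols inner vectors of nbRows default strings, so the loop itself no longer resizes anything; this is only a sketch of the idea:

// All inner vectors are allocated up front, before the copy loop
std::vector<std::vector<std::string>> transposedData(nbCols,
                                                     std::vector<std::string>(nbRows));
for (size_t i = 0; i < nbCols; ++i)
{
    for (size_t j = 0; j < nbRows; ++j)
    {
        // moving avoids copying the string contents as well
        transposedData[i][j] = std::move(mData[j][i]);
    }
}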

It turned out there was not enough available memory on my machine to perform this task (see below). By splitting my data into three parts, I solved the task in a few seconds. Here is the output of some code scrutinizing memory:

free ram 2.5GB
IO populating mData with raw data
free ram 0.2GB
Empty string capacity: 15 bytes
Intending to allocate 1.4 GB
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted
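For reference, a rough sketch of how such a check can be written on Linux, using sysinfo() for the free-RAM figures; the allocation estimate only counts the std::string objects themselves (not the heap buffers of long strings), so it is a lower bound:

#include <sys/sysinfo.h>   // Linux-specific
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Print the currently free physical memory in GB
void printFreeRam()
{
    struct sysinfo si;
    if (sysinfo(&si) == 0)
        std::cout << "free ram "
                  << static_cast<double>(si.freeram) * si.mem_unit / (1024.0 * 1024.0 * 1024.0)
                  << "GB\n";
}

int main()
{
    printFreeRam();

    std::vector<std::vector<std::string>> mData;
    // ... IO: populate mData with raw data ...

    printFreeRam();

    std::cout << "Empty string capacity: " << std::string().capacity() << " bytes\n";

    // Lower-bound estimate of the memory needed for the transposed structure
    const std::size_t nbRows = mData.size();
    const std::size_t nbCols = mData.empty() ? 0 : mData[0].size();
    std::cout << "Intending to allocate "
              << static_cast<double>(nbRows * nbCols * sizeof(std::string)) / (1024.0 * 1024.0 * 1024.0)
              << " GB" << std::endl;
}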
