
Why is transposing this std::vector<std::vector<std::string>> so slow?

I have a 1000-line file of about 400MB containing numeric data stored as strings. I want to transpose the data so that there are only 1000 strings per line (so that I can open it and plot it quickly with pandas).

I imported the whole file into a vector of vectors of strings, which I want to transpose (and eventually write back to a file).

I use two nested loops to go through the 2D structure and write it into a std::ofstream. It takes very long. I then tried to focus on the transposition alone and wrote the following code:

//Read 400MB file, 90K strings per line and 1K lines, and store it into
std::vector<std::vector<std::string>> mData;

// ... 
// IO the file and populate mData with raw data 
// ...

//All rows have same number of string
size_t nbRows = mData.size();
size_t nbCols = mData[0].size();

std::vector<std::vector<std::string> > transposedData(nbCols);
for(size_t i = 0 ; i < nbCols  ; ++i)
{
    transposedData[i].resize(nbRows);
    for(size_t j = 0 ; j < nbRows ; ++j)
    {
        transposedData[i][j] = mData[j][i];
    }
}

I thought a few seconds would be sufficient, but it takes several minutes. I also tried different input dimensions (only 3 lines and many more strings per line, for the same 400MB file size) and it was much faster.

EDIT 1

On people's advice, I profiled the code using callgrind. I got this message during the process: ... brk segment overflow in thread #1: can't grow to...

I analysed the result and summarized it here:
25% is spent in operator= of basic_string
21% is spent in the construction of basic_string (with only 3% of the time in new)
14% is spent in operator[] on the outer vector
11% is spent in operator[] on the inner vector

Thank you for your suggestions.

First of all, before making any claim about why a piece of code is slow, you should really measure its performance on your machine and then, with that data at hand, deduce the reason.

That said, I am quite confident in this case that the problem resides in the fact that you are allocating 90k vectors of strings, each of size 1k. As you know, memory allocation is costly, and that may explain your performance penalty.

The following is how you could implement the same thing using only a single 1D array allocated upfront.

size_t nbRows = mData.size();
size_t nbCols = mData[0].size();

// Flat index of element (row, col) in a row-major matrix with ncols columns
auto get_idx = [](const size_t row, const size_t ncols, const size_t col)
{
    return row * ncols + col;
};

// Single upfront allocation holding all nbRows*nbCols strings,
// kept in the original row-major layout
std::vector<std::string> transposedData(nbCols * nbRows);
for (size_t i = 0; i < nbCols; ++i)
{
    for (size_t j = 0; j < nbRows; ++j)
    {
        const size_t idx = get_idx(j, nbCols, i);
        transposedData[idx] = std::move(mData[j][i]);
    }
}

// Reading the flat buffer column by column yields the transposed output
for (size_t i = 0; i < nbCols; ++i)
{
    for (size_t j = 0; j < nbRows; ++j)
    {
        const size_t idx = get_idx(j, nbCols, i);
        std::cout << transposedData[idx] << " ";
    }
    std::cout << std::endl;
}
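Note that because the strings are moved rather than copied, their character buffers are simply handed over; the single upfront vector allocation replaces the 90k separate allocations of the inner vectors, which is where the saving comes from.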

I would like to stress it again: profile your code. Try tools like valgrind --tool=callgrind or gprof, which let you profile and visualize performance data about your application.

The program has redundancies on several levels.

The obvious one is that you don't need to transpose the vector in order to transpose the file.

std::vector<std::vector<std::string>> originalData;
// read the file into originalData

for (size_t i = 0; i < nbCols; ++i)
{
    for (size_t j = 0; j < nbRows; ++j)
    {
        std::cout << originalData[j][i] << " ";
    }
    std::cout << std::endl;
}
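Since the stated goal is to write the transposed data back to a file, the same loop can stream straight into a std::ofstream instead of cout. A minimal sketch, assuming <fstream> is included and reusing nbRows, nbCols and originalData from above (the output file name is just an example); '\n' is used instead of endl because endl would force a flush on each of the 90,000 output lines:

std::ofstream out("transposed.txt");
for (size_t i = 0; i < nbCols; ++i)
{
    for (size_t j = 0; j < nbRows; ++j)
    {
        out << originalData[j][i] << " ";
    }
    out << '\n';  // avoid std::endl here: flushing 90k times is costly
}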

Assuming you do need to produce a transposed vector for some reason, one way to write the transpose loop would be

std::vector<std::vector<std::string>> transposedData(nbCols);
for (size_t j = 0; j < nbCols; ++j)
{
    transposedData[j].reserve(nbRows);
    for (size_t i = 0; i < nbRows; ++i)
    {
        transposedData[j].emplace_back(originalData[i][j]);
        // if keeping the original vector is not needed ...
        // transposedData[j].emplace_back(std::move(originalData[i][j]));
    }
}

On my (fairly beefy) machine it takes about 6-7 seconds to transpose a 1000x90000 matrix of 3-character strings. This is not particularly impressive, but if you don't need to transpose multi-million-element matrices 24 hours a day, it does what you need without too much overhead.
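If you want to reproduce such a measurement, a self-contained sketch using std::chrono could look like the following; the dimensions and the "1.0" fill value are made up to match the question's shape, and note that it needs several GB of RAM:

#include <chrono>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

int main()
{
    const std::size_t nbRows = 1000, nbCols = 90000;
    // Dummy input: 1000 rows of 90000 three-character strings
    std::vector<std::vector<std::string>> originalData(nbRows,
        std::vector<std::string>(nbCols, "1.0"));

    const auto start = std::chrono::steady_clock::now();

    // Transpose loop as above: reserve once, then emplace_back
    std::vector<std::vector<std::string>> transposedData(nbCols);
    for (std::size_t j = 0; j < nbCols; ++j)
    {
        transposedData[j].reserve(nbRows);
        for (std::size_t i = 0; i < nbRows; ++i)
            transposedData[j].emplace_back(originalData[i][j]);
    }

    const auto stop = std::chrono::steady_clock::now();
    std::cout << "transpose took "
              << std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count()
              << " ms\n";
}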

The penalty may come from the fact that you're overusing resize in your for loop.

As per the reference:

Complexity

Linear in the difference between the current size and count. Additional complexity possible due to reallocation if capacity is less than count.

Memory allocation costs a lot, so you may want to avoid overdoing it.

As pointed out by others, upfront allocation is an interesting approach to avoid recreating (resizing) your vectors every time.
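For instance, with the question's mData, nbRows and nbCols, the whole transposed structure can be sized before the copy loop by constructing the outer vector with nbCols inner vectors of nbRows default strings, so the loop itself no longer resizes anything; this is only a sketch of the idea:

// All inner vectors are allocated up front, before the copy loop
std::vector<std::vector<std::string>> transposedData(nbCols,
                                                     std::vector<std::string>(nbRows));
for (size_t i = 0; i < nbCols; ++i)
{
    for (size_t j = 0; j < nbRows; ++j)
    {
        // moving avoids copying the string contents as well
        transposedData[i][j] = std::move(mData[j][i]);
    }
}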

It turned out there was not enough available memory on my machine to perform this task (see below). By splitting my data into three parts, I solved the task in a few seconds. Here is the output of some code scrutinizing memory:

free ram 2.5GB
IO populating mData with raw data
free ram 0.2GB
Empty string capacity: 15 bytes
Intending to allocate 1.4 GB
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted
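For reference, a rough sketch of how such a check can be written on Linux, using sysinfo() for the free-RAM figures; the allocation estimate only counts the std::string objects themselves (not the heap buffers of long strings), so it is a lower bound:

#include <sys/sysinfo.h>   // Linux-specific
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Print the currently free physical memory in GB
void printFreeRam()
{
    struct sysinfo si;
    if (sysinfo(&si) == 0)
        std::cout << "free ram "
                  << static_cast<double>(si.freeram) * si.mem_unit / (1024.0 * 1024.0 * 1024.0)
                  << "GB\n";
}

int main()
{
    printFreeRam();

    std::vector<std::vector<std::string>> mData;
    // ... IO: populate mData with raw data ...

    printFreeRam();

    std::cout << "Empty string capacity: " << std::string().capacity() << " bytes\n";

    // Lower-bound estimate of the memory needed for the transposed structure
    const std::size_t nbRows = mData.size();
    const std::size_t nbCols = mData.empty() ? 0 : mData[0].size();
    std::cout << "Intending to allocate "
              << static_cast<double>(nbRows * nbCols * sizeof(std::string)) / (1024.0 * 1024.0 * 1024.0)
              << " GB" << std::endl;
}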
