简体   繁体   中英

Median of multiple vectors of double (c++, vector < vector<double> >)

I have a datastructure containing a vector of vectors which each consist of about ~16000000 double values.

I now want to median-combine these vectors, meaning, of each original vectors I take the values at place i, calculate the median of these and then store them in the resulting vector at place i.

I already have the straight-forward solution, but it is incredible slow:

vector< vector<double> > vectors; //vectors contains the datavectors
vector<double> tmp;
vector<double> result;
vector<double> tmpmedian;
double pixels = 0.0;
double matrixcount = vectors.size();

    tmp = vectors.at(0);
    pixels = tmp.size();
    for (int i = 0; i < pixels; i++) {
        for (int j = 0; j < matrixcount; j++) {
            tmp = vectors.at(j);
            tmpmedian.push_back(tmp.at(i));
        }
        result.push_back(medianOfVector(tmpmedian));
        tmpmedian.clear();
    }

return result;

And medianOfVector looks like this:

double result = 0;

if ((vec.size() % 2) != 0) {
    vector<double>::iterator i = vec.begin();
    vector<double>::size_type m = (vec.size() / 2);

    nth_element(i, i + m, vec.end());
    result = vec.at(m);
} else {
    vector<double>::iterator i = vec.begin();
    vector<double>::size_type m = (vec.size() / 2) - 1;

    nth_element(i, i + m, vec.end());
    result = (vec.at(m) + vec.at(m + 1)) / 2;
}

return result;

I there an algorithm or a way to do this faster, it takes nearly an eternity to do it.


Edit: Thank you for your replies, in case anyone is interested here is the fixed version, it now takes about 9sec to median combine three vectors with ~16000000 elements, mean combining takes around 3sec:

vector< vector<double> > vectors; //vectors contains the datavectors
vector<double> *tmp;
vector<double> result;
vector<double> tmpmedian;

    tmp = &vectors.at(0);
    int size = tmp->size();
    int vectorsize = vectors.size();
    for (int i = 0; i < size; i++) {
        for (int j = 0; j < vectorsize; j++) {
            tmp = &vectors.at(j);
            tmpmedian.push_back(tmp->at(i));
        }
        result.push_back(medianOfVector(tmpmedian));
        tmpmedian.clear();
    }
return result;

And medianOfVector:

double result = 0;

if ((vec.size() % 2) != 0) {
    vector<double>::iterator i = vec.begin();
    vector<double>::size_type m = (vec.size() / 2);

    nth_element(i, i + m, vec.end());
    result = vec.at(m);
} else {
    vector<double>::iterator i = vec.begin();
    vector<double>::size_type m = (int) (((vec.size() - 1) / 2));
    nth_element(i, i + m, vec.end());
    double min = vec.at(m);
    double max = *min_element(i + m + 1, vec.end());
    result = (min + max) / 2;
}

return result;
}

A couple of points, both stemming from the fact that you've defined tmp as a vector instead of (for example) a reference.

vector<double> tmp;

tmp = vectors.at(0);
pixels = tmp.size();

Here you're copying the entirety of vectors[0] into tmp just to extract the size. You'll almost certainly gain some speed by avoiding the copy:

pixels = vectors.at(0).size();

Instead of copying the entire vector just to get its size, this just gets a reference to the first vector, and gets the size of that existing vector.

for (int i = 0; i < pixels; i++) {
    for (int j = 0; j < matrixcount; j++) {
        tmp = vectors.at(j);
        tmpmedian.push_back(tmp.at(i));
    }

Here you're again copying the entirety of vectors.at(j) into tmp . But (again) you don't really need a new copy of all the data--you're just retrieving a single item from that copy. You can retrieve the data you need directly from the original vector without copying the whole thing:

tmpmedian.push_back(vectors.at(j).at(i));

A possible next step would be to switch from using .at to operator[] :

tmpmedian.push_back(vectors[j][i]);

This is much more of a tradeoff though--it's not likely to gain nearly as much, and loses a bit of safety (range checking) in the process. To avoid losing safety, you could consider (for example) using range-based for loops instead of the counted for loops in your current code.

Along rather different lines, you could instead change from using a vector<vector<double>> to using a small wrapper around a vector to give 2D addressing into a single vector. Using this with a suitable column-wise iterator, you could avoid creating tmpmedian as basically a copy of a column of the original 2D matrix--instead, you'd pass a column-wise iterator to medianOfVector , and just iterate through a column of the original data in-place.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM