Median of multiple vectors of double (C++, vector<vector<double>>)
I have a data structure containing a vector of vectors, each of which holds about 16,000,000 double values.
I now want to median-combine these vectors: for each index i, I take the value at position i from every original vector, calculate the median of those values, and store it at position i in the resulting vector.
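Written out, with k input vectors v_0, ..., v_{k-1} of equal length, each output element is the median of the k values at that index. For even k, the usual convention (and the one medianOfVector below aims for) is the mean of the two middle values; here x_{(j)} denotes the j-th smallest value:

```latex
r_i = \operatorname{median}\bigl(v_0[i], v_1[i], \dots, v_{k-1}[i]\bigr),
\qquad
\operatorname{median}(x_1, \dots, x_k) =
\begin{cases}
x_{((k+1)/2)} & k \text{ odd},\\[2pt]
\tfrac{1}{2}\bigl(x_{(k/2)} + x_{(k/2+1)}\bigr) & k \text{ even}.
\end{cases}
```

For example, the vectors (1, 5, 9), (3, 2, 8) and (2, 7, 4) median-combine to (2, 5, 8).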
I already have the straightforward solution, but it is incredibly slow:
vector< vector<double> > vectors; //vectors contains the datavectors
vector<double> tmp;
vector<double> result;
vector<double> tmpmedian;
double pixels = 0.0;
double matrixcount = vectors.size();
tmp = vectors.at(0);
pixels = tmp.size();
for (int i = 0; i < pixels; i++) {
    for (int j = 0; j < matrixcount; j++) {
        tmp = vectors.at(j);
        tmpmedian.push_back(tmp.at(i));
    }
    result.push_back(medianOfVector(tmpmedian));
    tmpmedian.clear();
}
return result;
And medianOfVector looks like this:
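double medianOfVector(vector<double>& vec) {  // signature assumed; only the body was posted
    double result = 0;
    if ((vec.size() % 2) != 0) {
        vector<double>::iterator i = vec.begin();
        vector<double>::size_type m = (vec.size() / 2);
        nth_element(i, i + m, vec.end());
        result = vec.at(m);
    } else {
        vector<double>::iterator i = vec.begin();
        vector<double>::size_type m = (vec.size() / 2) - 1;
        nth_element(i, i + m, vec.end());
        // Caution: nth_element only fixes position m; vec.at(m + 1) is not
        // guaranteed to be the next-larger value.
        result = (vec.at(m) + vec.at(m + 1)) / 2;
    }
    return result;
}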
Is there an algorithm or a way to do this faster? Right now it takes nearly an eternity.
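A side note on the correctness of the medianOfVector above: std::nth_element only guarantees that the element at the chosen position is the one a full sort would put there, with nothing larger before it and nothing smaller after it. The element right after it is not guaranteed to be the next-smallest value, so the even-length branch can return a wrong median. A minimal sketch of a median that avoids this, assuming the caller is fine with the input being reordered (the function name median is hypothetical, not the question's medianOfVector):

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Median of vec; reorders vec in place (nth_element is a partial sort).
// For an even count, returns the mean of the two middle values.
double median(std::vector<double>& vec) {
    if (vec.empty()) {
        throw std::invalid_argument("median of empty vector");
    }
    const std::size_t n = vec.size();
    const auto mid = vec.begin() + static_cast<std::ptrdiff_t>(n / 2);
    // After this call, *mid is the (n/2)-th smallest value (0-based) and
    // every element in [begin, mid) is <= *mid.
    std::nth_element(vec.begin(), mid, vec.end());
    const double upper = *mid;
    if (n % 2 != 0) {
        return upper;
    }
    // Even count: the lower middle value is the largest element of [begin, mid).
    const double lower = *std::max_element(vec.begin(), mid);
    return (lower + upper) / 2.0;
}
```

The extra std::max_element pass only scans the lower half, so it stays linear, like nth_element itself.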
Edit: Thank you for your replies. In case anyone is interested, here is the fixed version. It now takes about 9 seconds to median-combine three vectors with ~16,000,000 elements each; mean-combining takes around 3 seconds:
vector< vector<double> > vectors; //vectors contains the datavectors
vector<double> *tmp;
vector<double> result;
vector<double> tmpmedian;
tmp = &vectors.at(0);
int size = tmp->size();
int vectorsize = vectors.size();
for (int i = 0; i < size; i++) {
    for (int j = 0; j < vectorsize; j++) {
        tmp = &vectors.at(j);
        tmpmedian.push_back(tmp->at(i));
    }
    result.push_back(medianOfVector(tmpmedian));
    tmpmedian.clear();
}
return result;
And medianOfVector:
double medianOfVector(vector<double>& vec) {  // signature assumed; only the body was posted
    double result = 0;
    if ((vec.size() % 2) != 0) {
        vector<double>::iterator i = vec.begin();
        vector<double>::size_type m = (vec.size() / 2);
        nth_element(i, i + m, vec.end());
        result = vec.at(m);
    } else {
        vector<double>::iterator i = vec.begin();
        vector<double>::size_type m = (vec.size() - 1) / 2;  // lower middle index
        nth_element(i, i + m, vec.end());
        double min = vec.at(m);
        // The upper middle value is the smallest element after position m.
        double max = *min_element(i + m + 1, vec.end());
        result = (min + max) / 2;
    }
    return result;
}
A couple of points, both stemming from the fact that you've defined tmp as a vector instead of (for example) a reference.
vector<double> tmp;
tmp = vectors.at(0);
pixels = tmp.size();
Here you're copying the entirety of vectors[0] into tmp just to extract the size. You'll almost certainly gain some speed by avoiding the copy:
pixels = vectors.at(0).size();
Instead of copying the entire vector just to get its size, this gets a reference to the first vector and asks that existing vector for its size.
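To see the difference concretely: assigning vectors.at(0) to a vector<double> allocates new storage and copies every element, while binding a reference just names the existing vector. A small sketch (the helper names and the Matrix alias are hypothetical) that makes this visible by comparing the underlying data() pointers:

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Copies: tmp gets its own buffer holding a duplicate of every element.
std::size_t sizeViaCopy(const Matrix& vectors) {
    std::vector<double> tmp = vectors.at(0);  // O(n) allocation + copy
    return tmp.size();
}

// No copy: ask the existing inner vector for its size directly.
std::size_t sizeDirect(const Matrix& vectors) {
    return vectors.at(0).size();  // O(1)
}

// True if a reference to vectors.at(0) shares the original storage while a
// copy does not. Assumes vectors[0] is non-empty (an empty vector may report
// nullptr from data() in both cases).
bool referenceAliasesCopyDoesNot(const Matrix& vectors) {
    const std::vector<double>& ref = vectors.at(0);
    std::vector<double> copy = vectors.at(0);
    return ref.data() == vectors[0].data() && copy.data() != vectors[0].data();
}
```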
for (int i = 0; i < pixels; i++) {
    for (int j = 0; j < matrixcount; j++) {
        tmp = vectors.at(j);
        tmpmedian.push_back(tmp.at(i));
    }
Here you're again copying the entirety of vectors.at(j) into tmp. Because this sits inside both loops, the code copies a whole ~16,000,000-element vector once for every single value it reads. But (again) you don't really need a new copy of all the data; you're just retrieving a single item from that copy. You can retrieve the data you need directly from the original vector without copying the whole thing:
tmpmedian.push_back(vectors.at(j).at(i));
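Putting both changes together, here is a sketch of the combine loop that never copies an inner vector, reuses one small buffer for every output element, and reserves the result up front. It assumes every inner vector is at least as long as vectors.at(0), and it uses a local helper, medianInPlace (hypothetical, same intent as medianOfVector), so it compiles on its own:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Median of a non-empty buf; reorders buf. Even count -> mean of the middle two.
static double medianInPlace(std::vector<double>& buf) {
    const std::size_t n = buf.size();
    auto mid = buf.begin() + static_cast<std::ptrdiff_t>(n / 2);
    std::nth_element(buf.begin(), mid, buf.end());
    if (n % 2 != 0) return *mid;
    return (*std::max_element(buf.begin(), mid) + *mid) / 2.0;
}

std::vector<double> medianCombine(const std::vector<std::vector<double>>& vectors) {
    std::vector<double> result;
    if (vectors.empty()) return result;

    const std::size_t pixels = vectors.at(0).size();  // no copy, just the size
    result.reserve(pixels);

    std::vector<double> column;  // reused for every pixel
    column.reserve(vectors.size());

    for (std::size_t i = 0; i < pixels; ++i) {
        column.clear();  // keeps capacity, so no reallocation per pixel
        for (std::size_t j = 0; j < vectors.size(); ++j) {
            column.push_back(vectors.at(j).at(i));  // read straight from the source
        }
        result.push_back(medianInPlace(column));
    }
    return result;
}
```

The buffer only ever holds vectors.size() values (three in the question's case), so per pixel the work is proportional to the number of input vectors rather than to their length.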
A possible next step would be to switch from using .at to operator[]:
tmpmedian.push_back(vectors[j][i]);
This is much more of a tradeoff, though: it's not likely to gain nearly as much, and it loses a bit of safety (range checking) in the process. To avoid losing safety, you could consider (for example) using range-based for loops instead of the counted for loops in your current code.
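One way to drop the per-access range checks without giving up safety entirely, as a sketch: validate once, before the hot loop, that every inner vector has the expected length; then use a range-based for over the inner vectors (which cannot run past the end of vectors) and operator[] for the pixel index, which the up-front check has already validated. The function name medianCombineChecked and the choice to throw std::invalid_argument are assumptions for illustration:

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

std::vector<double> medianCombineChecked(const std::vector<std::vector<double>>& vectors) {
    if (vectors.empty()) return {};
    const std::size_t pixels = vectors.front().size();

    // One pass over the k inner vectors replaces k * pixels checks done by .at().
    for (const auto& v : vectors) {
        if (v.size() != pixels) {
            throw std::invalid_argument("all vectors must have the same length");
        }
    }

    std::vector<double> result;
    result.reserve(pixels);
    std::vector<double> column;
    column.reserve(vectors.size());

    for (std::size_t i = 0; i < pixels; ++i) {
        column.clear();
        for (const auto& v : vectors) {  // range-based: no j index to get wrong
            column.push_back(v[i]);      // safe: i < pixels == v.size()
        }
        const std::size_t n = column.size();
        auto mid = column.begin() + static_cast<std::ptrdiff_t>(n / 2);
        std::nth_element(column.begin(), mid, column.end());
        double m = *mid;
        if (n % 2 == 0) {  // mean of the two middle values
            m = (*std::max_element(column.begin(), mid) + m) / 2.0;
        }
        result.push_back(m);
    }
    return result;
}
```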
Along rather different lines, you could instead change from using a vector<vector<double>> to using a small wrapper around a single vector that provides 2D addressing. Using this with a suitable column-wise iterator, you could avoid creating tmpmedian as basically a copy of a column of the original 2D matrix. Instead, you'd change medianOfVector to accept column-wise iterators, and just iterate through a column of the original data in place.
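A minimal sketch of that idea, under one simplifying assumption: the wrapper stores the data column-major (all values for a given pixel are adjacent), so a plain pointer already serves as the column-wise random-access iterator that std::nth_element needs, and no custom strided iterator is required. Filling it from an existing vector<vector<double>> is a one-time transposing copy; ideally the data would be loaded in this layout from the start. The names PixelStack, medianOfRange and medianCombine are hypothetical. Note that nth_element reorders whatever range it is given, so taking medians in place permutes the values within each column (which input each value came from is lost), while the set of values per pixel is unchanged:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// 2D addressing over one contiguous vector, stored column-major:
// element (row, col) lives at data_[col * rows_ + row].
// Here a "row" is one input vector and a "column" is one pixel.
class PixelStack {
public:
    PixelStack(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}

    double& operator()(std::size_t row, std::size_t col) {
        return data_[col * rows_ + row];
    }
    double operator()(std::size_t row, std::size_t col) const {
        return data_[col * rows_ + row];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // Column-wise iterators: each column is contiguous, so raw pointers work.
    double* columnBegin(std::size_t col) { return data_.data() + col * rows_; }
    double* columnEnd(std::size_t col) { return columnBegin(col) + rows_; }

private:
    std::size_t rows_, cols_;
    std::vector<double> data_;
};

// Median of the non-empty range [first, last); reorders the range in place.
double medianOfRange(double* first, double* last) {
    const std::ptrdiff_t n = last - first;
    double* mid = first + n / 2;
    std::nth_element(first, mid, last);
    if (n % 2 != 0) return *mid;
    return (*std::max_element(first, mid) + *mid) / 2.0;
}

// One median per column (pixel), computed without a temporary buffer.
std::vector<double> medianCombine(PixelStack& stack) {
    std::vector<double> result;
    result.reserve(stack.cols());
    for (std::size_t c = 0; c < stack.cols(); ++c) {
        result.push_back(medianOfRange(stack.columnBegin(c), stack.columnEnd(c)));
    }
    return result;
}
```

The layout choice is the design point here: with row-major storage (one input vector per row, as in the question) a column is strided, and you would need a custom random-access iterator that steps by the row length to get the same in-place behaviour.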