Why does std::nth_element return sorted vectors for input vectors with N < 33 elements?

I am using std::nth_element to get a (roughly correct) value for a percentile of a vector, like so:

double percentile(std::vector<double> &vectorIn, double percent)
{
    std::nth_element(vectorIn.begin(), vectorIn.begin() + (percent*vectorIn.size())/100, vectorIn.end());

    return vectorIn[(percent*vectorIn.size())/100];
}  

I noticed that for vectorIn lengths of up to 32 elements, the vector gets completely sorted. Starting from 33 elements, it is never completely sorted (as expected).

I am not sure whether this matters, but the function is part of a (Matlab) MEX C++ file that is compiled via Matlab using the "Microsoft Windows SDK 7.1 (C++)".

EDIT:

Also see the following histogram of the lengths of the longest sorted blocks in 1e5 vectors passed to the function (each vector contained 1e4 random elements, and a random percentile was calculated). Note the peak at very small values.

[Figure: histogram of the lengths of the longest sorted blocks]
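
For reference, the "longest sorted block" statistic could be measured with something like the sketch below. The question does not show how the blocks were actually measured, so longestSortedRun is a hypothetical helper, and treating equal neighbours as "sorted" is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the question): length of the longest
// contiguous non-decreasing run in v. Equal neighbours count as sorted.
std::size_t longestSortedRun(const std::vector<double>& v)
{
    if (v.empty()) return 0;
    std::size_t best = 1;
    std::size_t current = 1;
    for (std::size_t i = 1; i < v.size(); ++i) {
        // Extend the current run, or start a new one at position i.
        current = (v[i - 1] <= v[i]) ? current + 1 : 1;
        if (current > best) best = current;
    }
    return best;
}
```

Calling it on the vector right after percentile() returns gives one sample for the histogram.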

This will vary from one standard library implementation to another (and may vary based on other factors), but in general terms:

  • std::nth_element is allowed to rearrange the input container as it sees fit, provided that afterwards position n holds the element that would be there if the container were sorted, and the container is partitioned around position n (nothing before it is greater, nothing after it is smaller).

  • For small containers, it is usually faster to do a full insertion sort than a quickselect, even though insertion sort does not scale to larger inputs.
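
The first point can be checked directly. The sketch below is only an illustration: isPartitionedAt and nthElementContractHolds are hypothetical names, not part of the standard library. Note that a fully sorted vector passes the check too, which is why a complete sort is a legal outcome.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: true if nothing before index n is greater than v[n]
// and nothing from n onwards is smaller than v[n].
bool isPartitionedAt(const std::vector<double>& v, std::size_t n)
{
    if (n >= v.size()) return false;
    const double pivot = v[n];
    const auto mid = v.begin() + static_cast<std::ptrdiff_t>(n);
    return std::all_of(v.begin(), mid, [pivot](double x) { return x <= pivot; })
        && std::all_of(mid, v.end(), [pivot](double x) { return x >= pivot; });
}

// Hypothetical helper: runs std::nth_element on a copy and checks that
// position n holds the same value a full sort would put there, and that
// the result is partitioned around n.
bool nthElementContractHolds(std::vector<double> v, std::size_t n)
{
    if (n >= v.size()) return false;
    std::vector<double> sorted = v;
    std::sort(sorted.begin(), sorted.end());
    std::nth_element(v.begin(), v.begin() + static_cast<std::ptrdiff_t>(n), v.end());
    return v[n] == sorted[n] && isPartitionedAt(v, n);
}
```

Whether the rest of the vector ends up sorted is not part of the contract, so it is deliberately not tested here.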

Since standard library authors will usually opt for the fastest solution, most nth_element implementations (and, for that matter, sort implementations) use customized algorithms for small inputs (or for small segments at the bottom of the recursion), which may sort the container more aggressively than seems necessary. For vectors of scalar values, insertion sort is extremely fast, since it takes maximum advantage of the cache. With streaming (SIMD) extensions, it is possible to speed it up even more by doing parallel compares.
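
To make the idea concrete, here is a minimal sketch of a selection routine that partitions while the range is large and falls back to insertion sort once it is small. It is not the code of any particular standard library; hybridNthElement, insertionSort and kSmallRange are hypothetical names, and the cutoff of 32 is an assumption chosen only to mirror the behaviour observed in the question.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Assumed cutoff, picked to match the 32-element observation in the question.
const std::ptrdiff_t kSmallRange = 32;

// Plain insertion sort on the random-access range [first, last).
template <typename It>
void insertionSort(It first, It last)
{
    typedef typename std::iterator_traits<It>::value_type T;
    if (first == last) return;
    for (It i = first + 1; i != last; ++i) {
        T value = *i;
        It j = i;
        while (j != first && value < *(j - 1)) {
            *j = *(j - 1);  // shift larger elements one slot to the right
            --j;
        }
        *j = value;
    }
}

// Illustrative quickselect with a small-range cutoff (a sketch only).
template <typename It>
void hybridNthElement(It first, It nth, It last)
{
    typedef typename std::iterator_traits<It>::value_type T;
    while (last - first > kSmallRange) {
        const T pivot = *(first + (last - first) / 2);  // copy: elements will move
        // Three-way split: [first, lt) < pivot, [lt, gt) == pivot, [gt, last) > pivot.
        It lt = std::partition(first, last, [&pivot](const T& x) { return x < pivot; });
        It gt = std::partition(lt, last, [&pivot](const T& x) { return !(pivot < x); });
        if (nth < lt)      last = lt;   // target lies among the smaller values
        else if (nth < gt) return;      // target equals the pivot and is in place
        else               first = gt;  // target lies among the larger values
    }
    // Small remainder: sorting it fully is cheap and puts nth in place.
    insertionSort(first, last);
}
```

With this structure, any input of at most kSmallRange elements skips the loop and is sorted completely, while larger inputs only get a small sorted block around position n.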

By the way, you can save a tiny amount of calculation by computing the threshold iterator only once, which might also be more readable:

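#include <algorithm>
#include <cstddef>
#include <vector>

double percentile(std::vector<double> &vectorIn, double percent)
{
    // Compute the target position once; the explicit cast avoids adding a double to an iterator.
    auto nth = vectorIn.begin() + static_cast<std::ptrdiff_t>((percent*vectorIn.size())/100);
    std::nth_element(vectorIn.begin(), nth, vectorIn.end());
    return *nth;  // assumes a non-empty vector and percent < 100, as in the question's code
}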
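
If the caller's vector must stay untouched, or percent may reach 100, a variant along the following lines could be used. This is a sketch under assumptions that are not in the original post: percentileClamped is a hypothetical name, it works on a copy, and it clamps the index to the last element.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical variant: takes the vector by value (so the caller's data is
// not reordered) and clamps the index so that percent == 100 stays in range.
double percentileClamped(std::vector<double> values, double percent)
{
    assert(!values.empty() && percent >= 0.0);
    std::size_t index = static_cast<std::size_t>((percent * values.size()) / 100);
    index = std::min(index, values.size() - 1);
    auto nth = values.begin() + static_cast<std::ptrdiff_t>(index);
    std::nth_element(values.begin(), nth, values.end());
    return *nth;
}
```

The copy costs O(N) time and memory, so the in-place version above remains the better choice when reordering the input is acceptable.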
