What is the right approach when using an STL container for median calculation?

Let's say I need to retrieve the median from a sequence of 1000000 random numeric values.

If I use anything but std::list, I have no (built-in) way to sort the sequence for the median calculation.

If I use std::list, I can't randomly access values to retrieve the middle (median) of the sorted sequence.

Is it better to implement sorting myself and go with e.g. std::vector, or is it better to use std::list and walk a std::list::iterator in a for loop to the median value? The latter seems to have less overhead, but also feels uglier.

Or are there more and better alternatives for me?

Any random-access container (like std::vector) can be sorted with the standard std::sort algorithm, available in the <algorithm> header.

For finding the median, it would be quicker to use std::nth_element; this does enough of a sort to put one chosen element in the correct position, but doesn't completely sort the container. So you could find the median like this:

// Precondition: v is not empty. Note that v is reordered.
int median(std::vector<int> &v)
{
    std::size_t n = v.size() / 2;
    std::nth_element(v.begin(), v.begin()+n, v.end());
    return v[n];
}
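
To make the "enough of a sort" guarantee concrete, here is a minimal sketch. The helper names upper_median and partitioned_around are made up for illustration and are not part of the answer above. The first helper works on a copy so the caller's data keeps its order; the second checks what std::nth_element promises about position n.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the element that would sit at index size/2
// if v were sorted. Takes v by value, so the caller's vector is not reordered.
// Precondition: v is not empty.
int upper_median(std::vector<int> v)
{
    std::size_t n = v.size() / 2;
    std::nth_element(v.begin(), v.begin() + n, v.end());
    return v[n];
}

// Hypothetical check of the guarantee nth_element gives for position n:
// nothing before n is greater than v[n], nothing after n is smaller.
bool partitioned_around(const std::vector<int>& v, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        if (v[n] < v[i]) return false;
    for (std::size_t i = n + 1; i < v.size(); ++i)
        if (v[i] < v[n]) return false;
    return true;
}
```

For an even number of elements this returns the upper of the two middle values, which is the same convention the function above uses.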

The median is more complex than Mike Seymour's answer. The median differs depending on whether there is an even or an odd number of items in the sample. If there is an even number of items, the median is the average of the middle two items. This means that the median of a list of integers can be a fraction. Finally, the median of an empty list is undefined. Here is code that passes my basic test cases:

#include <algorithm>
#include <cstddef>
#include <exception>

///Represents the exception for taking the median of an empty list
class median_of_empty_list_exception : public std::exception {
public:
  const char* what() const noexcept override {
    return "Attempt to take the median of an empty list of numbers.  "
      "The median of an empty list is undefined.";
  }
};

///Return the median of a sequence of numbers defined by the random
///access iterators begin and end.  The sequence must not be empty
///(median is undefined for an empty set).
///
///The numbers must be convertible to double.
template<class RandAccessIter>
double median(RandAccessIter begin, RandAccessIter end){
  //No dynamic exception specification: those were removed in C++17.
  //Throws median_of_empty_list_exception for an empty range.
  if(begin == end){ throw median_of_empty_list_exception(); }
  std::size_t size = end - begin;
  std::size_t middleIdx = size/2;
  RandAccessIter target = begin + middleIdx;
  std::nth_element(begin, target, end);

  if(size % 2 != 0){ //Odd number of elements
    return *target;
  }else{            //Even number of elements
    double a = *target;
    RandAccessIter targetNeighbor= target-1;
    std::nth_element(begin, targetNeighbor, end);
    return (a+*targetNeighbor)/2.0;
  }
}

This algorithm handles both even- and odd-sized inputs efficiently using the STL nth_element algorithm (O(N) on average) and the max_element algorithm (O(N)). Note that nth_element has another guaranteed side effect, namely that none of the elements before n is greater than v[n]; they are just not necessarily sorted.

//post-condition: After returning, the elements in v may be reordered and the resulting order is implementation defined.
double median(vector<double> &v)
{
  if(v.empty()) {
    return 0.0;
  }
  auto n = v.size() / 2;
  nth_element(v.begin(), v.begin()+n, v.end());
  auto med = v[n];
  if(!(v.size() & 1)) { //If the set size is even
    auto max_it = max_element(v.begin(), v.begin()+n);
    med = (*max_it + med) / 2.0;
  }
  return med;    
}
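
One thing to weigh in the function above is that an empty input returns 0.0, which a caller cannot tell apart from a genuine median of zero. As a hedged alternative, the sketch below returns NaN instead. The name median_or_nan is an assumption made for this example, and it takes a copy so the caller's vector is not reordered.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical variant: same selection logic, but an empty input yields NaN
// instead of 0.0. Takes a copy, so the caller's vector is left untouched.
double median_or_nan(std::vector<double> v)
{
    if (v.empty())
        return std::numeric_limits<double>::quiet_NaN();
    std::size_t n = v.size() / 2;
    std::nth_element(v.begin(), v.begin() + n, v.end());
    double med = v[n];
    if (v.size() % 2 == 0) {
        // Everything in [begin, begin+n) is <= v[n], so the largest of
        // those elements is the lower of the two middle values.
        double lower = *std::max_element(v.begin(), v.begin() + n);
        med = (lower + med) / 2.0;
    }
    return med;
}
```

A caller can then test the result with std::isnan from <cmath> before using it.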

Here's a more complete version of Mike Seymour's answer:

// Could use pass by copy to avoid changing vector
double median(std::vector<int> &v)
{
  size_t n = v.size() / 2;
  std::nth_element(v.begin(), v.begin()+n, v.end());
  int vn = v[n];
  if(v.size()%2 == 1)
  {
    return vn;
  }else
  {
    std::nth_element(v.begin(), v.begin()+n-1, v.end());
    return 0.5*(vn+v[n-1]);
  }
}

It handles odd- or even-length input.
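
In the even-length case, the sum of the two middle ints can overflow when both are close to the limits of int. The sketch below is one possible way around that and not a replacement for the answer: it converts to double before adding. The name median_no_overflow is hypothetical, and the function assumes a non-empty input.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical variant. Precondition: v is not empty.
// Takes a copy so the caller's vector keeps its order.
double median_no_overflow(std::vector<int> v)
{
    std::size_t n = v.size() / 2;
    std::nth_element(v.begin(), v.begin() + n, v.end());
    double upper = v[n];
    if (v.size() % 2 == 1)
        return upper;
    // After nth_element, the lower middle value is the largest element before n.
    double lower = *std::max_element(v.begin(), v.begin() + n);
    // Both operands are already doubles, so the addition cannot overflow int.
    return 0.5 * (lower + upper);
}
```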

Putting together all the insights from this thread, I ended up with this routine. It works with any STL container or any class providing input iterators, and it handles odd- and even-sized containers. It also does its work on a copy of the container, so the original content is not modified.

template <typename T = double, typename C>
inline const T median(const C &the_container)
{
    std::vector<T> tmp_array(std::begin(the_container), 
                             std::end(the_container));
    size_t n = tmp_array.size() / 2;
    std::nth_element(tmp_array.begin(), tmp_array.begin() + n, tmp_array.end());

    if(tmp_array.size() % 2){ return tmp_array[n]; }
    else
    {
        // even sized vector -> average the two middle values
        auto max_it = std::max_element(tmp_array.begin(), tmp_array.begin() + n);
        return (*max_it + tmp_array[n]) / 2.0;
    }
}
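
An empty container is worth guarding against here: with no elements the even-size branch would dereference the result of max_element on an empty range. The following sketch adds such a guard and shows that the copy-based approach also covers std::list. The name median_checked and the choice of std::domain_error are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <list>
#include <stdexcept>
#include <vector>

// Hypothetical guarded variant: copies any iterable container into a vector,
// rejects empty input, then selects the middle element(s) on the copy.
template <typename C>
double median_checked(const C& the_container)
{
    std::vector<double> tmp(std::begin(the_container), std::end(the_container));
    if (tmp.empty())
        throw std::domain_error("median of an empty container is undefined");
    std::size_t n = tmp.size() / 2;
    std::nth_element(tmp.begin(), tmp.begin() + n, tmp.end());
    if (tmp.size() % 2 == 1)
        return tmp[n];
    // even size: average the two middle values
    double lower = *std::max_element(tmp.begin(), tmp.begin() + n);
    return (lower + tmp[n]) / 2.0;
}
```

Called with a std::list<int> holding 7, 1, 5, 3, this returns 4.0 and leaves the list in its original order.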

You can sort a std::vector using the library function std::sort.

std::vector<int> vec;
// ... fill vector with stuff
std::sort(vec.begin(), vec.end());
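
Once the vector is sorted, the median can be read off the middle. The sketch below shows one way to do that. The name median_by_sort and the exception type are assumptions for this example, and the full sort costs O(N log N) rather than the average O(N) of a selection algorithm.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: sorts a copy and reads the middle element(s).
double median_by_sort(std::vector<int> v)
{
    if (v.empty())
        throw std::invalid_argument("median of an empty vector is undefined");
    std::sort(v.begin(), v.end());
    std::size_t n = v.size() / 2;
    if (v.size() % 2 == 1)
        return v[n];
    // even size: average the two middle values, halving first to stay in range
    return 0.5 * v[n - 1] + 0.5 * v[n];
}
```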

There exists a linear-time selection algorithm. The code below only works when the container has a random-access iterator, but it can be modified to work without one; you'll just have to be a bit more careful to avoid shortcuts like end - begin and iter + n.

#include <algorithm>
#include <cstdlib>
#include <functional>
#include <iostream>
#include <random>
#include <sstream>
#include <vector>

template<class A, class C = std::less<typename A::value_type> >
class LinearTimeSelect {
public:
    LinearTimeSelect(const A &things) : things(things) {}
    // returns the n-th smallest element (0-based); reorders the internal copy
    typename A::value_type nth(int n) {
        return nth(n, things.begin(), things.end());
    }
private:
    static typename A::value_type nth(int n,
            typename A::iterator begin, typename A::iterator end) {
        int size = end - begin;
        if (size <= 5) {
            std::sort(begin, end, C());
            return begin[n];
        }
        typename A::iterator walk(begin), skip(begin);
#ifdef RANDOM // randomized algorithm, average linear-time
        typename A::value_type pivot = begin[std::rand() % size];
#else // guaranteed linear-time, but usually slower in practice
        // median of medians: move the median of each group of 5 to the front
        while (end - skip >= 5) {
            std::sort(skip, skip + 5, C());
            std::iter_swap(walk++, skip + 2);
            skip += 5;
        }
        while (skip != end) std::iter_swap(walk++, skip++);
        typename A::value_type pivot = nth((walk - begin) / 2, begin, walk);
#endif
        // first pass: elements less than the pivot go to [begin, walk)
        for (walk = skip = begin, size = 0; skip != end; ++skip)
            if (C()(*skip, pivot)) std::iter_swap(walk++, skip), ++size;
        if (n < size) return nth(n, begin, walk);
        // second pass: elements equal to the pivot go to [walk, mid), so that
        // duplicate values cannot make the recursion repeat on the same range
        typename A::iterator mid(walk);
        int equal = 0;
        for (skip = walk; skip != end; ++skip)
            if (!C()(pivot, *skip)) std::iter_swap(mid++, skip), ++equal;
        if (n < size + equal) return pivot;
        return nth(n - size - equal, mid, end);
    }
    A things;
};

int main(int argc, char **argv) {
    std::vector<int> seq;
    {
        int i = 32;
        // parse only when an argument is given; a failed extraction resets i to 0
        if (argc > 1) { std::istringstream in(argv[1]); in >> i; }
        while (i-- > 0) seq.push_back(i);
    }
    if (seq.empty()) return 1; // nothing to select from
    std::random_device rd;
    std::mt19937 gen(rd());
    std::shuffle(seq.begin(), seq.end(), gen); // std::random_shuffle was removed in C++17
    std::cout << "unordered: ";
    for (std::vector<int>::iterator i = seq.begin(); i != seq.end(); ++i)
        std::cout << *i << " ";
    LinearTimeSelect<std::vector<int> > alg(seq);
    std::cout << std::endl << "linear-time medians: "
        << alg.nth((seq.size()-1) / 2) << ", " << alg.nth(seq.size() / 2);
    std::sort(seq.begin(), seq.end());
    std::cout << std::endl << "medians by sorting: "
        << seq[(seq.size()-1) / 2] << ", " << seq[seq.size() / 2] << std::endl;
    return 0;
}

Here is an answer that considers the suggestion by @MatthieuM., i.e. it does not modify the input vector. It uses a single partial sort (on a vector of indices) for ranges of both even and odd cardinality, while empty ranges are handled with exceptions thrown by a vector's at method:

double median(vector<int> const& v)
{
    bool isEven = !(v.size() % 2); 
    size_t n    = v.size() / 2;

    vector<size_t> vi(v.size()); 
    iota(vi.begin(), vi.end(), 0); 

    // clamp the middle iterator so that an empty vector gives a valid (empty)
    // range here; the at() calls below then throw std::out_of_range
    partial_sort(begin(vi), vi.begin() + min(n + 1, vi.size()), end(vi), 
        [&](size_t lhs, size_t rhs) { return v[lhs] < v[rhs]; }); 

    return isEven ? 0.5 * (v[vi.at(n-1)] + v[vi.at(n)]) : v[vi.at(n)];
}
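
The same index-based idea can be combined with selection instead of a partial sort. The sketch below swaps in std::nth_element plus std::max_element on the index vector, which is a different technique from the partial_sort used in the answer. The name median_via_indices and the explicit empty check are assumptions made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical variant: selects on a vector of indices, so v is never reordered.
double median_via_indices(const std::vector<int>& v)
{
    if (v.empty())
        throw std::out_of_range("median of an empty vector is undefined");
    std::vector<std::size_t> idx(v.size());
    std::iota(idx.begin(), idx.end(), std::size_t(0));
    // compare indices by the values they refer to
    auto by_value = [&v](std::size_t a, std::size_t b) { return v[a] < v[b]; };
    std::size_t n = v.size() / 2;
    std::nth_element(idx.begin(), idx.begin() + n, idx.end(), by_value);
    double upper = v[idx[n]];
    if (v.size() % 2 == 1)
        return upper;
    // the lower middle value is the largest one among the first n indices
    std::size_t lower_idx = *std::max_element(idx.begin(), idx.begin() + n, by_value);
    return 0.5 * (v[lower_idx] + upper);
}
```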


Armadillo has an implementation that looks like the one in the answer https://stackoverflow.com/a/34077478 by https://stackoverflow.com/users/2608582/matthew-fioravante

It uses one call to nth_element and one call to max_element, and it is here: https://gitlab.com/conradsnicta/armadillo-code/-/blob/9.900.x/include/armadillo_bits/op_median_meat.hpp#L380

//! find the median value of a std::vector (contents is modified)
template<typename eT>
inline 
eT
op_median::direct_median(std::vector<eT>& X)
  {
  arma_extra_debug_sigprint();
  
  const uword n_elem = uword(X.size());
  const uword half   = n_elem/2;
  
  typename std::vector<eT>::iterator first    = X.begin();
  typename std::vector<eT>::iterator nth      = first + half;
  typename std::vector<eT>::iterator pastlast = X.end();
  
  std::nth_element(first, nth, pastlast);
  
  if((n_elem % 2) == 0)  // even number of elements
    {
    typename std::vector<eT>::iterator start   = X.begin();
    typename std::vector<eT>::iterator pastend = start + half;
    
    const eT val1 = (*nth);
    const eT val2 = (*(std::max_element(start, pastend)));
    
    return op_mean::robust_mean(val1, val2);
    }
  else  // odd number of elements
    {
    return (*nth);
    }
  }
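
You can use this approach. It also takes care of a sliding window.
Here days is the number of trailing elements for which we want to find the median, and the window is copied before sorting, which makes sure the original container is not changed. Note that findMedian returns twice the median, so the result stays an integer.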


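#include <algorithm>
#include <iostream>
#include <vector>

using namespace std;

// Returns TWICE the median of the d elements preceding index i,
// i.e. arr[i-d] .. arr[i-1], so that the result stays an integer.
// The window is copied before sorting, so arr itself is never reordered.
int findMedian(const vector<int> &arr, int d, int i)
{
    vector<int> window(arr.begin() + (i - d), arr.begin() + i);

    sort(window.begin(), window.end());

    if (d % 2 == 0)
    {
        return window[d / 2] + window[d / 2 - 1];
    }
    else
    {
        return 2 * window[d / 2];
    }
}

int main()
{
    int n;
    int days;

    // reject unreadable input and window sizes that cannot work
    if (!(cin >> n >> days) || days <= 0 || days > n)
    {
        return 1;
    }

    vector<int> arr;

    for (int i = 0; i < n; ++i)
    {
        int input;
        cin >> input;
        arr.push_back(input);
    }

    // slide the window: one median per index i, over the previous `days` values
    for (int i = days; i < n; ++i)
    {
        int twiceMedian = findMedian(arr, days, i);
        cout << twiceMedian / 2.0 << "\n";
    }

    return 0;
}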
