Get the number of elements greater than a number

Question

I am trying to solve the following problem: numbers are being inserted into a container. Each time a number is inserted, I need to know how many elements in the container are greater than or equal to the number being inserted. I believe both operations can be done in logarithmic time.

My question: Is there a standard container in the C++ library that solves this problem? I know that std::multiset can insert elements in logarithmic time, but how can you query it? Or should I implement a data structure (e.g. a binary search tree) myself?

Answer 1

Great question. I do not think there is anything in the STL that would suit your needs (provided you MUST have logarithmic times). I think the best solution then, as aschepler says in the comments, is to implement a red-black tree. You may have a look at the STL source code, particularly stl_tree.h, to see whether you could reuse bits of it.

Better still, look at "Rank Tree in C++", which contains a link to an implementation:

http://code.google.com/p/options/downloads/list
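The rank-tree idea can be sketched as a binary search tree whose nodes also store the size of their subtree. The following is a minimal illustration under stated assumptions, not the linked implementation: it is deliberately unbalanced (so sorted input degrades to O(n) per operation), and the names RankTree, count_greater_equal and counts_before_insert are hypothetical. A red-black tree would keep the same size bookkeeping and add rotations to guarantee logarithmic height.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Illustrative only: an UNBALANCED binary search tree with subtree sizes.
struct RankTree {
    struct Node {
        int key;
        int size = 1;                       // number of nodes in this subtree
        std::unique_ptr<Node> left, right;
        explicit Node(int k) : key(k) {}
    };
    std::unique_ptr<Node> root;

    static int size_of(const std::unique_ptr<Node>& n) { return n ? n->size : 0; }

    // Insert x; duplicates are placed in the right subtree.
    void insert(int x) {
        std::unique_ptr<Node>* cur = &root;
        while (*cur) {
            (*cur)->size++;                 // x ends up somewhere below this node
            cur = (x < (*cur)->key) ? &(*cur)->left : &(*cur)->right;
        }
        *cur = std::make_unique<Node>(x);
    }

    // Number of stored keys that are >= x.
    int count_greater_equal(int x) const {
        int count = 0;
        const Node* cur = root.get();
        while (cur) {
            if (cur->key >= x) {
                // This node and its entire right subtree are >= x.
                count += 1 + size_of(cur->right);
                cur = cur->left.get();
            } else {
                cur = cur->right.get();     // everything to the left is < x
            }
        }
        return count;
    }
};

// For each value: count the elements already stored that are >= it, then insert it.
std::vector<int> counts_before_insert(const std::vector<int>& values) {
    RankTree tree;
    std::vector<int> result;
    for (int v : values) {
        int count = tree.count_greater_equal(v);
        assert(count <= RankTree::size_of(tree.root)); // cannot exceed what is stored
        result.push_back(count);
        tree.insert(v);
    }
    return result;
}
```

Both operations walk a single root-to-leaf path, which is why a balanced version would answer each query in O(log n).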

Answer 2

You should use a multiset for logarithmic insertion, yes. But computing the distance is the problem: set/map iterators are bidirectional, not random-access, so std::distance has O(n) complexity on them:

multiset<int> my_set;
...
auto it = my_set.lower_bound(3);                    // first element >= 3
size_t count_inserted = distance(it, my_set.end()); // this is definitely O(n)
my_set.insert(3);

Your complexity issue is complicated. Here is a full analysis:

If you want O(log(n)) complexity for each insertion, you need a sorted structure such as a set. If you want the structure not to reallocate or move items when adding a new item, computing the distance from the insertion point will be O(n). If you know the number of insertions in advance, you do not need logarithmic insertion time in a sorted container: you can insert all the items and then sort, which is O(n.log(n)), the same as n insertions at O(log(n)) each in a set. The only alternative is to use a dedicated container such as a weighted RB-tree. Depending on your problem this may be the solution, or it may be real overkill.

Use multiset and distance: you are O(n.log(n)) on insertion (yes, n insertions at O(log(n)) each) and O(n.n) on distance computation overall, but computing distances is very fast in practice.
If you know the inserted data size (n) in advance: use a vector, fill it, sort it, return your distances. You are O(n.log(n)), and it is easy to code.
If you do not know n in advance, your n is likely huge, and each item is memory-heavy so that you cannot afford O(n.log(n)) reallocation, then you presumably have time to write or reuse some non-standard code. If you really have to meet these complexity expectations, use a dedicated container. Also consider using a database, as you will probably have issues maintaining this in memory.
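To make the trade-off in the analysis above concrete, here is a variation on the vector idea, not the exact fill-then-sort procedure: the vector is kept sorted as values arrive. The search is O(log n) and the distance is O(1) because vector iterators are random-access, but each insertion may move elements and is therefore O(n). The function name count_ge_sorted_vector is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// For each value: count the elements already stored that are >= it, then
// insert it at its sorted position.
std::vector<std::size_t> count_ge_sorted_vector(const std::vector<int>& values) {
    std::vector<int> sorted;
    sorted.reserve(values.size());   // n is known up front, so no reallocation
    std::vector<std::size_t> result;
    result.reserve(values.size());
    for (int v : values) {
        // First position whose element is >= v (binary search, O(log n)).
        auto it = std::lower_bound(sorted.begin(), sorted.end(), v);
        // Random-access iterators make this subtraction O(1).
        result.push_back(static_cast<std::size_t>(sorted.end() - it));
        sorted.insert(it, v);        // O(n): shifts the tail of the vector
        // Debug-only sanity check; compiled out when NDEBUG is defined.
        assert(std::is_sorted(sorted.begin(), sorted.end()));
    }
    return result;
}
```

This is the mirror image of the multiset approach: there the insertion is cheap and the distance is linear, here the distance is cheap and the insertion is linear.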

Answer 3

Here's a quick way using Policy-Based Data Structures in C++:

There is something called an Ordered Set, which lets you insert/remove elements in O(log N) time (along with pretty much everything else std::set has to offer). It also gives two more features: find the Kth element, and find the rank of element X. The problem is that it doesn't allow duplicates :(

No worries though! We will map duplicates with a separate index/priority, and define a new structure (call it Ordered Multiset)! I've attached my implementation below for reference.

Finally, every time you want to find the number of elements greater than, say, x, call the function upper_bound (which returns the number of elements less than or equal to x) and subtract this number from the size of your Ordered Multiset!

Note: PBDS uses a lot of memory, so that is a constraint. If memory is tight, I'd suggest using a Binary Search Tree or a Fenwick Tree instead.

#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
using namespace std;
using namespace __gnu_pbds;

struct ordered_multiset { // multiset supporting duplicate values
    int len = 0;
    const long long ADD = 1000010;       // key stride: upper limit on copies of one value
    const long long MAXVAL = 1000000010; // offset that makes shifted values non-negative
    unordered_map<long long, int> mp;    // shifted value -> number of copies stored
    // Keys are 64-bit because (x + MAXVAL) * ADD + c does not fit in an int.
    tree<long long, null_type, less<long long>, rb_tree_tag, tree_order_statistics_node_update> T;

    ordered_multiset() { len = 0; T.clear(), mp.clear(); }

    inline void insert(int x){
        long long v = x + MAXVAL;
        len++;
        int c = mp[v]++;                 // c = index of this copy of x
        T.insert((v * ADD) + c); }

    inline void erase(int x){            // removes one copy of x, if present
        long long v = x + MAXVAL;
        int c = mp[v];
        if(c) {
            c--, mp[v]--, len--;
            T.erase((v * ADD) + c); } }

    inline int kth(int k){               // 1-based index, returns the
        if(k<1 || k>len) return -1;      // K'th element in the tree,
        auto it = T.find_by_order(--k);  // -1 if none exists
        return static_cast<int>(((*it)/ADD) - MAXVAL); }

    inline int lower_bound(int x){       // Count of values <x in tree
        long long v = x + MAXVAL;
        return (T.order_of_key(v * ADD)); } // smallest key any copy of x can have

    inline int upper_bound(int x){       // Count of values <=x in tree
        long long v = x + MAXVAL;
        int c = mp[v];
        return (T.order_of_key((v * ADD) + c)); } // one past the last copy of x

    inline int size() { return len; }    // Number of elements in tree
};

Usage:

    ordered_multiset s;
    for(int i=0; i<n; i++) {
        int x; cin>>x;
        s.insert(x);
        int ctr = s.size() - s.upper_bound(x);
        cout<<ctr<<" ";
    }

Input (n = 5): 10 1 3 3 2
Output: 0 1 1 1 3

Time Complexity: O(log n) per query/insert

References: mochow13's GitHub
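A different way to allow duplicates in the same GCC-only policy-based tree is to store (value, insertion index) pairs, which makes every key unique without encoding anything arithmetically. This is a sketch of that alternative, not the implementation referenced above; the names IndexedSet and count_ge_pbds are hypothetical. It counts the elements greater than or equal to each value before inserting it, which is what the question asks for.

```cpp
#include <cassert>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <functional>
#include <utility>
#include <vector>

// Order-statistics tree keyed by (value, insertion index).
using IndexedSet = __gnu_pbds::tree<
    std::pair<int, int>, __gnu_pbds::null_type, std::less<std::pair<int, int>>,
    __gnu_pbds::rb_tree_tag, __gnu_pbds::tree_order_statistics_node_update>;

// For each value: count the elements already stored that are >= it, then insert it.
std::vector<int> count_ge_pbds(const std::vector<int>& values) {
    IndexedSet tree;
    std::vector<int> result;
    for (int i = 0; i < static_cast<int>(values.size()); ++i) {
        int v = values[i];
        // order_of_key returns how many keys are strictly less than its
        // argument; (v, -1) sorts before every stored (v, index >= 0),
        // so this counts exactly the stored values < v.
        int smaller = static_cast<int>(tree.order_of_key({v, -1}));
        result.push_back(static_cast<int>(tree.size()) - smaller);
        bool inserted = tree.insert({v, i}).second;
        assert(inserted);            // keys are unique thanks to the index
        (void)inserted;              // silence unused warning under NDEBUG
    }
    return result;
}
```

For the input 10 1 3 3 2 this yields 0 1 1 2 3. It differs from the output shown above in the fourth position because equal elements are counted here and the query runs before the insertion.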

Answer 4

Sounds like a case for count_if, although I admit this doesn't solve it in logarithmic complexity; that would require a sorted type.

vector<int> v = { 1, 2, 3, 4, 5 };
int some_value = 3;

int count = count_if(v.begin(), v.end(), [some_value](int n) { return n > some_value; } );

Edited to fix syntax problems with the lambda function.

Answer 5

If the whole range of numbers is sufficiently small (on the order of a few million), this problem can be solved relatively easily using a Fenwick tree.

Although Fenwick trees are not part of the STL, they are both very easy to implement and time-efficient. The time complexity is O(log N) for both updates and queries, and the constant factors are low.

You mention in a comment on another question that you needed this for a contest. Fenwick trees are very popular tools in competitive programming and are often useful.
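A minimal sketch of the Fenwick tree approach, assuming all values are integers in the range [0, max_value]; negative or very large values would first need an offset or coordinate compression. The names FenwickCounter and count_ge_fenwick are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Fenwick (binary indexed) tree that counts how many times each value in
// [0, max_value] has been inserted. Value v is stored at position v + 1.
class FenwickCounter {
public:
    explicit FenwickCounter(int max_value) : tree_(max_value + 2, 0), total_(0) {}

    // Record one occurrence of value. O(log N).
    void insert(int value) {
        assert(value >= 0 && value + 1 < static_cast<int>(tree_.size()));
        for (int i = value + 1; i < static_cast<int>(tree_.size()); i += i & -i)
            ++tree_[i];
        ++total_;
    }

    // Number of inserted values that are >= value. O(log N).
    int count_greater_equal(int value) const {
        assert(value >= 0 && value < static_cast<int>(tree_.size()));
        return total_ - count_less(value);
    }

private:
    // Number of inserted values strictly less than value, i.e. the prefix
    // sum over positions 1..value.
    int count_less(int value) const {
        int sum = 0;
        for (int i = value; i > 0; i -= i & -i)
            sum += tree_[i];
        return sum;
    }

    std::vector<int> tree_;  // 1-based partial sums
    int total_;              // number of values inserted so far
};

// For each value: count the elements already stored that are >= it, then insert it.
std::vector<int> count_ge_fenwick(const std::vector<int>& values, int max_value) {
    FenwickCounter counter(max_value);
    std::vector<int> result;
    for (int v : values) {
        result.push_back(counter.count_greater_equal(v));
        counter.insert(v);
    }
    return result;
}
```

Memory is proportional to the value range rather than to the number of elements, which is why the approach fits best when the range is at most a few million.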

5 answers

Answer 1: score 4, accepted, 2013-07-02 15:39:49
Answer 2: score 1, 2013-07-02 16:09:30
Answer 3: score 1, 2020-11-18 02:21:57
Answer 4: score 0, 2013-07-02 15:01:01
Answer 5: score 0, 2016-04-15 21:40:35
