
Strange performance behavior with vector vs unordered_set

I am evaluating frequencies using sample data from the file shown below.

I have noticed that with an unordered_set, the evaluation returns a result in less than a second. However, with a vector, it takes almost a whole minute!

There are several factors I considered:

  • Size of the data
  • The data itself

After several experiments, I found that if I remove the second-to-last value (-6), the performance is almost identical and both versions return results in less than a second!


However, if I include the -6, the vector evaluation takes far too long!


I tried changing the number to -5, -4, etc., and the performance was actually pretty good!


For some reason, only a -6 just before the last number in the file (+125503) seems to affect the vector performance... what's going on?

Note: of course, I also tried running each version on its own, commenting out first the unordered_set logic and then the vector logic; same behavior.


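#include <algorithm>
#include <fstream>
#include <iostream>
#include <string>
#include <unordered_set>
#include <vector>

using namespace std;

// Reads one signed integer per line, e.g. "-6" or "+125503" (stoi accepts a leading sign).
vector<int> scanFile(ifstream &file) {
    vector<int> scannedFile;
    string str;
    while (getline(file, str)) {
        if (!str.empty())
            scannedFile.push_back(stoi(str));
    }
    return scannedFile;
}

int main(int argc, char *argv[]) {
    // Assumption: the input path is passed as the first argument;
    // the original snippet does not show how inputFile is opened.
    if (argc < 2) {
        cerr << "usage: " << argv[0] << " <input file>" << endl;
        return 1;
    }
    ifstream inputFile(argv[1]);
    vector<int> fileInfo = scanFile(inputFile);
    if (fileInfo.empty()) return 1; // the search loops below would never end

    int Occurrences = 0;
    unordered_set<int> unordrdList; //results are immediate, even with -6!
    bool found = false;
    while (!found) {
        for (int n : fileInfo) {
          Occurrences += n;
          found = unordrdList.find(Occurrences) != unordrdList.end(); // average O(1) hash lookup
          if (found) {
            cout << "Using Unordered_Set: The 2nd showing #: " << Occurrences << endl;
            break;
          }
          unordrdList.insert(Occurrences); // remember this running total
        }
    }

    int Occurrnce = 0;
    vector<int> vectr; //result takes too long with -6 present in the file before 2nd to last line!
    bool found2 = false;
    while (!found2) {
        for (int n : fileInfo) {
          Occurrnce += n;
          found2 = find(vectr.begin(), vectr.end(), Occurrnce) != vectr.end(); // linear scan of every stored total
          if (found2) {
            cout << "Using Vector: The 2nd showing #: " << Occurrnce << endl;
            break;
          }
          vectr.push_back(Occurrnce); // remember this running total
        }
    }
    return 0;
}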

Text file:


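find in an unordered set is O(1) on average, whereas find over a vector is O(n). Searching a vector is going to take longer, and because the vector grows by one element on every step, the total cost of the vector loop grows quadratically with the number of steps needed to reach the first repeat.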
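A minimal C++ sketch of that cost model, assuming the question's loop structure (running totals, with the starting value 0 not counted as seen). The names vectorSearchCost and LinearScanCost are hypothetical, not from the thread. It replays the vector version and counts how many elements the linear search in std::find has to compare: a miss scans every stored total, so a repeat found after k steps costs roughly k²/2 comparisons, versus about k average-O(1) lookups for the hash set.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch (not from the original thread).
struct LinearScanCost {
    long long repeatedValue;       // first running total seen twice
    std::size_t steps;             // running totals produced, including the repeat
    std::size_t elementsExamined;  // elements the linear search compared, summed over all steps
};

// Replays the question's vector loop and counts the linear-search work.
// As in the question, the starting total 0 is not recorded as seen, and the
// loop never returns if no running total ever repeats.
LinearScanCost vectorSearchCost(const std::vector<int>& deltas) {
    assert(!deltas.empty() && "an empty input would loop forever");
    std::vector<long long> seen;
    long long total = 0;
    std::size_t steps = 0;
    std::size_t examined = 0;
    while (true) {
        for (int d : deltas) {
            total += d;
            ++steps;
            auto it = std::find(seen.begin(), seen.end(), total);
            if (it != seen.end()) {
                // A hit stops the scan at the matching position.
                examined += static_cast<std::size_t>(it - seen.begin()) + 1;
                return {total, steps, examined};
            }
            examined += seen.size();  // a miss compares every stored total
            seen.push_back(total);
        }
    }
}
```

For the input {+1000, -999} the first repeat appears after 2,000 steps and the sketch reports 1,997,002 elements examined; at 100,000 steps the same model gives on the order of 5 × 10^9. If the -6 makes the first repeat appear much later (an assumption, since the file contents are not shown), this quadratic growth alone would explain the vector taking close to a minute while the set stays under a second.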

Not entirely sure that this is the cause, but on some platforms the find method of List<> is implemented with a different algorithm for a small number of entries than for a large number of entries (usually for a speed or memory-usage benefit). So this may explain the jump in performance after a certain entry.

However, if find() is your main operation and you don't need duplicate entries, an unordered_set is simply the better choice because, as @Rama mentioned, its lookups are O(1) on average. The reason is that it is a hash table, which also drastically speeds up the uniqueness check on every insert (which would otherwise effectively be a find() call).
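To illustrate the point about uniqueness checks, here is a hedged sketch (the function name firstRepeatedTotal is hypothetical) that performs the question's search with a single hash-table operation per step. std::unordered_set::insert returns a std::pair<iterator, bool> whose second member is false when the value was already present, so no separate find() call is needed.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Same search as the question, but the membership test and the insertion
// are one hash-table operation: insert(...).second is false when the
// running total has been seen before.
// Like the question's loops, this never returns if no total ever repeats.
long long firstRepeatedTotal(const std::vector<int>& deltas) {
    assert(!deltas.empty() && "an empty input would loop forever");
    std::unordered_set<long long> seen;
    long long total = 0;
    while (true) {
        for (int d : deltas) {
            total += d;
            if (!seen.insert(total).second) return total;  // average O(1)
        }
    }
}
```

For example, with the values +3, +3, +4, -2, -4 it returns 10, the first running total reached twice. An ordered std::set would also work, but its lookups are O(log n) rather than average O(1).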

