Is std::vector faster than std::map for a key lookup?

I've mostly been using std::vector and was wondering whether I should use std::map for key lookups to improve performance.

Here's my full test code:

#include <algorithm> // std::find_if
#include <iostream>
#include <string>
#include <map>
#include <vector>
#include <ctime>
#include <chrono>

using namespace std;

vector<string> myStrings = {"aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg", "hhh", "iii", "jjj", "kkk", "lll", "mmm", "nnn", "ooo", "ppp", "qqq", "rrr", "sss", "ttt", "uuu", "vvv", "www", "xxx", "yyy", "zzz"};

struct MyData {

    string key;
    int value;
};

int findStringPosFromVec(const vector<MyData> &myVec, const string &str) {

    auto it = std::find_if(begin(myVec), end(myVec),
                           [&str](const MyData& data){return data.key == str;});
    if (it == end(myVec))
        return -1;
    return static_cast<int>(it - begin(myVec));
}

int main(int argc, const char * argv[]) {

    const int testInstance = 10000; //HOW MANY TIMES TO PERFORM THE TEST

    //----------------------------std::map-------------------------------
    clock_t map_cputime = std::clock(); //START MEASURING THE CPU TIME

    for (int i=0; i<testInstance; ++i) {

        map<string, int> myMap;

        //insert unique keys
        for (int i=0; i<myStrings.size(); ++i) {

            myMap[myStrings[i]] = i;
        }
        //iterate again, if key exists, replace value;
        for (int i=0; i<myStrings.size(); ++i) {

            if (myMap.find(myStrings[i]) != myMap.end())
                myMap[myStrings[i]] = i * 100;
        }
    }
    //FINISH MEASURING THE CPU TIME
    double map_cpu = (std::clock() - map_cputime) / (double)CLOCKS_PER_SEC;
    cout << "Map Finished in " << map_cpu << " seconds [CPU Clock] " << endl;


    //----------------------------std::vector-------------------------------
    clock_t vec_cputime = std::clock(); //START MEASURING THE CPU TIME

    for (int i=0; i<testInstance; ++i) {

        vector<MyData> myVec;

        //insert unique keys
        for (int i=0; i<myStrings.size(); ++i) {

            const int pos = findStringPosFromVec(myVec, myStrings[i]);

            if (pos == -1)
                myVec.push_back({myStrings[i], i});
        }
        //iterate again, if key exists, replace value;
        for (int i=0; i<myStrings.size(); ++i) {

            const int pos = findStringPosFromVec(myVec, myStrings[i]);

            if (pos != -1)
                myVec[pos].value = i * 100;
        }
    }
    //FINISH MEASURING THE CPU TIME
    double vec_cpu = (std::clock() - vec_cputime) / (double)CLOCKS_PER_SEC;
    cout << "Vector Finished in " << vec_cpu << " seconds [CPU Clock] " << endl;
    return 0;
}

This is the result I got:

Map Finished in 0.38121 seconds [CPU Clock] 
Vector Finished in 0.346863 seconds [CPU Clock] 
Program ended with exit code: 0

I usually store fewer than 30 elements in a container.

Does this mean it is better to use std::vector instead of std::map in my case?

EDIT: When I move map<string, int> myMap; before the loop, std::map is faster than std::vector:

Map Finished in 0.278136 seconds [CPU Clock] 
Vector Finished in 0.328548 seconds [CPU Clock] 
Program ended with exit code: 0

So if this is the proper test, I guess std::map is faster.

But if I reduce the number of elements to 10, std::vector is faster, so I guess it really depends on the number of elements.

I would say that, in general, a vector can perform better than a map for lookups, but only for a tiny amount of data, e.g. the fewer than 30 elements you mentioned.

The reason is that a linear search through a contiguous block of memory is the cheapest way to access memory. A map keeps its elements at scattered memory locations, so accessing them is a bit more expensive. With a tiny number of elements, this can matter. In real life, with hundreds or thousands of elements, the algorithmic complexity of the lookup (O(n) for a linear search versus O(log n) for std::map) will outweigh this gain.
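To make the contrast concrete, here is a minimal sketch of the two lookup styles being compared. The helper names findInVector and findInMap are hypothetical (not from the question), and, like the question's findStringPosFromVec, they assume -1 is never a legitimate stored value.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Linear scan over contiguous storage: O(n) key comparisons, but the elements
// sit next to each other in memory, which is cache-friendly for small n.
// Returns -1 if the key is absent (hypothetical sentinel, as in the question).
int findInVector(const std::vector<std::pair<std::string, int>>& v,
                 const std::string& key) {
    auto it = std::find_if(v.begin(), v.end(),
                           [&key](const std::pair<std::string, int>& p) { return p.first == key; });
    return it == v.end() ? -1 : it->second;
}

// Tree lookup (std::map is typically a red-black tree): O(log n) key
// comparisons, but each node is a separate allocation, so the search follows
// pointers to scattered memory locations.
int findInMap(const std::map<std::string, int>& m, const std::string& key) {
    auto it = m.find(key);
    return it == m.end() ? -1 : it->second;
}
```

For fewer than ~30 short string keys, the handful of extra comparisons in the linear scan can cost less than the pointer chasing in the tree; as n grows, the O(log n) bound wins.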

BUT! You are benchmarking completely different things:

  1. You are populating a map. In the case of the vector, you don't do this.
  2. Your code can perform TWO map lookups: first find to check existence, then operator[] to locate the element to modify. These are relatively heavy operations. You can modify an element with a single find (figure this out yourself, check the references!)
  3. Within each test iteration, you perform additional heavy operations, like memory allocation for each map/vector. This means your tests measure not only lookup performance but other things as well.
  4. Benchmarking is a difficult problem; don't hand-roll it. For example, there are side effects like cache warming that you have to deal with. Use something like Celero, hayai or Google Benchmark.
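Regarding point 2, one way to update an existing key with a single lookup is to keep the iterator returned by find (or std::find_if for the vector) and write through it, instead of calling find and then operator[]. This is a sketch: updateIfPresent is a hypothetical helper name, and MyData mirrors the struct from the question.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Same layout as the question's MyData.
struct MyData {
    std::string key;
    int value;
};

// Map version: one search via find(); the value is modified through the
// returned iterator, so no second lookup (as operator[] would do) is needed.
bool updateIfPresent(std::map<std::string, int>& m, const std::string& key, int newValue) {
    auto it = m.find(key);
    if (it == m.end()) return false;  // key absent: nothing is inserted
    it->second = newValue;
    return true;
}

// Vector version: same idea, reusing the iterator from std::find_if.
bool updateIfPresent(std::vector<MyData>& v, const std::string& key, int newValue) {
    auto it = std::find_if(v.begin(), v.end(),
                           [&key](const MyData& d) { return d.key == key; });
    if (it == v.end()) return false;
    it->value = newValue;
    return true;
}
```

Unlike operator[], this never inserts a default-constructed entry for a missing key, which also keeps the map and vector variants doing the same work.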

Your vector has constant content, so the compiler can optimize most of your code away anyway.
There is little use in measuring such small counts, and no use in measuring hard-coded values.
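A minimal sketch of a measurement that is less exposed to the optimizer, assuming you only want to time lookups: build the container once outside the timed region, accumulate the looked-up values into a result, and print that result afterwards so the work has an observable effect. sumLookups and timeLookupsSeconds are hypothetical names. The sketch uses std::chrono::steady_clock (wall time) instead of the question's std::clock (CPU time). It reduces, but does not eliminate, optimizer interference; ideally the keys come from runtime input (a file or argv) rather than a hard-coded list, and a library such as Google Benchmark provides benchmark::DoNotOptimize for exactly this purpose.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <vector>

// Sum the values found for each key, repeated `reps` times. The caller should
// use (e.g. print) the returned sum so the lookups cannot be discarded as dead code.
long long sumLookups(const std::map<std::string, int>& m,
                     const std::vector<std::string>& keys, int reps) {
    long long sum = 0;
    for (int r = 0; r < reps; ++r) {
        for (const std::string& k : keys) {
            auto it = m.find(k);
            if (it != m.end()) sum += it->second;
        }
    }
    return sum;
}

// Time only the lookups: the map is built by the caller, outside the measured
// region. The sum is written to `sink` for the caller to consume afterwards.
double timeLookupsSeconds(const std::map<std::string, int>& m,
                          const std::vector<std::string>& keys, int reps,
                          long long& sink) {
    const auto start = std::chrono::steady_clock::now();
    sink = sumLookups(m, keys, reps);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Writing an equivalent function for the vector and printing both the elapsed time and the sink value gives a like-for-like comparison of lookups alone.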
