
Is std::vector faster than std::map for a key lookup?

I've been using std::vector mostly and was wondering if I should use std::map for a key lookup to improve performance.

And here's my full test code.

#include <algorithm> // std::find_if
#include <iostream>
#include <string>
#include <map>
#include <vector>
#include <ctime>
#include <chrono>

using namespace std;

vector<string> myStrings = {"aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg", "hhh", "iii", "jjj", "kkk", "lll", "mmm", "nnn", "ooo", "ppp", "qqq", "rrr", "sss", "ttt", "uuu", "vvv", "www", "xxx", "yyy", "zzz"};

struct MyData {

    string key;
    int value;
};

int findStringPosFromVec(const vector<MyData> &myVec, const string &str) {

    auto it = std::find_if(begin(myVec), end(myVec),
                           [&str](const MyData& data){return data.key == str;});
    if (it == end(myVec))
        return -1;
    return static_cast<int>(it - begin(myVec));
}

int main(int argc, const char * argv[]) {

    const int testInstance = 10000; //HOW MANY TIMES TO PERFORM THE TEST

    //----------------------------std::map-------------------------------
    clock_t map_cputime = std::clock(); //START MEASURING THE CPU TIME

    for (int i=0; i<testInstance; ++i) {

        map<string, int> myMap;

        //insert unique keys
        for (int i=0; i<myStrings.size(); ++i) {

            myMap[myStrings[i]] = i;
        }
        //iterate again, if key exists, replace value;
        for (int i=0; i<myStrings.size(); ++i) {

            if (myMap.find(myStrings[i]) != myMap.end())
                myMap[myStrings[i]] = i * 100;
        }
    }
    //FINISH MEASURING THE CPU TIME
    double map_cpu = (std::clock() - map_cputime) / (double)CLOCKS_PER_SEC;
    cout << "Map Finished in " << map_cpu << " seconds [CPU Clock] " << endl;


    //----------------------------std::vector-------------------------------
    clock_t vec_cputime = std::clock(); //START MEASURING THE CPU TIME

    for (int i=0; i<testInstance; ++i) {

        vector<MyData> myVec;

        //insert unique keys
        for (int i=0; i<myStrings.size(); ++i) {

            const int pos = findStringPosFromVec(myVec, myStrings[i]);

            if (pos == -1)
                myVec.push_back({myStrings[i], i});
        }
        //iterate again, if key exists, replace value;
        for (int i=0; i<myStrings.size(); ++i) {

            const int pos = findStringPosFromVec(myVec, myStrings[i]);

            if (pos != -1)
                myVec[pos].value = i * 100;
        }
    }
    //FINISH MEASURING THE CPU TIME
    double vec_cpu = (std::clock() - vec_cputime) / (double)CLOCKS_PER_SEC;
    cout << "Vector Finished in " << vec_cpu << " seconds [CPU Clock] " << endl;
    return 0;
}

And this is the result I got.

Map Finished in 0.38121 seconds [CPU Clock] 
Vector Finished in 0.346863 seconds [CPU Clock] 
Program ended with exit code: 0

I mostly store fewer than 30 elements in a container.

Does this mean it is better to use std::vector instead of std::map in my case?

EDIT: When I moved the declaration map<string, int> myMap; before the loop, std::map was faster than std::vector.

Map Finished in 0.278136 seconds [CPU Clock] 
Vector Finished in 0.328548 seconds [CPU Clock] 
Program ended with exit code: 0

So if this is the proper test, I guess std::map is faster.

But if I reduce the number of elements to 10, std::vector is faster, so I guess it really depends on the number of elements.

I would say that, in general, a vector can outperform a map for lookups, but only for a tiny amount of data, e.g. the fewer than 30 elements you mentioned.

The reason is that a linear search through a contiguous chunk of memory is the cheapest way to access memory, because it is cache-friendly. A std::map is a node-based container (typically a balanced tree): its elements are allocated separately and may be scattered across the heap, so reaching each one is a little more expensive. With a tiny number of elements, this can make a difference. In real life, with hundreds or thousands of elements, the algorithmic complexity of the lookup (O(log n) for the map versus O(n) for a linear search) will outweigh this advantage.
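To make the comparison concrete, here is a minimal sketch of the two lookups being discussed. The names Entry, findInVector and findInMap are made up for illustration (Entry mirrors MyData from the question); this is not the question's code, just one way to write both lookups behind the same interface.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical record type, mirroring MyData from the question.
struct Entry {
    std::string key;
    int value;
};

// Linear search: O(n) key comparisons, but it walks one contiguous array.
// Returns a pointer to the stored value, or nullptr if the key is absent.
const int* findInVector(const std::vector<Entry>& entries, const std::string& key) {
    auto it = std::find_if(entries.begin(), entries.end(),
                           [&key](const Entry& e) { return e.key == key; });
    return it == entries.end() ? nullptr : &it->value;
}

// Tree search: O(log n) key comparisons, but each step follows a pointer
// to a separately allocated node.
// Returns a pointer to the stored value, or nullptr if the key is absent.
const int* findInMap(const std::map<std::string, int>& entries, const std::string& key) {
    auto it = entries.find(key);
    return it == entries.end() ? nullptr : &it->second;
}
```

Which of the two wins for a given element count depends on the key type, the allocator and the hardware, so it has to be measured rather than assumed.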

BUT! You are benchmarking completely different things:

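  1. Both loops time population as well as lookup, and not in the same way: the map is filled with operator[], while the vector runs a linear search before every push_back.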
  2. Your code can perform TWO map lookups per key: first find to check for existence, then operator[] to locate the element again and modify it. These are relatively heavy operations. You can modify the element with a single find (figure this out yourself, check the references!).
  3. Within each test iteration you perform additional heavy operations, such as allocating memory for each map/vector. This means your test measures not only lookup performance but also allocation and construction.
  4. Benchmarking is a difficult problem, so don't write the harness yourself. For example, there are side effects such as cache warm-up that you have to deal with. Use something like Celero, hayai, or Google Benchmark.
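For point 2, one possible way to do the update with a single lookup is to keep the iterator that find returns and write through it. The helper name updateIfPresent is hypothetical; it is only a sketch of the idea, not code from the question.

```cpp
#include <map>
#include <string>

// Replace the value only if the key already exists, using one tree lookup.
// Returns true if the key was found and its value updated.
bool updateIfPresent(std::map<std::string, int>& m, const std::string& key, int newValue) {
    auto it = m.find(key);   // the only search
    if (it == m.end())
        return false;        // key absent: nothing is inserted
    it->second = newValue;   // write through the iterator, no second search
    return true;
}
```

In the question's second loop this would replace the find followed by myMap[myStrings[i]] = i * 100, which searches the tree twice for every key.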

Your vector has constant content, so the compiler may be able to optimize much of your code away anyway.
There is little use in measuring such small counts, and no use in measuring hard-coded values.
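If you still want a rough hand-rolled measurement, a sketch along these lines times only the lookups and makes their results observable, so the loop is harder to discard. TimedRun and timeMapLookups are hypothetical names, and this is an assumption-laden sketch rather than a rigorous benchmark: returning a checksum does not guarantee the compiler leaves the loop untouched, and the keys should ideally come from runtime input instead of a hard-coded list.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <vector>

// Result of one timed run: elapsed time plus a checksum that the caller
// should print or otherwise use, so the lookups have a visible effect.
struct TimedRun {
    std::chrono::nanoseconds elapsed;
    long long checksum;
};

// Times only the lookups; the map is built by the caller, outside the timed region.
TimedRun timeMapLookups(const std::map<std::string, int>& m,
                        const std::vector<std::string>& keys,
                        int repetitions) {
    long long checksum = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repetitions; ++r) {
        for (const std::string& key : keys) {
            auto it = m.find(key);
            if (it != m.end())
                checksum += it->second;   // consume the lookup result
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    return {std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start), checksum};
}
```

A matching function for the vector would follow the same pattern. Benchmark libraries offer stronger tools for this, for example benchmark::DoNotOptimize in Google Benchmark.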
