
Which is more efficient in this scenario: std::vector<bool> or std::unordered_map<int>?

I know a classic programming interview question is "Given an array of N-1 integers which are the numbers 1 through N with one of them missing, find the missing number." I'm thinking that

#include <algorithm>
#include <vector>

int missing_number ( int * arr, int n )   // arr holds the n-1 present numbers from 1..n
{
    std::vector<bool> booVec(n, false);
    int * offArrEnd = arr + (n - 1);
    while (arr != offArrEnd) booVec[*arr++ - 1] = true;   // mark value v at slot v-1
    return std::find(booVec.begin(), booVec.end(), false)
        - booVec.begin() + 1;
}

would be a good solution, since initializing a vector<bool> with every element false should take little time, and so should setting its elements via booVec[*arr++ - 1]. I know I could save an operation by hoisting booVec.end() into a local:

int missing_number ( int * arr, int n )
{
    std::vector<bool> booVec(n, false);
    int * offArrEnd = arr + (n - 1);
    while (arr != offArrEnd) booVec[*arr++ - 1] = true;
    std::vector<bool>::iterator offBooEnd = booVec.end();
    return std::find(booVec.begin(), offBooEnd, false)
        - booVec.begin() + 1;
}

But I'm wondering whether a similar procedure with unordered_map might be faster overall? I presume it would take longer to construct an unordered_map with an entry for every number, but modifying its elements might be quicker.

The technique you used above is the basis of Pigeonhole Sort, and the additional guarantee of no duplicates makes it even more efficient.
Each of the n-1 insertions and each step of the final scan is O(1), so the algorithm is O(n) (a tight bound).

A std::unordered_set has O(1) expected and O(n) worst-case complexity for each of the N-1 insertions, for a total of O(n) expected and O(n²) worst case.
Even though the complexity in the expected (and best) case is equal, std::unordered_set is a far more complex container and thus loses the race in any case.
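
For comparison, here is a minimal sketch of the hash-based variant under discussion (the function name missing_number_hash is mine, and it assumes the same convention that arr holds the n-1 present numbers from 1..n):

#include <unordered_set>

int missing_number_hash ( int * arr, int n )
{
    std::unordered_set<int> seen(arr, arr + (n - 1));   // n-1 insertions, O(1) expected each
    for (int v = 1; v <= n; ++v)
        if (seen.count(v) == 0)                         // O(1) expected lookup
            return v;
    return -1;   // unreachable for valid input
}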

std::vector<bool> does not contain any bool, but is a specialization using proxy objects to save space (widely regarded as a design failure)!
Thus, using a different instantiation of vector, with char or even int, will consume more memory, but might be more efficient due to simpler code (no bit-twiddling).
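
As a sketch of that trade-off (the helper name missing_number_char is mine), only the buffer type changes:

#include <algorithm>
#include <vector>

int missing_number_char ( int * arr, int n )
{
    std::vector<char> seen(n, 0);   // one byte per flag, no bit-twiddling
    for (int i = 0; i < n - 1; ++i)
        seen[arr[i] - 1] = 1;
    return std::find(seen.begin(), seen.end(), 0) - seen.begin() + 1;
}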

Anyway, the efficiency of both implementations is dwarfed by simply summing the elements and subtracting that sum from what it would be for an uninterrupted sequence, as Nikola Dimitroff comments:

int missing_number ( int * arr, int n )   // arr holds the n-1 present numbers from 1..n
{
    unsigned long long r = (unsigned long long)n * (n + 1) / 2;   // sum of 1..n
    for (int i = 0; i < n - 1; ++i)
        r -= arr[i];
    return (int)r;
}

vector in this case, where n is bounded, should be able to beat unordered_map. The underlying data structure for unordered_map is essentially a vector of buckets: a hash of the key is taken, and the modulus of that hash by the bucket count chooses the index to start at. As a result, a plain vector is already a perfect hash table, and you have a perfect hash: the values from the array themselves. Therefore, the extra mechanism provided by unordered_map is overhead you're not using.
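
To make the bucket mechanism concrete, here is a small illustrative snippet (the exact bucket count and placement are implementation-defined):

#include <iostream>
#include <unordered_map>

int main ()
{
    std::unordered_map<int, bool> seen;
    for (int v : {1, 2, 4, 5}) seen[v] = true;

    // Each key is hashed, then reduced modulo the bucket count to pick a slot.
    // A plain vector indexed by (value - 1) skips both steps entirely.
    std::cout << "bucket count: " << seen.bucket_count() << '\n';
    std::cout << "key 4 lives in bucket " << seen.bucket(4) << '\n';
}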

(And that's assuming you don't happen to fall into the case where unordered_map can have O(n) lookup complexity due to hash collisions)

That said, vector<char> may beat vector<bool> due to the bitfield behavior of vector<bool> .
