I know a classic programming interview question is "Given an array of N-1 integers which are the numbers 1 through N with one of them missing, find the missing number." I'm thinking that
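#include <algorithm>
#include <vector>

int missing_number ( int * arr, int n )
{
    // arr holds n-1 values taken from 1..n; n is the upper bound N
    std::vector<bool> booVec(n, false);
    int * offArrEnd = arr + n - 1;
    // a value v is recorded at index v-1, so indices stay within 0..n-1
    while (arr != offArrEnd) booVec[*arr++ - 1] = true;
    // std::find locates the first false; index + 1 is the missing value
    return std::find(booVec.begin(), booVec.end(), false)
         - booVec.begin() + 1;
}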
would be a good solution, since initializing a vector<bool> with every element set to false takes little time, and so does setting its elements through indexed writes into booVec. I know I could save one operation by changing it to
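int missing_number ( int * arr, int n )
{
    // arr holds n-1 values taken from 1..n; n is the upper bound N
    std::vector<bool> booVec(n, false);
    int * offArrEnd = arr + n - 1;
    while (arr != offArrEnd) booVec[*arr++ - 1] = true;
    std::vector<bool>::iterator offBooEnd = booVec.end();
    // the offset must be measured from begin(), not from the cached end iterator
    return std::find(booVec.begin(), offBooEnd, false)
         - booVec.begin() + 1;
}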
But I'm wondering whether a similar procedure using unordered_map might be faster overall. I presume it would take longer to initialize every member of an unordered_map, but it might be faster to modify its elements.
The technique you used above is the basis of pigeonhole sort, with the additional guarantee of no duplicates making it even more efficient.
Thus, the algorithm is O(n) (a tight bound).
A std::unordered_set has O(1) expected and O(n) worst-case complexity for each of the N-1 insertions, though, for a total of O(n) expected and O(n*n) worst case.
Even though the complexity in the expected (and best) case is the same, std::unordered_set is a far more complex container and thus loses the race in any case.
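For comparison, here is a minimal sketch of what a hash-based version might look like. The name missing_number_set is made up for illustration, and the sketch assumes that n is the upper bound N and that arr holds n-1 distinct values from 1..n. It is not a tuned implementation, only a way to see the extra work involved.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical comparison version (not from the question).
// Assumption: arr holds n - 1 distinct values taken from 1..n.
int missing_number_set(const int* arr, int n)
{
    assert(n >= 1);
    // Range constructor: n - 1 insertions, each of which hashes the value
    // and typically allocates a node.
    std::unordered_set<int> seen(arr, arr + (n - 1));
    // Probe every candidate until one is absent.
    for (int candidate = 1; candidate <= n; ++candidate)
        if (seen.count(candidate) == 0)
            return candidate;
    return -1; // not reached when the assumption above holds
}
```

Every insertion and every lookup here goes through hashing and bucket traversal, which a directly indexed array avoids entirely.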
std::vector<bool> does not actually store any bool objects; it is a specialization that packs bits and hands out proxy references to save space (widely regarded as a design failure)!
Thus, using a different instantiation of vector, with char or even int, will consume more memory, but might be faster because the generated code is simpler (no bit-twiddling).
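A minimal sketch of the byte-per-flag variant might look like the following. The name missing_number_bytes is hypothetical, and the sketch again assumes that n is the upper bound N and that arr holds n-1 distinct values from 1..n. Whether it actually beats vector<bool> depends on n and the cache behavior, so it is worth measuring rather than assuming.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical variant using one byte per flag instead of one bit.
// Assumption: arr holds n - 1 distinct values taken from 1..n.
int missing_number_bytes(const int* arr, int n)
{
    assert(n >= 1);
    std::vector<char> seen(n, 0);
    // A value v is recorded at index v - 1; plain byte stores, no proxy objects.
    for (const int* p = arr; p != arr + (n - 1); ++p)
        seen[*p - 1] = 1;
    // The first flag still at 0 marks the missing value.
    std::vector<char>::iterator gap = std::find(seen.begin(), seen.end(), 0);
    return static_cast<int>(gap - seen.begin()) + 1;
}
```

The trade-off is memory: this uses n bytes where vector<bool> typically needs about n/8.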
Anyway, the efficiency of both implementations is dwarfed by simply summing the elements and subtracting that sum from what it would be for an uninterrupted sequence, as Nikola Dimitroff comments.
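int missing_number ( int * arr, int n )
{
    // Sum of the full sequence 1..n; arr holds n-1 of those values.
    unsigned long long r = (unsigned long long)n * (n+1) / 2;
    // Subtract every element that is present; what remains is the missing one.
    for (int i = 0; i < n - 1; ++i)
        r -= arr[i];
    return (int)r;
}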
A vector should be able to beat unordered_map in this case, where n is bounded. The underlying data structure of unordered_map is essentially a vector that stores the hash table's "buckets": a hash is computed, and the hash modulo the bucket count chooses the index to start at in that vector. As a result, a plain vector is already a perfect hash table, and you already have a perfect hash: the number read from the array! Therefore, the extra mechanism provided by unordered_map is overhead you're not using.
(And that's assuming you don't happen to fall into the case where unordered_map has O(n) lookup complexity due to hash collisions.)
That said, vector<char> may beat vector<bool> due to the bit-packing behavior of vector<bool>.