In an array of numbers, every number appears an even number of times, except for one number which appears an odd number of times. We need to find that number (the question was previously discussed on Stack Overflow).
Here is a solution that attacks the question with three different methods: two are O(N) (hash set and hash map), while one is O(NlogN) (sorting). However, profiling with arbitrarily large input shows that sorting is faster, and its relative advantage keeps growing as the input size increases.
What is wrong with the implementation or the complexity analysis? Why is the O(NlogN) method faster?
#include <algorithm>
#include <chrono>
#include <cmath>
#include <iostream>
#include <functional>
#include <string>
#include <vector>
#include <unordered_set>
#include <unordered_map>
using std::cout;
using std::chrono::high_resolution_clock;
using std::chrono::milliseconds;
using std::endl;
using std::string;
using std::vector;
using std::unordered_map;
using std::unordered_set;
class ScopedTimer {
public:
ScopedTimer(const string& name)
: name_(name), start_time_(high_resolution_clock::now()) {}
~ScopedTimer() {
cout << name_ << " took "
<< std::chrono::duration_cast<milliseconds>(
high_resolution_clock::now() - start_time_).count()
<< " milliseconds" << endl;
}
private:
const string name_;
const high_resolution_clock::time_point start_time_;
};
int find_using_hash(const vector<int>& input_data) {
unordered_set<int> numbers(input_data.size());
for(const auto& value : input_data) {
auto res = numbers.insert(value);
if(!res.second) {
numbers.erase(res.first);
}
}
return numbers.size() == 1 ? *numbers.begin() : -1;
}
int find_using_hashmap(const vector<int>& input_data) {
unordered_map<int,int> counter_map;
for(const auto& value : input_data) {
++counter_map[value];
}
for(const auto& map_entry : counter_map) {
if(map_entry.second % 2 == 1) {
return map_entry.first;
}
}
return -1;
}
int find_using_sort_and_count(const vector<int>& input_data) {
vector<int> local_copy(input_data);
std::sort(local_copy.begin(), local_copy.end());
int prev_value = local_copy.front();
int counter = 0;
for(const auto& value : local_copy) {
if(prev_value == value) {
++counter;
continue;
}
if(counter % 2 == 1) {
return prev_value;
}
prev_value = value;
counter = 1;
}
return counter % 2 == 1 ? prev_value : -1;
}
void execute_and_time(const string& method_name, std::function<int()> method) {
ScopedTimer timer(method_name);
cout << method_name << " returns " << method() << endl;
}
int main()
{
vector<int> input_size_vec({1<<18,1<<20,1<<22,1<<24,1<<28});
for(const auto& input_size : input_size_vec) {
// Prepare input data
std::vector<int> input_data;
const int magic_number = 123454321;
for(int i=0;i<input_size;++i) {
input_data.push_back(i);
input_data.push_back(i);
}
input_data.push_back(magic_number);
std::random_shuffle(input_data.begin(), input_data.end());
cout << "For input_size " << input_size << ":" << endl;
execute_and_time("hash-set:",std::bind(find_using_hash, input_data));
execute_and_time("sort-and-count:",std::bind(find_using_sort_and_count, input_data));
execute_and_time("hash-map:",std::bind(find_using_hashmap, input_data));
cout << "--------------------------" << endl;
}
return 0;
}
Profiling results:
sh$ g++ -O3 -std=c++11 -o main *.cc
sh$ ./main
For input_size 262144:
hash-set: returns 123454321
hash-set: took 107 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 37 milliseconds
hash-map: returns 123454321
hash-map: took 109 milliseconds
--------------------------
For input_size 1048576:
hash-set: returns 123454321
hash-set: took 641 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 173 milliseconds
hash-map: returns 123454321
hash-map: took 731 milliseconds
--------------------------
For input_size 4194304:
hash-set: returns 123454321
hash-set: took 3250 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 745 milliseconds
hash-map: returns 123454321
hash-map: took 3631 milliseconds
--------------------------
For input_size 16777216:
hash-set: returns 123454321
hash-set: took 14528 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 3238 milliseconds
hash-map: returns 123454321
hash-map: took 16483 milliseconds
--------------------------
For input_size 268435456:
hash-set: returns 123454321
hash-set: took 350305 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 60396 milliseconds
hash-map: returns 123454321
hash-map: took 427841 milliseconds
--------------------------
Addition
The fast XOR solution suggested by @Matt is of course out of contention: since x ^ x == 0 and XOR is associative and commutative, all the paired values cancel out, leaving the odd one. It runs in under 1 second for the worst case in the example:
int find_using_xor(const vector<int>& input_data) {
int output = 0;
for(const int& value : input_data) {
output = output^value;
}
return output;
}
For input_size 268435456:
xor: returns 123454321
xor: took 264 milliseconds
but the question still stands: why is hashing so inefficient compared to sorting in practice, despite its theoretical complexity advantage?
It really depends on the hash_map/hash_set implementation. Replacing libstdc++'s unordered_{map,set} with Google's dense_hash_{map,set} makes it significantly faster than the sort. The drawback of dense_hash_xxx is that they require two key values that will never be used: an "empty" key and a "deleted" key. See their documentation for details.
Another thing to remember is that hash_{map,set} usually do a lot of dynamic memory allocation/deallocation, so it pays to use a better alternative to libc's default malloc/free, e.g. Google's tcmalloc or Facebook's jemalloc.
hidden $ g++ -O3 -std=c++11 xx.cpp /usr/lib/libtcmalloc_minimal.so.4
hidden $ ./a.out
For input_size 262144:
unordered-set: returns 123454321
unordered-set: took 35 milliseconds
dense-hash-set: returns 123454321
dense-hash-set: took 18 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 34 milliseconds
unordered-map: returns 123454321
unordered-map: took 36 milliseconds
dense-hash-map: returns 123454321
dense-hash-map: took 13 milliseconds
--------------------------
For input_size 1048576:
unordered-set: returns 123454321
unordered-set: took 251 milliseconds
dense-hash-set: returns 123454321
dense-hash-set: took 77 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 153 milliseconds
unordered-map: returns 123454321
unordered-map: took 220 milliseconds
dense-hash-map: returns 123454321
dense-hash-map: took 60 milliseconds
--------------------------
For input_size 4194304:
unordered-set: returns 123454321
unordered-set: took 1453 milliseconds
dense-hash-set: returns 123454321
dense-hash-set: took 357 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 596 milliseconds
unordered-map: returns 123454321
unordered-map: took 1461 milliseconds
dense-hash-map: returns 123454321
dense-hash-map: took 296 milliseconds
--------------------------
For input_size 16777216:
unordered-set: returns 123454321
unordered-set: took 6664 milliseconds
dense-hash-set: returns 123454321
dense-hash-set: took 1751 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 2513 milliseconds
unordered-map: returns 123454321
unordered-map: took 7299 milliseconds
dense-hash-map: returns 123454321
dense-hash-map: took 1364 milliseconds
--------------------------
tcmalloc: large alloc 1073741824 bytes == 0x5f392000 @
tcmalloc: large alloc 2147483648 bytes == 0x9f392000 @
tcmalloc: large alloc 4294967296 bytes == 0x11f392000 @
For input_size 268435456:
tcmalloc: large alloc 4586348544 bytes == 0x21fb92000 @
unordered-set: returns 123454321
unordered-set: took 136271 milliseconds
tcmalloc: large alloc 8589934592 bytes == 0x331974000 @
tcmalloc: large alloc 2147483648 bytes == 0x21fb92000 @
dense-hash-set: returns 123454321
dense-hash-set: took 34641 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 47606 milliseconds
tcmalloc: large alloc 2443452416 bytes == 0x21fb92000 @
unordered-map: returns 123454321
unordered-map: took 176066 milliseconds
tcmalloc: large alloc 4294967296 bytes == 0x331974000 @
dense-hash-map: returns 123454321
dense-hash-map: took 26460 milliseconds
--------------------------
Code:
#include <algorithm>
#include <chrono>
#include <cmath>
#include <iostream>
#include <functional>
#include <string>
#include <vector>
#include <unordered_set>
#include <unordered_map>
#include <google/dense_hash_map>
#include <google/dense_hash_set>
using std::cout;
using std::chrono::high_resolution_clock;
using std::chrono::milliseconds;
using std::endl;
using std::string;
using std::vector;
using std::unordered_map;
using std::unordered_set;
using google::dense_hash_map;
using google::dense_hash_set;
class ScopedTimer {
public:
ScopedTimer(const string& name)
: name_(name), start_time_(high_resolution_clock::now()) {}
~ScopedTimer() {
cout << name_ << " took "
<< std::chrono::duration_cast<milliseconds>(
high_resolution_clock::now() - start_time_).count()
<< " milliseconds" << endl;
}
private:
const string name_;
const high_resolution_clock::time_point start_time_;
};
int find_using_unordered_set(const vector<int>& input_data) {
unordered_set<int> numbers(input_data.size());
for(const auto& value : input_data) {
auto res = numbers.insert(value);
if(!res.second) {
numbers.erase(res.first);
}
}
return numbers.size() == 1 ? *numbers.begin() : -1;
}
int find_using_unordered_map(const vector<int>& input_data) {
unordered_map<int,int> counter_map;
for(const auto& value : input_data) {
++counter_map[value];
}
for(const auto& map_entry : counter_map) {
if(map_entry.second % 2 == 1) {
return map_entry.first;
}
}
return -1;
}
int find_using_dense_hash_set(const vector<int>& input_data) {
dense_hash_set<int> numbers(input_data.size());
numbers.set_deleted_key(-1);
numbers.set_empty_key(-2);
for(const auto& value : input_data) {
auto res = numbers.insert(value);
if(!res.second) {
numbers.erase(res.first);
}
}
return numbers.size() == 1 ? *numbers.begin() : -1;
}
int find_using_dense_hash_map(const vector<int>& input_data) {
dense_hash_map<int,int> counter_map;
counter_map.set_deleted_key(-1);
counter_map.set_empty_key(-2);
for(const auto& value : input_data) {
++counter_map[value];
}
for(const auto& map_entry : counter_map) {
if(map_entry.second % 2 == 1) {
return map_entry.first;
}
}
return -1;
}
int find_using_sort_and_count(const vector<int>& input_data) {
vector<int> local_copy(input_data);
std::sort(local_copy.begin(), local_copy.end());
int prev_value = local_copy.front();
int counter = 0;
for(const auto& value : local_copy) {
if(prev_value == value) {
++counter;
continue;
}
if(counter % 2 == 1) {
return prev_value;
}
prev_value = value;
counter = 1;
}
return counter % 2 == 1 ? prev_value : -1;
}
void execute_and_time(const string& method_name, std::function<int()> method) {
ScopedTimer timer(method_name);
cout << method_name << " returns " << method() << endl;
}
int main()
{
vector<int> input_size_vec({1<<18,1<<20,1<<22,1<<24,1<<28});
for(const auto& input_size : input_size_vec) {
// Prepare input data
std::vector<int> input_data;
const int magic_number = 123454321;
for(int i=0;i<input_size;++i) {
input_data.push_back(i);
input_data.push_back(i);
}
input_data.push_back(magic_number);
std::random_shuffle(input_data.begin(), input_data.end());
cout << "For input_size " << input_size << ":" << endl;
execute_and_time("unordered-set:",std::bind(find_using_unordered_set, std::cref(input_data)));
execute_and_time("dense-hash-set:",std::bind(find_using_dense_hash_set, std::cref(input_data)));
execute_and_time("sort-and-count:",std::bind(find_using_sort_and_count, std::cref(input_data)));
execute_and_time("unordered-map:",std::bind(find_using_unordered_map, std::cref(input_data)));
execute_and_time("dense-hash-map:",std::bind(find_using_dense_hash_map, std::cref(input_data)));
cout << "--------------------------" << endl;
}
return 0;
}
This analysis is substantially the same as the one done by user3386199 in his answer. It is the analysis I would have performed regardless of his answer, but he did get there first.
I ran the program on my machine (an HP Z420 running an Ubuntu 14.04 LTS derivative), and added output for 1<<26, so I have a different set of numbers, but the ratios look remarkably similar to those from the data in the original post. The raw times I got were (file on-vs-logn.raw.data):
For input_size 262144:
hash-set: returns 123454321
hash-set: took 45 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 34 milliseconds
hash-map: returns 123454321
hash-map: took 61 milliseconds
--------------------------
For input_size 1048576:
hash-set: returns 123454321
hash-set: took 372 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 154 milliseconds
hash-map: returns 123454321
hash-map: took 390 milliseconds
--------------------------
For input_size 4194304:
hash-set: returns 123454321
hash-set: took 1921 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 680 milliseconds
hash-map: returns 123454321
hash-map: took 1834 milliseconds
--------------------------
For input_size 16777216:
hash-set: returns 123454321
hash-set: took 8356 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 2970 milliseconds
hash-map: returns 123454321
hash-map: took 9045 milliseconds
--------------------------
For input_size 67108864:
hash-set: returns 123454321
hash-set: took 37582 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 12842 milliseconds
hash-map: returns 123454321
hash-map: took 46480 milliseconds
--------------------------
For input_size 268435456:
hash-set: returns 123454321
hash-set: took 172329 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 53856 milliseconds
hash-map: returns 123454321
hash-map: took 211191 milliseconds
--------------------------
real 11m32.852s
user 11m24.687s
sys 0m8.035s
I created a script, awk.analysis.sh
, to analyze the data:
#!/bin/sh
awk '
BEGIN { printf("%9s %8s %8s %8s %8s %8s %8s %9s %9s %9s %9s\n",
"Size", "Sort Cnt", "R:Sort-C", "Hash Set", "R:Hash-S", "Hash Map",
"R:Hash-M", "O(N)", "O(NlogN)", "O(N^3/2)", "O(N^2)")
}
/input_size/ { if (old_size == 0) old_size = $3; size = $3 }
/hash-set: took/ { if (o_hash_set == 0) o_hash_set = $3; t_hash_set = $3 }
/sort-and-count: took/ { if (o_sort_cnt == 0) o_sort_cnt = $3; t_sort_cnt = $3 }
/hash-map: took/ { if (o_hash_map == 0) o_hash_map = $3; t_hash_map = $3 }
/^----/ {
o_n = size / old_size
o_nlogn = (size * log(size)) / (old_size * log(old_size))
o_n2 = (size * size) / (old_size * old_size)
o_n32 = (size * sqrt(size)) / (old_size * sqrt(old_size))
r_sort_cnt = t_sort_cnt / o_sort_cnt
r_hash_map = t_hash_map / o_hash_map
r_hash_set = t_hash_set / o_hash_set
printf("%9d %8d %8.2f %8d %8.2f %8d %8.2f %9.0f %9.2f %9.2f %9.0f\n",
size, t_sort_cnt, r_sort_cnt, t_hash_set, r_hash_set,
t_hash_map, r_hash_map, o_n, o_nlogn, o_n32, o_n2)
}' < on-vs-logn.raw.data
The output from the program is quite wide, but gives:
Size Sort Cnt R:Sort-C Hash Set R:Hash-S Hash Map R:Hash-M O(N) O(NlogN) O(N^3/2) O(N^2)
262144 34 1.00 45 1.00 61 1.00 1 1.00 1.00 1
1048576 154 4.53 372 8.27 390 6.39 4 4.44 8.00 16
4194304 680 20.00 1921 42.69 1834 30.07 16 19.56 64.00 256
16777216 2970 87.35 8356 185.69 9045 148.28 64 85.33 512.00 4096
67108864 12842 377.71 37582 835.16 46480 761.97 256 369.78 4096.00 65536
268435456 53856 1584.00 172329 3829.53 211191 3462.15 1024 1592.89 32768.00 1048576
It is reasonably clear that on this platform, the hash set and hash map algorithms are not O(N), nor are they as good as O(NlogN), but they are better than O(N^(3/2)), let alone O(N^2). On the other hand, the sorting algorithm is very close to O(NlogN) indeed.
You can only put that down to a deficiency in the hash set and hash map code, or to sub-optimal sizing of the hash tables. It would be worth investigating what mechanisms exist to pre-size the hash set and hash map, to see whether doing so affects the performance. (See also the extra information below.)
And, just for the record, here's the output from the analysis script on the original data:
Size Sort Cnt R:Sort-C Hash Set R:Hash-S Hash Map R:Hash-M O(N) O(NlogN) O(N^3/2) O(N^2)
262144 37 1.00 107 1.00 109 1.00 1 1.00 1.00 1
1048576 173 4.68 641 5.99 731 6.71 4 4.44 8.00 16
4194304 745 20.14 3250 30.37 3631 33.31 16 19.56 64.00 256
16777216 3238 87.51 14528 135.78 16483 151.22 64 85.33 512.00 4096
268435456 60396 1632.32 350305 3273.88 427841 3925.15 1024 1592.89 32768.00 1048576
Further testing shows that modifying the hash-based functions as shown:
int find_using_hash(const vector<int>& input_data) {
unordered_set<int> numbers;
numbers.reserve(input_data.size());
and:
int find_using_hashmap(const vector<int>& input_data) {
unordered_map<int,int> counter_map;
counter_map.reserve(input_data.size());
produces an analysis like this:
Size Sort Cnt R:Sort-C Hash Set R:Hash-S Hash Map R:Hash-M O(N) O(NlogN) O(N^3/2) O(N^2)
262144 34 1.00 42 1.00 80 1.00 1 1.00 1.00 1
1048576 155 4.56 398 9.48 321 4.01 4 4.44 8.00 16
4194304 685 20.15 1936 46.10 1177 14.71 16 19.56 64.00 256
16777216 2996 88.12 8539 203.31 5985 74.81 64 85.33 512.00 4096
67108864 12564 369.53 37612 895.52 28808 360.10 256 369.78 4096.00 65536
268435456 53291 1567.38 172808 4114.48 124593 1557.41 1024 1592.89 32768.00 1048576
Clearly, reserving space for the hash map is beneficial.
The hash set code behaves rather differently: roughly half of the time (overall) it adds an item, and the other half it 'adds' and then deletes the item again. This is more work than the hash map code has to do, so it is slower. It also means that the reserved space is larger than really necessary, which may account for the degraded performance with the reserved space.
Let's start by looking at the numbers for the sorting solution. In the table below, the first column is the size ratio: NlogN for a given test divided by NlogN for the first test. The second column is the time ratio between a given test and the first test.
NlogN size ratio time ratio
4*20/18 = 4.4 173 / 37 = 4.7
16*22/18 = 19.6 745 / 37 = 20.1
64*24/18 = 85.3 3238 / 37 = 87.5
1024*28/18 = 1590 60396 / 37 = 1630
You can see that there is very good agreement between the two ratios, indicating that the sort routine is indeed O(NlogN) .
So why are the hash routines not performing as expected? Simple: the notion that extracting an item from a hash table is O(1) is pure fantasy. The actual extraction time depends on the quality of the hash function and the number of bins in the hash table. It ranges from O(1) to O(N), with the worst case occurring when all of the entries in the hash table end up in the same bin. So using a hash table, you should expect your performance to be somewhere between O(N) and O(N^2), which seems to fit your data, as shown below
O(N) O(NlogN) O(N^2) time
4 4.4 16 6
16 20 256 30
64 85 4096 136
1024 1590 10^6 3274
Note that the time ratio is at the low end of the range, indicating that the hash function is working fairly well.
I ran the program through valgrind with different input sizes, and I got these results for cycle counts:
with 1<<16 values:
find_using_hash: 27 560 872
find_using_sort: 17 089 994
sort/hash: 62.0%
with 1<<17 values:
find_using_hash: 55 105 370
find_using_sort: 35 325 606
sort/hash: 64.1%
with 1<<18 values:
find_using_hash: 110 235 327
find_using_sort: 75 695 062
sort/hash: 68.6%
with 1<<19 values:
find_using_hash: 220 248 209
find_using_sort: 157 934 801
sort/hash: 71.7%
with 1<<20 values:
find_using_hash: 440 551 113
find_using_sort: 326 027 778
sort/hash: 74.0%
with 1<<21 values:
find_using_hash: 881 086 601
find_using_sort: 680 868 836
sort/hash: 77.2%
with 1<<22 values:
find_using_hash: 1 762 482 400
find_using_sort: 1 420 801 591
sort/hash: 80.6%
with 1<<23 values:
find_using_hash: 3 525 860 455
find_using_sort: 2 956 962 786
sort/hash: 83.8%
This indicates that the sort time is slowly overtaking the hash time, at least in instruction counts. With my particular compiler/library (gcc 4.8.2/libstdc++) and optimization level (-O2), the sort and hash methods would run at the same speed at around 2^28 values, which is at the upper limit of what you are testing. I suspect that other system factors come into play when using that much memory, which makes it difficult to evaluate in actual wall time.
The fact that O(N) was seemingly slower than O(NlogN) was driving me crazy, so I decided to dive deep into the problem.
I did this analysis in Windows with Visual Studio, but I bet the results would be very similar on Linux with g++.
First of all, I used Very Sleepy to find the pieces of code that were being executed the most during the for loop in find_using_hash(). In the profiler output, the top entries were all related to lists (RtlAllocateHeap is called from the list code). Apparently, the problem is that every insertion into the unordered_set allocates a list node, because the buckets are implemented as linked lists, and this sky-rockets the duration of the algorithm, as opposed to the sort, which makes no allocations.
To be sure this was the problem, I wrote a VERY simple hash table implementation with no allocations, and the results were far more reasonable. So there it is: the factor log N multiplying N, which in your largest example (i.e. 1<<28) is 28, is still smaller than the "constant" amount of work required for an allocation.
There are many great answers here already, but this is the special kind of question that naturally generates many valid answers.
I'm writing to provide an answer from a mathematical perspective (which is hard to do without LaTeX), because it is important to correct an unaddressed misconception: the idea that solving the given problem with hashes is "theoretically" O(n), yet somehow "practically" worse than O(n). Such a thing would be a mathematical impossibility!
For those wishing to pursue the topic in more depth, I recommend this book which I saved for and bought as a very poor high school student, and which stoked my interest in applied mathematics for many years to come, essentially changing the outcome of my life: http://www.amazon.com/Analysis-Algorithms-Monographs-Computer-Science/dp/0387976876
To understand why the problem is not "theoretically" O(n), note that the underlying assumption is also false: it is not true that hashes are "theoretically" an O(1) data structure.
The opposite is actually true. Hashes, in their pure form, are only "practically" an O(1) data structure; theoretically they are still an O(n) data structure. (Note: in hybrid form, they can achieve theoretical O(log n) performance.)
Therefore, the solution is still, in the best case, an O(n log n) problem as n approaches infinity.
You may be ready to respond: "but everyone knows that hashes are O(1)!" So let me explain in what sense that claim is true: the practical sense, not the theoretical one.
For any application (regardless of n, so long as n is known ahead of time, what mathematical proofs call "fixed" rather than "arbitrary"), you can design your hash table to match the application and obtain O(1) performance within the constraints of that environment. Each pure hashing structure is intended to perform well within an a priori range of data-set sizes, and with the assumed independence of keys with respect to the hashing function.
But when you let n approach infinity, as required by the definition of Big-O notation, then the buckets begin to fill (which must happen, by the pigeonhole principle), and any pure hash structure breaks down into an O(n) algorithm (the Big-O notation here ignores the constant factor that depends on how many buckets there are).
Whoa! There's a lot in that sentence.
And so at this point, rather than equations, an appropriate analogy would be more helpful:
A very accurate mathematical understanding of hash tables is gained by imagining a filing cabinet containing 26 drawers, one for each letter of the alphabet. Each file is stored within the drawer that corresponds to the first letter in the file's name.
The "hash function" is an O(1)
operation, looking at the first letter.
Storage is an O(1)
operation: placing the file inside the drawer for that letter.
And as long as there are not more than one file inside each drawer , retrieval is an O(1)
operation: opening the drawer for that letter.
Within these design constraints, this hash structure is O(1)
.
Now suppose that you exceed the design constraints for this "filing cabinet" hashing structure, and have stored several hundred files. Storage now takes as many operations as are needed to find an empty space in the drawer, and retrieval takes as many operations as there are items within the drawer.
Compared to throwing all the files into a single huge pile, the average performance is better by a factor of roughly 1/26. But remember: mathematically, one cannot say O(n/26), because by definition O(n) notation does not take constant factors into account, only algorithmic complexity as a function of n. So when the design constraints are exceeded, the data structure is O(n).