
How do I trigger std::set's worst-case insert(), contains(), and remove() operations?

I understand that std::set is likely implemented as some sort of tree. I want to trigger the worst case of std::set's insert(), contains(), and remove() operations, which I expect to take O(log(n)) time. I do not want to implement my own tree; I want to use std::set specifically.

In the image below, I perform these operations on std::set, and on average they appear to take constant time. For bonus points, can anyone explain why this looks constant instead of O(log(n))?

Below is my code measuring runtimes:

cout << "\n ................. Comparison: std::set vs. SortedQuickSet ..................\n ";
cout << "\n   Operation | # Elements | Total SQSetSet Runtime | Total std::set Runtime";
cout << "\n ------------|------------|------------------------|-----------------------\n";

for (int i = 0; i < COMPARISON_SET_DOUBLINGS; i++)
{
    set<unsigned> standardSet;
    SortedQuickSet sortedQuickSet;

    // Compare "Add" operations
    time = clock();                         // Start the timer
    for (unsigned j = 0; j < pow(2, i) * COMPARISON_SET_INITIAL_SIZE; j++) standardSet.insert((RANDOMIZED_SET_SIZE)-(rand() % (RANDOMIZED_SET_SIZE)));
    standardSetRuntime = clock() - time;    // Stop the timer
    time = clock();                         // Start the timer
    for (unsigned j = 0; j < pow(2, i) * COMPARISON_SET_INITIAL_SIZE; j++) sortedQuickSet.Add((RANDOMIZED_SET_SIZE)-(rand() % (RANDOMIZED_SET_SIZE)));
    SortedQuickSetRuntime = clock() - time; // Stop the timer
    cout << "         Add |";
    for (int j = 0; j < 18 - to_string(pow(2, i) * COMPARISON_SET_INITIAL_SIZE).length(); j++) cout << " ";
    cout << pow(2, i) * COMPARISON_SET_INITIAL_SIZE << " |";
    for (int j = 0; j < 23 - to_string(SortedQuickSetRuntime).length(); j++) cout << " ";
    cout << SortedQuickSetRuntime << " | " << standardSetRuntime << "  \n";

    // Compare "Contains" operations
    time = clock();                         // Start the timer
    for (unsigned j = 0; j < pow(2, i) * COMPARISON_SET_INITIAL_SIZE; j++) standardSet.find((RANDOMIZED_SET_SIZE)-(rand() % (RANDOMIZED_SET_SIZE)));
    standardSetRuntime = clock() - time;    // Stop the timer
    time = clock();                         // Start the timer
    for (unsigned j = 0; j < pow(2, i) * COMPARISON_SET_INITIAL_SIZE; j++) sortedQuickSet.Contains((RANDOMIZED_SET_SIZE)-(rand() % (RANDOMIZED_SET_SIZE)));
    SortedQuickSetRuntime = clock() - time; // Stop the timer
    cout << "    Contains |";
    for (int j = 0; j < 18 - to_string(pow(2, i) * COMPARISON_SET_INITIAL_SIZE).length(); j++) cout << " ";
    cout << pow(2, i) * COMPARISON_SET_INITIAL_SIZE << " |";
    for (int j = 0; j < 23 - to_string(SortedQuickSetRuntime).length(); j++) cout << " ";
    cout << SortedQuickSetRuntime << " | " << standardSetRuntime << "  \n";

    //// Compare "Get Sorted" operations
    //standardSetRuntime = 0;
    //time = clock();                           // Start the timer
    //for (auto element : standardSet) { }
    //standardSetRuntime = clock() - time;  // Stop the timer
    //SortedQuickSetRuntime = 0;
    //time = clock();                           // Start the timer
    //for (auto element : sortedQuickSet.Elements()) { }
    //SortedQuickSetRuntime = clock() - time;   // Stop the timer
    //cout << "  Get Sorted |";
    //for (int j = 0; j < 18 - to_string(pow(2, i) * COMPARISON_SET_INITIAL_SIZE).length(); j++) cout << " ";
    //cout << pow(2, i) * COMPARISON_SET_INITIAL_SIZE << " |";
    //for (int j = 0; j < 23 - to_string(SortedQuickSetRuntime).length(); j++) cout << " ";
    //cout << SortedQuickSetRuntime << " | " << standardSetRuntime << "  \n";

    // Compare "Remove" operations
    time = clock();                         // Start the timer
    for (unsigned j = 0; j < pow(2, i) * COMPARISON_SET_INITIAL_SIZE; j++) standardSet.erase((RANDOMIZED_SET_SIZE)-(rand() % (RANDOMIZED_SET_SIZE)));
    standardSetRuntime = clock() - time;    // Stop the timer
    time = clock();                         // Start the timer
    for (unsigned j = 0; j < pow(2, i) * COMPARISON_SET_INITIAL_SIZE; j++) sortedQuickSet.Remove((RANDOMIZED_SET_SIZE)-(rand() % (RANDOMIZED_SET_SIZE)));
    SortedQuickSetRuntime = clock() - time; // Stop the timer
    cout << "      Remove |";
    for (int j = 0; j < 18 - to_string(pow(2, i) * COMPARISON_SET_INITIAL_SIZE).length(); j++) cout << " ";
    cout << pow(2, i) * COMPARISON_SET_INITIAL_SIZE << " |";
    for (int j = 0; j < 23 - to_string(SortedQuickSetRuntime).length(); j++) cout << " ";
    cout << SortedQuickSetRuntime << " | " << standardSetRuntime;
    cout << "\n ------------|------------|------------------------|-----------------------\n";
}
cout << "\n Conclusion: on average, operations on a SortedQuickSet take ~30% as long as\n those on an std::set.";
cout << " Both set types perform these operations in constant\n time, and SortedQuickSet appears to have less overhead.\n";
cout << "\n ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''\n\n ";

(Note that your tests measure average runtimes. The worst case cannot be pinned down for a data structure like std::set, because its implementation is left to the standard library.)
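One way to look at the per-operation cost without timing noise is to count key comparisons instead of clock ticks. The sketch below is not from the original post: `CountingLess`, `g_compare_calls`, and `max_comparisons_per_find` are hypothetical names introduced for illustration. It inserts keys in ascending order, which is the input that degrades an unbalanced binary search tree, and reports the largest number of comparisons any single find() needed. Because the standard requires logarithmic complexity for these operations, the count should grow by only a small amount each time n doubles, which is easy to mistake for constant time in a table of wall-clock measurements.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>

// Global comparison counter (hypothetical helper for this sketch).
static std::size_t g_compare_calls = 0;

// Comparator that behaves like std::less<unsigned> but counts its calls.
struct CountingLess {
    bool operator()(unsigned a, unsigned b) const {
        ++g_compare_calls;
        return a < b;
    }
};

// Insert 0..n-1 in ascending order, then return the largest number of
// comparisons that any single find() on an existing key required.
std::size_t max_comparisons_per_find(unsigned n) {
    std::set<unsigned, CountingLess> s;
    for (unsigned k = 0; k < n; ++k) s.insert(k);

    std::size_t worst = 0;
    for (unsigned k = 0; k < n; ++k) {
        g_compare_calls = 0;           // reset before each lookup
        (void)s.find(k);
        worst = std::max(worst, g_compare_calls);
    }
    return worst;
}
```

Calling `max_comparisons_per_find` for n = 1024, 2048, 4096, and so on, and printing the results side by side, should show the count rising slowly with n rather than staying flat or growing linearly.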

I suspect that your pow and rand operations dominate your measurements:

time = clock();                         // Start the timer
for (unsigned j = 0; j < pow(2, i) * COMPARISON_SET_INITIAL_SIZE; j++) standardSet.insert((RANDOMIZED_SET_SIZE)-(rand() % (RANDOMIZED_SET_SIZE)));
standardSetRuntime = clock() - time;    // Stop the timer

should be

// determine test size
unsigned int N = (unsigned int)std::pow(2, i) * COMPARISON_SET_INITIAL_SIZE;
// build samples
std::vector<int> samples(N);
for (unsigned int j = 0; j < N; ++j)
    samples[j] = (RANDOMIZED_SET_SIZE)-(rand() % (RANDOMIZED_SET_SIZE));
for (unsigned int warmup = 0; warmup < 3; ++warmup) {
    // code warm-up (cache samples, cache instructions for insert)
    for (unsigned int j = 0; j < N; ++j)
        standardSet.insert(samples[j]);
    standardSet.clear();
}
// now measure
int* sample = &samples[0];
time = clock();                         // Start the timer
for (unsigned int j = 0; j < N; ++j)
    standardSet.insert(*sample++);
standardSetRuntime = clock() - time;    // Stop the timer

etc...

You'll probably notice that the operations now take nanoseconds instead of milliseconds. rand was the most expensive part of your test, and there are exactly N calls to rand, so the runtime grew linearly.
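To check on your own machine how the cost splits between generating keys and inserting them, the two phases can be timed separately. This is a minimal sketch under assumptions: `SplitTiming` and `time_generation_and_insertion` are hypothetical names, `range` stands in for RANDOMIZED_SET_SIZE and must be greater than zero, and std::chrono::steady_clock is used in place of clock() for finer resolution. It makes no claim about which phase dominates; that depends on the platform and on N.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <set>
#include <vector>

// Elapsed time of each phase, in nanoseconds.
struct SplitTiming {
    double generate_ns;  // producing n keys with rand()
    double insert_ns;    // inserting those n keys into std::set
};

// Times key generation and insertion as two separate phases.
// Precondition: range > 0.
SplitTiming time_generation_and_insertion(std::size_t n, unsigned range) {
    using clock_type = std::chrono::steady_clock;
    std::vector<unsigned> samples(n);

    auto t0 = clock_type::now();
    for (std::size_t j = 0; j < n; ++j)
        samples[j] = range - static_cast<unsigned>(std::rand()) % range;
    auto t1 = clock_type::now();

    std::set<unsigned> s;
    for (std::size_t j = 0; j < n; ++j) s.insert(samples[j]);
    auto t2 = clock_type::now();

    return { std::chrono::duration<double, std::nano>(t1 - t0).count(),
             std::chrono::duration<double, std::nano>(t2 - t1).count() };
}
```

Dividing each field by n gives a per-operation figure that can be compared across test sizes.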

Also note that runtimes depend on the data you actually insert, so you should use the same samples for both data structures. To be more precise, generate multiple sample arrays, run your measurements for both data structures on each array, and compute your statistics from the combined results. Otherwise you might randomly hit a situation that favors one data structure but not the other.
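The shared-sample idea could be sketched as follows. The names `make_batches` and `mean_insert_ns` are hypothetical, and since SortedQuickSet is not shown in the question, any container with a default constructor and an `insert(key)` member can be plugged in as the `Set` template argument. A fixed seed makes every batch reproducible, so both containers are measured on identical input.

```cpp
#include <chrono>
#include <cstddef>
#include <random>
#include <set>
#include <unordered_set>
#include <vector>

// Build `batches` sample arrays of `n` keys each, in [1, max_key],
// from a fixed seed so every container under test sees the same data.
std::vector<std::vector<unsigned>> make_batches(std::size_t batches, std::size_t n,
                                                unsigned max_key, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<unsigned> dist(1, max_key);
    std::vector<std::vector<unsigned>> out(batches, std::vector<unsigned>(n));
    for (auto& batch : out)
        for (auto& key : batch) key = dist(gen);
    return out;
}

// Mean time in nanoseconds per insert, averaged over all batches.
// A fresh container is used for each batch.
template <class Set>
double mean_insert_ns(const std::vector<std::vector<unsigned>>& batches) {
    using clock_type = std::chrono::steady_clock;
    double total_ns = 0.0;
    std::size_t total_ops = 0;
    for (const auto& batch : batches) {
        Set s;
        auto start = clock_type::now();
        for (unsigned key : batch) s.insert(key);
        auto stop = clock_type::now();
        total_ns += std::chrono::duration<double, std::nano>(stop - start).count();
        total_ops += batch.size();
    }
    return total_ops ? total_ns / static_cast<double>(total_ops) : 0.0;
}
```

With batches built once, `mean_insert_ns<std::set<unsigned>>(batches)` and the same call for the other container give figures that are directly comparable, because neither container got an easier input than the other.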

You should get something like this:
