Efficient way of determining whether one vector is a subset of another or not?

Question

I have two sorted vectors of unique values between 0 and some known 'n'. The first vector (set1) is always larger than the candidate vector (set2).

Query: determine whether set2 is a subset of set1.

Is there a better, more efficient way of doing this than the following C++11 implementation?

#include <iostream>
#include <vector>


// Returns true if every element of set2 also appears in set1.
// Set1 & 2 are always sorted and contain only unique integers from 0 to some known 'n'.
// Set1 is always larger than Set2 in size.
bool subSetCheck(const std::vector<int>& set1, const std::vector<int>& set2) {

    std::vector<int>::const_iterator it1 = set1.begin();
    std::vector<int>::const_iterator it2 = set2.begin();

    while (it1 != set1.end() && it2 != set2.end()) {

        if (*it1 == *it2) { ++it1; ++it2; }   // match: advance both
        else if (*it1 > *it2) return false;   // set1 has moved past *it2, so *it2 is missing
        else ++it1;                           // *it1 is too small: keep scanning set1
    }

    // set2 is a subset only if all of its elements were matched
    return it2 == set2.end();
}

int main () {

    std::vector<int> set1{0,1,2,3,4};
    std::vector<int> set2{0,1,5};

    if (subSetCheck(set1,set2)) std::cout << "Yes, set2 is subset of set1." << std::endl;
    else std::cout << "No! set2 is not a subset of set1." << std::endl;

    return 0;
}

Answer 1

You can use std::includes (declared in <algorithm>). Both ranges must be sorted:

std::vector<int> a{1,2,3,4,5};
std::vector<int> b{1,2,6};
std::cout << std::includes(a.begin(), a.end(), b.begin(), b.end()) << std::endl;
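If you want this behind a function with the same shape as subSetCheck, a minimal sketch could look like the following. The name isSubsetIncludes is made up for illustration, and the asserts only document the sortedness precondition in debug builds.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (name chosen for illustration):
// true if every element of `candidate` also occurs in `superset`.
// Both vectors must be sorted in ascending order, as std::includes requires.
bool isSubsetIncludes(const std::vector<int>& superset,
                      const std::vector<int>& candidate) {
    // Debug-only checks of the precondition; compiled out with NDEBUG.
    assert(std::is_sorted(superset.begin(), superset.end()));
    assert(std::is_sorted(candidate.begin(), candidate.end()));

    return std::includes(superset.begin(), superset.end(),
                         candidate.begin(), candidate.end());
}
```

Taking the vectors by const reference avoids the copies that the by-value signature in the question makes on every call.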

Answer 2

Yes, there are more efficient ways. Which one is best depends on whether you expect set2 to be a subset of set1 most of the time or not.

This is all assuming that there are no duplicate elements.

Let's look at it this way. If set2 happens to be a subset of set1, then verifying that can take O(set1.size()) in the worst case, because you may have to look at every element.

In that case, your implementation is already pretty close to optimal. You could improve it by using binary search to find the first element of set2 in set1, instead of the linear search you do now.

Once you have found that element, there is not much else you can do other than iterate through the remaining elements and compare.
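A rough sketch of that idea, assuming sorted vectors of unique values as in the question. The function name isSubsetSkipAhead is hypothetical; std::lower_bound does the binary search and std::includes does the remaining linear pass.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: skip to set2's first element with a binary search,
// then compare linearly from there.
// Assumes both vectors are sorted ascending and contain unique values.
bool isSubsetSkipAhead(const std::vector<int>& set1,
                       const std::vector<int>& set2) {
    assert(std::is_sorted(set1.begin(), set1.end()));
    assert(std::is_sorted(set2.begin(), set2.end()));

    if (set2.empty()) return true;  // the empty set is a subset of anything

    // Binary search: first position in set1 whose value is >= set2.front().
    auto start = std::lower_bound(set1.begin(), set1.end(), set2.front());
    if (start == set1.end() || *start != set2.front()) {
        return false;  // set2's smallest element is not in set1
    }

    // Linear merge-style pass over the rest of set1.
    return std::includes(start, set1.end(), set2.begin(), set2.end());
}
```

This only helps when set2 starts well into set1; if set2.front() is near the beginning of set1, the cost is about the same as the plain linear scan.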

If, on the other hand, you assume that most of the time set2 is not a subset of set1, you should follow a different approach.

The beginning is the same: use binary search to find the first element of set2 in set1.

Then, use binary search to find the last element of set2 in set1.

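Then, check whether the span between those two positions in set1 is at least as large as set2. If it is smaller, set2 cannot be a subset (the values are unique), so you can bail out right now.

Finally, if the span is large enough, do the element-by-element comparison over that span.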
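The steps above could be sketched as follows, again assuming sorted vectors of unique values. The name isSubsetQuickReject is hypothetical. Note that this sketch rejects only when the span is smaller than set2, since the elements of a subset need not sit in contiguous positions of set1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: locate set2's first and last elements in set1 with
// binary searches, reject early if the span between them is too short,
// and only then compare element by element.
// Assumes both vectors are sorted ascending and contain unique values.
bool isSubsetQuickReject(const std::vector<int>& set1,
                         const std::vector<int>& set2) {
    assert(std::is_sorted(set1.begin(), set1.end()));
    assert(std::is_sorted(set2.begin(), set2.end()));

    if (set2.empty()) return true;

    // Position of set2's first element in set1, if present.
    auto first = std::lower_bound(set1.begin(), set1.end(), set2.front());
    if (first == set1.end() || *first != set2.front()) return false;

    // Position of set2's last element in set1, if present.
    auto last = std::lower_bound(first, set1.end(), set2.back());
    if (last == set1.end() || *last != set2.back()) return false;
    ++last;  // make the span half-open: [first, last)

    // With unique values, a span shorter than set2 cannot contain all of set2.
    const auto span = static_cast<std::size_t>(std::distance(first, last));
    if (span < set2.size()) return false;

    // Element-by-element comparison restricted to the span.
    return std::includes(first, last, set2.begin(), set2.end());
}
```

The two binary searches cost O(log set1.size()), so a mismatch at either end of set2 is detected without scanning set1 at all.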

Things get trickier if you have duplicate elements, and figuring out how exactly to do that is left as an exercise to the reader.

2 answers

solution1
5 2016-03-30 20:39:29

solution2
0 2016-03-30 20:44:20