简体   繁体   中英

How to calculate difference between two sets in C?

I have two arrays, say A and B with |A|=8 and |B|=4. I want to calculate the set difference AB. How do I proceed? Please note that there are no repeated elements in either of the sets.

Edit: Thank you so much everybody for a myriad of elegant solutions. Since I am in prototyping stage of my project, for now I implemented the simplest solution told by Brian and Owen. But I do appreciate the clever use of data structures as suggested here by the rest of you, even Though I am not a computer scientist but an engineer and never studied data structures as a course. Looks like it's about time I should really start reading CLRS which I have been procrastinating for quite a while :) Thanks again!

sort arrays A and B
result will be in C
let a - the first elem of A
let b - the first elem of B
then:
1) while a < b: insert a into C and a = next elem of A
2) while a > b: b = next elem of B
3) if a = b: a = next elem of A and b = next elem of B
4) if b goes to end: insert rest of A into C and stop
5) if a goes to end: stop

遍历A的每个元素,如果每个元素不在B中,则将它们添加到新的集合C中。

It depends on how you want to represent your sets, but if they are just packed bits then you can use bitwise operators, eg D = A & ~B; would give you the set difference AB if the sets fit into an integer type. For larger sets you might use arrays of integer types and iterate, eg

for (i = 0; i < N; ++i)
{
    D[i] = A[i] & ~B[i];
}

The following assumes the sets are stored as a sorted container (as std::set does).

There's a common algorithm for merging two ordered lists to produce a third. The idea is that when you look at the heads of the two lists, you can determine which is the lower, extract that, and add it to the tail of the output, then repeat.

There are variants which detect the case where the two heads are equal, and treat this specially. Set intersections and unions are examples of this.

With a set asymmetric difference, the key point is that for AB, when you extract the head of B, you discard it. When you extract the head of A, you add it to the input unless the head of B is equal, in which case you extract that too and discard both.

Although this approach is designed for sequential-access data structures (and tape storage etc), it's sometimes very useful to do the same thing for a random-access data structure so long as it's reasonably efficient to access it sequentially anyway. And you don't necessarily have to extract things for real - you can do copying and step instead.

The key point is that you step through the inputs sequentially, always looking at the lowest remaining value next, so that (if the inputs have no duplicates) you will the matched items. You therefore always know whether your next lowest value to handle is an item from A with no match in B, and item in B with no match in A, or an item that's equal in both A and B.

More generally, the algorithm for the set difference depends on the representation of the set. For example, if the set is represented as a bit-vector, the above would be overcomplex and slow - you'd just loop through the vectors doing bitwise operations. If the set is represented as a hashtable (as in the tr1 unordered_set) the above is wrong as it requires ordered inputs.

If you have your own binary tree code that you're using for the sets, one good option is to convert both trees into linked lists, work on the lists, then convert the resulting list to a perfectly balanced tree. The linked-list set-difference is very simple, and the two conversions are re-usable for other similar operations.

EDIT

On the complexity - using these ordered merge-like algorithms is O(n) provided you can do the in-order traversals in O(n). Converting to a list and back is also O(n) as each of the three steps is O(n) - tree-to-list, set-difference and list-to-tree.

Tree-to-list basically does a depth-first traversal, deconstructing the tree as it goes. Theres a trick for making this iterative, storing the "stack" in part-handled nodes - changing a left-child pointer into a parent-pointer just before you step to the left child. This is a good idea if the tree may be large and unbalanced.

Converting a list to a tree basically involves a depth-first traversal of an imaginary tree (based on the size, known from the start) building it for real as you go. If a tree has 5 nodes, for instance, you can say that the root will be node 3. You recurse to build a two-node left subtree, then grab the next item from the list for that root, then recurse to build a two-node right subtree.

The list-to-tree conversion shouldn't need to be implemented iteratively - recursive is fine as the result is always perfectly balanced. If you can't handle the log n recursion depth, you almost certainly can't handle the full tree anyway.

Implement a set object in C. You can do it using a hash table for the underlying storage. This is obviously a non trivial exercise, but a few Open Source solutions exist. Then you simply need to add all the elements of A and then iterate over B and remove any that are elements of your set.

The key point is to use the right data structure for the job.

对于较大的集,我建议对数字进行排序并通过模拟http://www.cplusplus.com/reference/algorithm/set_difference/上的代码来遍历它们,该代码将为O(N * logN),但由于集大小这么小,即使理论上O(N ^ 2)较慢,Brian给出的解决方案似乎也不错。

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM