
Efficient way to find the difference between two data sets

I have two copies of data; here 1 represents my volumes and 2 represents my issues. I have to compare COPY2 with COPY1 and find all the elements that are missing in COPY2 (COPY1 will always be a superset; COPY2 will be either equal to it or a subset of it). I need to get the missing volumes and issues in COPY2, so that for the scenario in the figure I get the following result:

Missing files: 1-C, 1-D, 2-C, 2-C, 3-A, 3-B, 4-E.

Questions:

  1. What data structure should I use to store the above values (volume and issue) in Java?
  2. How should I implement this scenario in Java in the most efficient manner to find the difference between these two copies?

I suggest a flat HashSet<VolumeIssue> . Each VolumeIssue instance corresponds to one categorized issue, such as 1-C .

In that case, all you need to find the difference is a single call:

copy1.removeAll(copy2);

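removeAll modifies copy1 in place: what is left in copy1 afterwards are all the issues that were present in copy1 and are missing from copy2. Work on a copy if you still need the original set.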

Note that your VolumeIssue class must properly implement equals and hashCode for this to work.
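As a minimal sketch of what such a class might look like: the field names (volume, issue), their types, and the helper class MissingIssues are assumptions made for illustration and do not come from the question. The helper copies copy1 before calling removeAll, so neither input set is modified.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** One (volume, issue) pair such as 1-C. Field names and types are assumed. */
final class VolumeIssue {
    private final int volume;
    private final String issue;

    VolumeIssue(int volume, String issue) {
        this.volume = volume;
        this.issue = Objects.requireNonNull(issue);
    }

    // Two instances are equal when both volume and issue match.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof VolumeIssue)) return false;
        VolumeIssue other = (VolumeIssue) o;
        return volume == other.volume && issue.equals(other.issue);
    }

    // Must be consistent with equals, otherwise HashSet lookups fail.
    @Override
    public int hashCode() {
        return Objects.hash(volume, issue);
    }

    @Override
    public String toString() {
        return volume + "-" + issue;
    }
}

/** Hypothetical helper wrapping the removeAll call. */
final class MissingIssues {
    /** Returns the elements of copy1 that are absent from copy2. */
    static Set<VolumeIssue> missingFrom(Set<VolumeIssue> copy1, Set<VolumeIssue> copy2) {
        Set<VolumeIssue> missing = new HashSet<>(copy1); // copy, so copy1 is untouched
        missing.removeAll(copy2);
        return missing;
    }
}
```

With hash-based sets each membership check is expected constant time, so the whole difference is roughly linear in the size of the sets.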

Since you've added the Guava tag, I'd go for a variation of Marco Topolnik's answer. Instead of removing one set from the other, use Sets.difference(left, right), whose documentation says:

Returns an unmodifiable view of the difference of two sets. The returned set contains all elements that are contained by set1 and not contained by set2. set2 may also contain elements not present in set1; these are simply ignored. The iteration order of the returned set matches that of set1.
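For comparison, the behaviour described in that quote can be approximated with the JDK alone. The sketch below is not Guava's implementation: the class and method names are hypothetical, and it builds an unmodifiable copy rather than a live view, so later changes to the input sets are not reflected in the result. It does keep the two properties the quote highlights: neither input is modified, and the result follows the iteration order of the first set.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical JDK-only stand-in for a non-mutating set difference. */
final class SetDifference {
    /** Returns an unmodifiable copy of the elements in left that are not in right. */
    static <T> Set<T> difference(Set<T> left, Set<T> right) {
        // LinkedHashSet keeps the order in which elements of left are visited.
        Set<T> result = new LinkedHashSet<>();
        for (T element : left) {
            if (!right.contains(element)) {
                result.add(element);
            }
        }
        return Collections.unmodifiableSet(result);
    }
}
```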

What data structure should I use to store the above values (volume and issue) in java?

You can use a HashMap for each copy, where the key is the volume and the value is a List of issues.

How should I implement this scenario in java in the most efficient manner to find the difference between these 2 copies?

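Look up the same key in both HashMaps to get two Lists of issues, then find the difference between those two lists.

Suppose you have the two lists of values for the same key from the two maps. Note that removeAll returns a boolean, not a collection, so copy the first list and remove from the copy:

  List<Issue> diff = new ArrayList<>(list1);
  diff.removeAll(list2);   // diff now holds the issues in list1 that are missing from list2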
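Applied to every volume, the idea might look like the sketch below. The types are assumptions: volumes are modelled as Integer keys and issues as String values, and the class and method names are hypothetical. A volume that is absent from the second map is treated as having no issues, so all of its issues are reported as missing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: per-volume difference between two volume-to-issues maps. */
final class VolumeMapDiff {
    /** Returns, for each volume, the issues present in copy1 but not in copy2. */
    static Map<Integer, List<String>> missingByVolume(Map<Integer, List<String>> copy1,
                                                      Map<Integer, List<String>> copy2) {
        Map<Integer, List<String>> missing = new TreeMap<>(); // sorted by volume
        for (Map.Entry<Integer, List<String>> entry : copy1.entrySet()) {
            // Copy the list so the caller's data is not modified.
            List<String> diff = new ArrayList<>(entry.getValue());
            diff.removeAll(copy2.getOrDefault(entry.getKey(), Collections.emptyList()));
            if (!diff.isEmpty()) {
                missing.put(entry.getKey(), diff);
            }
        }
        return missing;
    }
}
```

Because removeAll on a list checks membership with a linear scan of the other list, each volume costs time proportional to the product of the two list lengths; if the issue lists are long, wrapping the second list in a HashSet before the call keeps the check cheap.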
