How to subtract one huge list from another efficiently in C#

Question

I have a very long list of Ids (integers) that represents all the items that are currently in my database:

var idList = GetAllIds();

I also have another huge generic list with items to add to the database:

List<T> itemsToAdd;

Now, I would like to remove all items from the generic list whose Id is already in the idList. Currently idList is a simple array and I subtract the lists like this:

itemsToAdd.RemoveAll(e => idList.Contains(e.Id));

I am pretty sure it could be a lot faster. Which data types should I use for both collections, and what is the most efficient way to subtract them?

Thank you!

Answer 1

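LINQ could help, provided both sequences have the same element type (for example, if you compare the Ids):

itemsToAdd.Select(e => e.Id).Except(idList)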

Your code is slow because Contains on an array is O(n): every call scans idList from the start. So your total cost is O(itemsToAdd.Count * idList.Length).

You can turn idList into a HashSet<int>, whose Contains is O(1) on average. Or just use the LINQ Except extension method, which builds a set internally and does this for you.

Note that Except also removes duplicates from the left side, i.e. new int[]{1,1,2}.Except(new int[]{2}) yields just {1}; the second 1 is dropped. I assume that's no problem in your case because Ids are typically unique.
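A small sketch of both points, assuming the items are modelled as (Id, Name) tuples. That tuple and the sample values are stand-ins, since the question does not show the real item type. Except compares elements of the same type, so here it runs on the projected Ids rather than on the items themselves.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; the real item type is not shown in the question.
int[] idList = { 1, 2, 3 };
var itemsToAdd = new List<(int Id, string Name)>
{
    (2, "b"), (4, "d"), (4, "d again"), (5, "e")
};

// Except needs both sequences to have the same element type,
// so project the items to their Ids first.
List<int> newIds = itemsToAdd.Select(e => e.Id).Except(idList).ToList();

// Except is a set operation: the repeated Id 4 appears only once.
Console.WriteLine(string.Join(",", newIds)); // 4,5

// The example from the answer: the second 1 is dropped as well.
int[] distinctLeft = new[] { 1, 1, 2 }.Except(new[] { 2 }).ToArray();
Console.WriteLine(string.Join(",", distinctLeft)); // 1
```

If duplicates on the left side must survive, filtering with a HashSet<int> and Where or RemoveAll is the safer route.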

Answer 2

Temporarily convert idList to a HashSet<int> and use the same method, i.e.:

var idListHash = new HashSet<int>(idList);
itemsToAdd.RemoveAll(e => idListHash.Contains(e.Id));

It should be much faster.
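For illustration, here is a minimal runnable sketch of that approach. The (Id, Name) tuple and the sample values are assumptions standing in for the real item type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data.
int[] idList = { 1, 2, 3 };
var itemsToAdd = new List<(int Id, string Name)>
{
    (2, "b"), (4, "d"), (3, "c"), (5, "e")
};

// Build the lookup once: O(idList.Length).
var idListHash = new HashSet<int>(idList);

// Each Contains is now O(1) on average, so the whole pass is
// roughly O(itemsToAdd.Count + idList.Length).
int removed = itemsToAdd.RemoveAll(e => idListHash.Contains(e.Id));

Console.WriteLine(removed); // 2
Console.WriteLine(string.Join(",", itemsToAdd.Select(e => e.Id))); // 4,5
```

RemoveAll edits the list in place and keeps the remaining items in their original order, so nothing else in the calling code has to change.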

Answer 3

Assuming the following premises are true:

idList and itemsToAdd contain no duplicate values
you are using the .NET Framework 4.0 or later
both collections hold the same element type (e.g. itemsToAdd contains the Ids themselves)

you could use a HashSet<T> this way:

var itemsToAddSet = new HashSet<int>(itemsToAdd);
itemsToAddSet.ExceptWith(idList);

According to the documentation, the ISet<T>.ExceptWith method is pretty efficient:

This method is an O(n) operation, where n is the number of elements in the other parameter.

In your case n is the number of items in idList. Building the set from itemsToAdd is an additional one-time O(m) step, where m is itemsToAdd.Count.
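When itemsToAdd holds objects rather than bare Ids, one possible adaptation is to run ExceptWith on the projected Ids and then map the surviving Ids back to the items. The sketch below assumes a hypothetical (Id, Name) tuple as the item type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data.
int[] idList = { 1, 2, 3 };
var itemsToAdd = new List<(int Id, string Name)>
{
    (2, "b"), (4, "d"), (5, "e")
};

// ExceptWith needs the same element type on both sides, so the set holds Ids.
var idsToAdd = new HashSet<int>(itemsToAdd.Select(e => e.Id));
idsToAdd.ExceptWith(idList); // removes every Id that is already in the database

// Map the surviving Ids back to the items.
List<(int Id, string Name)> remaining =
    itemsToAdd.Where(e => idsToAdd.Contains(e.Id)).ToList();

Console.WriteLine(string.Join(",", remaining.Select(e => e.Id))); // 4,5
```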

Answer 4

You should use two HashSet<int>s.
Note that a HashSet<T> stores only unique elements and does not preserve order.
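A short sketch of what "unique and unordered" means in practice, using made-up values:

```csharp
using System;
using System.Collections.Generic;

var ids = new HashSet<int>();

// Add returns false when the value is already present.
bool first = ids.Add(7);
bool second = ids.Add(7);
ids.Add(3);

Console.WriteLine(first);     // True
Console.WriteLine(second);    // False
Console.WriteLine(ids.Count); // 2

// Enumeration order is not guaranteed; sort explicitly if order matters.
var sorted = new List<int>(ids);
sorted.Sort();
Console.WriteLine(string.Join(",", sorted)); // 3,7
```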

4 answers

solution1 23 2011-02-23 14:06:29
solution2 18 ACCEPTED 2011-02-23 14:07:19
solution3 5 2011-02-23 14:20:26
solution4 2 2011-02-23 14:06:10