
How to subtract one huge list from another efficiently in C#

I have a very long list of Ids (integers) that represents all the items that are currently in my database:

var idList = GetAllIds();

I also have another huge generic list with items to add to the database:

List<T> itemsToAdd;

Now, I would like to remove all items from the generic list whose Id is already in the idList. Currently idList is a simple array and I subtract the lists like this:

itemsToAdd.RemoveAll(e => idList.Contains(e.Id));

I am pretty sure that it could be a lot faster, so what datatypes should I use for both collections and what is the most efficient practice to subtract them?

Thank you!

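LINQ could help. Except needs both sequences to have the same element type, so apply it to the Ids rather than to the items themselves:

itemsToAdd.Select(e => e.Id).Except(idList)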

Your code is slow because Contains on an array (just like List<T>.Contains) is a linear search, O(n). So your total cost is O(itemsToAdd.Count * idList.Count).

You can make idList a HashSet<int>, whose Contains is O(1) on average. Or just use the LINQ Except extension method, which builds such a set for you, provided both sequences have the same element type.

Note that Except will also remove all duplicates from the left side, i.e. new int[]{1,1,2}.Except(new int[]{2}) will result in just {1}; the second 1 is dropped. But I assume that is no problem in your case, because IDs are typically unique.
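To make the duplicate behaviour concrete, here is a small sketch that compares Except with a plain filter against a HashSet<int>. The class and method names (ExceptDemo, WithExcept, WithHashSetFilter) are made up for illustration and do not come from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExceptDemo
{
    // Set difference via LINQ: duplicates on the left side collapse.
    public static int[] WithExcept(int[] left, int[] right)
    {
        return left.Except(right).ToArray();
    }

    // Filter against a HashSet<int>: duplicates on the left side survive.
    public static int[] WithHashSetFilter(int[] left, int[] right)
    {
        var rightSet = new HashSet<int>(right);
        return left.Where(x => !rightSet.Contains(x)).ToArray();
    }

    public static void Main()
    {
        var left = new[] { 1, 1, 2 };
        var right = new[] { 2 };

        Console.WriteLine(string.Join(",", WithExcept(left, right)));        // 1
        Console.WriteLine(string.Join(",", WithHashSetFilter(left, right))); // 1,1
    }
}
```

If duplicates on the left side matter, the filter variant is probably the safer choice; both variants look each element up in a hash set instead of scanning the whole right-hand array.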

Temporarily convert idList to a HashSet<int> and use the same method, i.e.:

var idListHash = new HashSet<int>(idList);
itemsToAdd.RemoveAll(e => idListHash.Contains(e.Id));

It should be much faster.
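A fuller sketch of this approach might look as follows. The Item class, its Name property, and the RemoveExisting helper are assumptions for illustration; the question only says that each item exposes an Id and that the Ids come from GetAllIds().

```csharp
using System;
using System.Collections.Generic;

// Hypothetical item type; the question only says each item exposes an Id.
public class Item
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class RemoveExistingDemo
{
    // Removes, in place, every item whose Id is already known.
    // Returns the number of removed items.
    public static int RemoveExisting(List<Item> itemsToAdd, IEnumerable<int> existingIds)
    {
        // Building the set is one pass over the Ids; each lookup is then O(1) on average.
        var idSet = new HashSet<int>(existingIds);
        return itemsToAdd.RemoveAll(e => idSet.Contains(e.Id));
    }

    public static void Main()
    {
        int[] idList = { 1, 2, 3 };   // stands in for GetAllIds()
        var itemsToAdd = new List<Item>
        {
            new Item { Id = 2, Name = "already stored" },
            new Item { Id = 4, Name = "new" },
            new Item { Id = 5, Name = "new" },
        };

        int removed = RemoveExisting(itemsToAdd, idList);
        Console.WriteLine(removed);           // 1
        Console.WriteLine(itemsToAdd.Count);  // 2
    }
}
```

With the set in place, the total cost is roughly O(idList.Count + itemsToAdd.Count) instead of the product of the two counts, at the price of the extra memory for the set.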

Assuming the following premises are true:

  • idList and itemsToAdd do not contain duplicate values
  • you are using the .NET Framework 4.0

you could use a HashSet<T> this way:

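// ExceptWith needs matching element types, so the set holds the Ids of itemsToAdd
var idsToAddSet = new HashSet<int>(itemsToAdd.Select(e => e.Id));
idsToAddSet.ExceptWith(idList);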

According to the documentation the ISet<T>.ExceptWith method is pretty efficient:

This method is an O(n) operation, where n is the number of elements in the other parameter.

In your case, n is the number of items in idList.
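As a rough sketch of the ExceptWith route on plain Ids (the NewIds helper and the sample values are made up for illustration), the set difference can be computed like this:

```csharp
using System;
using System.Collections.Generic;

public static class ExceptWithDemo
{
    // Returns the Ids from candidateIds that are not in existingIds.
    public static HashSet<int> NewIds(IEnumerable<int> candidateIds, IEnumerable<int> existingIds)
    {
        var result = new HashSet<int>(candidateIds); // duplicates collapse here
        result.ExceptWith(existingIds);              // one pass over existingIds
        return result;
    }

    public static void Main()
    {
        int[] idList = { 1, 2, 3 };        // Ids already in the database
        int[] candidateIds = { 2, 4, 5 };  // Ids of the items to add

        HashSet<int> newIds = NewIds(candidateIds, idList);
        Console.WriteLine(newIds.Count);        // 2
        Console.WriteLine(newIds.Contains(4));  // True
        Console.WriteLine(newIds.Contains(2));  // False
    }
}
```

The resulting set only contains Ids, so the items themselves would still have to be selected by checking each item's Id against it.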

You should use two HashSet<int>s.
Note that a HashSet<T> stores each value only once and does not guarantee any order.
