I have a very long list of Ids (integers) that represents all the items that are currently in my database:
var idList = GetAllIds();
I also have another huge generic list with items to add to the database:
List<T> itemsToAdd;
Now, I would like to remove all items from the generic list whose Id is already in the idList. Currently idList is a simple array and I subtract the lists like this:
itemsToAdd.RemoveAll(e => idList.Contains(e.Id));
I am pretty sure that it could be a lot faster, so what datatypes should I use for both collections and what is the most efficient practice to subtract them?
Thank you!
LINQ could help:
itemsToAdd.Except(idList)
Your code is slow because List<T>.Contains
is O(n)
. So your total cost is O(itemsToAdd.Count*idList.Count)
.
You can make idList into a HashSet<T>
which has O(1)
.Contains
. Or just use the Linq .Except
extension method which does it for you.
Note that .Except
will also remove all duplicates from the left side. ie new int[]{1,1,2}.Except(new int[]{2})
will result in just {1}
and the second 1 was removed. But I assume it's no problem in your case because IDs are typically unique.
Transform temporarily idList
to an HashSet<T>
and use the same method ie:
items.RemoveAll(e => idListHash.Contains(e.Id));
it should be much faster
Assuming the following premises are true:
idList
and itemsToAdd
may not contain duplicate values you could use a HashSet<T> this way:
var itemsToAddSet = new HashSet(itemsToAdd);
itemsToAddSet.ExceptWith(idList);
According to the documentation the ISet<T>.ExceptWith method is pretty efficient:
This method is an O(n) operation, where n is the number of elements in the other parameter.
In your case n
is the number of items in idList
.
You should use two HashSet<int>
s.
Note that they're unique and unordered.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.