I have a list of strings and I want to remove from it the items that match entries in another list. An item should be removed if its value Contains (not Equals) a value from the other list.
One of the ways is to do this:
var MyList = new List<string> { ... };
var ToRemove = new List<string> { ... };
MyList.RemoveAll(_ => ToRemove.Any(_.Contains));
It works, but I have a LOT of lists (>1 million), and since ToRemove can be sorted, it would make sense to use that to speed up the process.
It's easy to write a loop that does it, but is there a way to do this with the sorted collections?
Update:
Over 20k iterations on a text with our forbidden list, I get this:
Forbidden list as List -> 00:00:07.1993364
Forbidden list as HashSet -> 00:00:07.9749997
It's consistent after multiple runs, so the HashSet is slower.
Well, sorting ToRemove may be beneficial because of the O(log n) complexity of binary search (you will need to rewrite _ => ToRemove.Any(_.Contains)).
But using a HashSet<string> instead of a List<string> for ToRemove will be much faster, because finding an element in a hash set (using Contains) is an O(1) operation.
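To make the two kinds of lookup concrete, here is a minimal sketch using small, made-up sample data (the class and method names are hypothetical). HashSet<string>.Contains tests whether an element equals the argument, while the predicate in the question tests whether an item contains an entry as a substring, so the two do not remove the same items:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupDemo
{
    // Exact-match removal: one hash lookup per item.
    public static List<string> RemoveExact(IEnumerable<string> items, HashSet<string> toRemove)
        => items.Where(item => !toRemove.Contains(item)).ToList();

    // Substring removal (the question's semantics): entries are scanned one by one.
    public static List<string> RemoveContaining(IEnumerable<string> items, IEnumerable<string> toRemove)
        => items.Where(item => !toRemove.Any(r => item.Contains(r))).ToList();

    public static void Main()
    {
        var myList = new List<string> { "foobar", "bar", "baz" }; // hypothetical data
        var toRemove = new HashSet<string> { "bar" };

        Console.WriteLine(string.Join(",", RemoveExact(myList, toRemove)));      // foobar,baz
        Console.WriteLine(string.Join(",", RemoveContaining(myList, toRemove))); // baz
    }
}
```

With substring semantics the hash set is only enumerated, never probed by hash, so the O(1) lookup does not come into play.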
Also, using a LinkedList<string> for MyList can potentially be beneficial, since removing an item from a linked list is generally faster than removing from an array-based list, where the elements after the removed item have to be shifted.
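A rough sketch of what removal from a linked list could look like, with hypothetical names and sample data. Each LinkedList<T>.Remove(node) call is O(1), but note that List<T>.RemoveAll already works in a single linear pass, so whether this is actually faster is something to measure rather than assume:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinkedListRemoval
{
    // Walks the list once and unlinks every node whose value contains a forbidden entry.
    public static void RemoveContaining(LinkedList<string> items, IEnumerable<string> toRemove)
    {
        var node = items.First;
        while (node != null)
        {
            var next = node.Next;   // capture first: Remove detaches the node from the list
            var value = node.Value;
            if (toRemove.Any(r => value.Contains(r)))
                items.Remove(node); // O(1) unlink, no element shifting
            node = next;
        }
    }

    public static void Main()
    {
        var myList = new LinkedList<string>(new[] { "foobar", "bar", "baz" }); // hypothetical data
        RemoveContaining(myList, new[] { "bar" });
        Console.WriteLine(string.Join(",", myList)); // baz
    }
}
```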
Since this is about removing strings that contain strings from another list, a HashSet wouldn't be much help. Actually, not much would be, unless you were looking for exact full matches or maintained an index of all substrings (expensive, and AFAIK only SQL Server does this semi-efficiently outside the BigData realm).
If all you cared about was whether an item starts with one of the entries in 'ToRemove', sorting could help. Sort 'MyList' and, for each string in 'ToRemove', run a custom binary search to find any string starting with that string. Then RemoveAt that index going forward until the item no longer starts with it, and decrement the index backwards, removing until the item no longer starts with it.
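The starts-with idea can be sketched roughly as follows. This is a variation under stated assumptions, not a definitive implementation: it uses ordinal (byte-wise) comparison throughout, and instead of landing on any match and scanning both ways it searches for the first element not less than the prefix, so only a forward scan is needed. The names PrefixRemoval, LowerBound and RemoveStartingWith are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class PrefixRemoval
{
    // Index of the first element >= key in ordinal order (a lower-bound binary search).
    static int LowerBound(List<string> sorted, string key)
    {
        int lo = 0, hi = sorted.Count;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (string.CompareOrdinal(sorted[mid], key) < 0) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // Removes every item that starts with any of the prefixes. Sorts `items` in place.
    public static void RemoveStartingWith(List<string> items, IEnumerable<string> prefixes)
    {
        items.Sort(StringComparer.Ordinal);
        foreach (var prefix in prefixes)
        {
            // In ordinal order, all items sharing a prefix form one contiguous run
            // that begins at the lower bound of that prefix.
            int start = LowerBound(items, prefix);
            int end = start;
            while (end < items.Count && items[end].StartsWith(prefix, StringComparison.Ordinal))
                end++;
            items.RemoveRange(start, end - start); // one shift per prefix instead of one per item
        }
    }

    public static void Main()
    {
        var myList = new List<string> { "banana", "apple", "apricot", "cherry", "band" }; // hypothetical data
        RemoveStartingWith(myList, new[] { "ap", "ban" });
        Console.WriteLine(string.Join(",", myList)); // cherry
    }
}
```

This only covers prefix matches. For the general Contains case in the question, sorting alone does not give a comparable shortcut.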