
Removing list item from another list

I have a list with some elements, and I want to remove the elements that match entries in another list. An item should be removed if its value Contains (not equals) a value from the other list.

One of the ways is to do this:

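using System.Collections.Generic;
using System.Linq;
var MyList = new List<string> { ... };
var ToRemove = new List<string> { ... };
// Remove every item that contains (as a substring) any entry of ToRemove.
MyList.RemoveAll(_ => ToRemove.Any(_.Contains));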

It works...

But I have a LOT of lists (>1 million), and since ToRemove can be sorted, it would make sense to take advantage of that to speed up the process.

It's easy to make a loop that does it, but is there a way to do this with the sorted collections?
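For comparison, the plain loop mentioned above might look like the following. This is a minimal sketch, written as a C# 9+ top-level program, assuming ordinal substring matching (the same as string.Contains(string)) and that returning a new list is acceptable; KeepNonMatching is a hypothetical helper name. It does the same items × forbidden-entries work as the RemoveAll/Any version, so by itself it does not exploit a sorted ToRemove.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns the items that contain none of the forbidden
// strings. Uses ordinal substring matching, like string.Contains(string).
static List<string> KeepNonMatching(List<string> items, List<string> forbidden)
{
    var kept = new List<string>(items.Count);
    foreach (string item in items)
    {
        bool hit = false;
        foreach (string bad in forbidden)
        {
            if (item.Contains(bad, StringComparison.Ordinal))
            {
                hit = true;
                break; // one match is enough, skip the rest of the forbidden list
            }
        }
        if (!hit)
        {
            kept.Add(item);
        }
    }
    return kept;
}

var fruits = new List<string> { "apple pie", "banana split", "cherry tart" };
var blocked = new List<string> { "nana", "tar" };
List<string> result = KeepNonMatching(fruits, blocked);
Console.WriteLine(string.Join(", ", result)); // apple pie
```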


Update:

Running 20k iterations over a text with our forbidden list, I get:

Forbidden list as List -> 00:00:07.1993364

Forbidden list as HashSet -> 00:00:07.9749997

This is consistent across multiple runs, so the HashSet is slower.

Well, sorting ToRemove may be beneficial because binary search has O(log n) complexity (you will need to rewrite _ => ToRemove.Any(_.Contains)).
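A binary search over a sorted ToRemove answers "is this exact string in the list?"; it cannot directly tell whether an item contains a forbidden substring. Under the assumption that exact matches are enough, a minimal sketch using List<T>.Sort and List<T>.BinarySearch with the same ordinal comparer could look like this (RemoveExactMatchesSorted is a hypothetical name):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper. Assumption: only exact, whole-string matches matter.
// Note: sorts toRemove in place.
static int RemoveExactMatchesSorted(List<string> myList, List<string> toRemove)
{
    // Sort once; BinarySearch must use the same comparer as Sort.
    toRemove.Sort(StringComparer.Ordinal);
    // Each lookup costs O(log m) instead of the O(m) scan done by Any.
    return myList.RemoveAll(s => toRemove.BinarySearch(s, StringComparer.Ordinal) >= 0);
}

var words = new List<string> { "alpha", "beta", "gamma", "delta" };
var forbidden = new List<string> { "gamma", "alpha" };
int removed = RemoveExactMatchesSorted(words, forbidden);
Console.WriteLine($"{removed} removed: {string.Join(", ", words)}"); // 2 removed: beta, delta
```

For the substring test in the question, this lookup would give different (wrong) results, since "gammas" is not removed by a forbidden "gamma".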

But using a HashSet<string> instead of a List<string> for ToRemove will be much faster, because finding an element in a HashSet (using Contains) is an O(1) operation on average.
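HashSet<string>.Contains is an equality lookup, so it pays off when whole items must match. A minimal sketch, assuming exact ordinal matches are what is wanted:

```csharp
using System;
using System.Collections.Generic;

var myList = new List<string> { "red", "green", "blue", "green" };
// HashSet lookups are O(1) on average, but they test equality, not containment.
var toRemove = new HashSet<string>(new[] { "green", "yellow" }, StringComparer.Ordinal);
// The method group toRemove.Contains converts to Predicate<string>.
int removedCount = myList.RemoveAll(toRemove.Contains);
Console.WriteLine($"{removedCount} removed: {string.Join(", ", myList)}"); // 2 removed: red, blue
```

When the test is a substring check, as in the question, Any(_.Contains) only enumerates the set and never uses its hashing, which would be consistent with the HashSet not being faster in the timings from the update.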

Also, using a LinkedList<string> for MyList can potentially be beneficial, since removing an item from a linked list is generally faster than removing one from an array-based list, where every following element has to be shifted.
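A sketch of the LinkedList<string> idea, assuming the substring test from the question; RemoveContaining is a hypothetical helper. Each LinkedList<T>.Remove(LinkedListNode<T>) call is O(1). Note that List<T>.RemoveAll already compacts the list in a single O(n) pass, so the gain mainly applies when items are removed one at a time with RemoveAt; the substring checks themselves cost the same.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: unlinks every node whose value contains one of the
// forbidden substrings (ordinal comparison, as in the question).
static void RemoveContaining(LinkedList<string> list, IReadOnlyCollection<string> toRemove)
{
    LinkedListNode<string> node = list.First;
    while (node != null)
    {
        // Remember the successor first: Remove(node) detaches the node.
        LinkedListNode<string> next = node.Next;
        string value = node.Value;
        if (toRemove.Any(bad => value.Contains(bad, StringComparison.Ordinal)))
        {
            list.Remove(node); // O(1): no elements are shifted
        }
        node = next;
    }
}

var linked = new LinkedList<string>(new[] { "apple pie", "banana split", "cherry tart" });
RemoveContaining(linked, new[] { "nana", "tar" });
Console.WriteLine(string.Join(", ", linked)); // apple pie
```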

Since this is a removal of strings that contain strings from another list, a HashSet wouldn't be much help. Actually, not much would, unless you were looking for exact full matches or maintained an index of all substrings (expensive, and AFAIK only SQL Server does this semi-efficiently outside the Big Data realm). If all you cared about was whether an item starts with a string in 'ToRemove', sorting could help: sort 'MyList', then, for each string in 'ToRemove', use a custom binary search to find any item starting with that string, call RemoveAt on that index until the item there no longer starts with it, and then step the index backwards, removing items until one no longer starts with it.
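The prefix-only approach can be sketched as follows, assuming ordinal string order and that reordering MyList is acceptable; RemoveByPrefixSorted is a hypothetical name. With an ordinal sort, all strings that start with a given prefix form one contiguous run, so the sketch binary-searches for the prefix, widens the range backwards and forwards while items still start with it, and removes the run with a single RemoveRange call instead of repeated RemoveAt calls.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: removes every item that starts with one of the prefixes.
// Sorts myList in place (ordinal order), so the original order is lost.
static void RemoveByPrefixSorted(List<string> myList, IEnumerable<string> toRemove)
{
    myList.Sort(StringComparer.Ordinal);
    foreach (string prefix in toRemove)
    {
        // Exact hit: index of some equal item. Miss: bitwise complement of the insertion point.
        int start = myList.BinarySearch(prefix, StringComparer.Ordinal);
        if (start < 0)
        {
            start = ~start;
        }
        // Walk backwards (duplicates of the prefix may sit before the hit).
        while (start > 0 && myList[start - 1].StartsWith(prefix, StringComparison.Ordinal))
        {
            start--;
        }
        // Walk forwards over the contiguous run of matching items.
        int end = start;
        while (end < myList.Count && myList[end].StartsWith(prefix, StringComparison.Ordinal))
        {
            end++;
        }
        myList.RemoveRange(start, end - start);
    }
}

var items = new List<string> { "carpet", "apple", "car", "cart", "banana", "ape" };
RemoveByPrefixSorted(items, new[] { "car", "ap" });
Console.WriteLine(string.Join(", ", items)); // banana
```

Each RemoveRange still shifts the tail of the list, so with very many prefixes it may be cheaper to collect the ranges first and compact the list once.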

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM