I have a list that contains thousands of strings. I want to remove every string that contains any of a set of banned substrings, keep only distinct values, and enforce a minimum length of 4. The queries below work correctly, but I suspect their performance is not the best. Can they be written to run more efficiently?
C# 5, .NET 4.5.2, WPF application
List<string> lstFoundUrls = new List<string>{"filled with strings"};//lets say 1000
public static List<string> lstBannedUrlExtensions = new List<string> { ".png", ".jpg", ".gif", ".pdf", ".jpeg", ".txt", ".doc", ".docx", ".ppt", ".pptx", ".css", ".js", ".ico" };
lstFoundUrls = lstFoundUrls.Where(pr => lstBannedUrlExtensions.Where(ar => pr.ToLowerInvariant().Contains(ar) == true ).Count<string>() == 0).ToList<string>();
lstFoundUrls = lstFoundUrls.Where(pr => pr != "null").Where(pr => pr.Length > 4).Distinct<string>().ToList<string>();
A few things.
Firstly, you are counting how many banned extensions each URL contains. So if a URL contains .png, you don't stop there; you go on to check that URL for .jpg, .gif, etc. Using Any() instead of Count() == 0 would stop at the first match.
Secondly, you're calling ToLowerInvariant on every string repeatedly, rather than just once per string. With X URLs and Y banned extensions, this means X * Y calls to ToLowerInvariant rather than just X.
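The first two points can be sketched as follows. This is a minimal illustration rather than a drop-in replacement: the class and method names (BannedSubstringFilter, ContainsBanned, RemoveBanned) are made up for the example, and it keeps the original substring semantics. It assumes the banned entries are already lower-case.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BannedSubstringFilter
{
    // Hypothetical helper: true if the URL contains any banned entry as a substring.
    public static bool ContainsBanned(string url, ICollection<string> bannedExtensions)
    {
        // Lower-case once per URL, not once per (URL, extension) pair.
        string lowered = url.ToLowerInvariant();

        // Any() returns as soon as one entry matches; Count() == 0 always scans the whole list.
        return bannedExtensions.Any(ext => lowered.Contains(ext));
    }

    // Keeps only the URLs that contain none of the banned entries.
    public static List<string> RemoveBanned(IEnumerable<string> urls, ICollection<string> bannedExtensions)
    {
        return urls.Where(url => !ContainsBanned(url, bannedExtensions)).ToList();
    }
}
```

If allocating a lower-cased copy per URL is itself a concern, url.IndexOf(ext, StringComparison.OrdinalIgnoreCase) >= 0 performs the same case-insensitive substring test without creating a new string.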
Thirdly, you discard the strings equal to "null" and the strings below the minimum length only after performing the costly checks. These cheap checks should come first: throw away as many URLs as possible as quickly as possible. (The "null" check is also redundant, since "null" has length 4 and you discard any string shorter than 5 characters anyway.)
Fourthly, you are using .Distinct() rather than just using a HashSet to begin with.
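The third and fourth points combine naturally: run the cheap length check first, and let a HashSet do the de-duplication as it is filled. The sketch below is only an illustration; CheapFilters and LongDistinct are hypothetical names, and the threshold is passed in rather than hard-coded.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CheapFilters
{
    // Hypothetical helper: distinct strings whose length is greater than minLengthExclusive.
    public static HashSet<string> LongDistinct(IEnumerable<string> urls, int minLengthExclusive)
    {
        // Where() is lazy, so the HashSet constructor consumes the filtered
        // sequence in a single pass and drops duplicates while inserting.
        return new HashSet<string>(urls.Where(u => u.Length > minLengthExclusive));
    }
}
```

One design caveat: HashSet makes no promise about enumeration order, so if the original order of the URLs matters, Distinct() over the length-filtered sequence may be the safer choice.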
Lastly, you are iterating over all file extensions for each URL, rather than finding the file extension of each URL and checking an O(1) collection (e.g. a HashSet) to see whether the extension is banned.
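A rough sketch of that last idea, under some assumptions: the names UrlExtensionFilter, GetUrlExtension and HasBannedExtension are invented for this example, the extension is taken from the last path segment after any query string or fragment has been cut off, and the set uses StringComparer.OrdinalIgnoreCase so no lower-casing is needed. Note that this is stricter than the original substring test: a URL such as http://example.com/a.png/info is no longer rejected.

```csharp
using System;
using System.Collections.Generic;

public static class UrlExtensionFilter
{
    // Banned extensions, with the leading dot; lookups ignore case.
    private static readonly HashSet<string> Banned =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            ".png", ".jpg", ".gif", ".pdf", ".jpeg", ".txt", ".doc",
            ".docx", ".ppt", ".pptx", ".css", ".js", ".ico"
        };

    // Hypothetical helper: extension of the last path segment (with dot), or "" if there is none.
    public static string GetUrlExtension(string url)
    {
        // Cut off the query string and fragment first.
        int end = url.IndexOfAny(new[] { '?', '#' });
        string path = end >= 0 ? url.Substring(0, end) : url;

        int schemeEnd = path.IndexOf("://", StringComparison.Ordinal);
        int lastSlash = path.LastIndexOf('/');

        // "http://example.com" has no path segment: its last slash belongs to "://".
        if (schemeEnd >= 0 && lastSlash < schemeEnd + 3) return string.Empty;

        int lastDot = path.LastIndexOf('.');

        // A dot before the last slash belongs to the host or a directory name.
        if (lastDot <= lastSlash) return string.Empty;

        return path.Substring(lastDot);
    }

    // One O(1) set lookup per URL instead of a scan over every banned extension.
    public static bool HasBannedExtension(string url)
    {
        return Banned.Contains(GetUrlExtension(url));
    }
}
```

It could then be used as lstFoundUrls.Where(u => !UrlExtensionFilter.HasBannedExtension(u)), after the length and distinctness filtering.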
Something like this would be better:
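// Requires: using System.Collections.Generic; using System.IO; using System.Linq;
var foundUrls = new HashSet<string>(lstFoundUrls); // de-duplicates up front
// Path.GetExtension returns the extension including its leading dot, so the banned entries need the dot too.
var bannedExtensions = new HashSet<string> { ".png", ".jpg", ".gif", ".pdf", ".jpeg", ".txt", ".doc", ".docx", ".ppt", ".pptx", ".css", ".js", ".ico" };
var filteredUrls = foundUrls.Where(s => s.Length > 4); // cheap length check first
// Note: a URL with a query string (e.g. "a.png?x=1") yields ".png?x=1" here, which will not match.
var foundUrlsWithExtension = filteredUrls.ToDictionary(url => url, url => Path.GetExtension(url.ToLowerInvariant()));
// Each result is a key/value pair; the URL itself is kvp.Key.
var filteredUrls2 = foundUrlsWithExtension.Where(kvp => !bannedExtensions.Contains(kvp.Value));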