
How to write these LINQ queries in the most optimal way

I have a list that contains thousands of strings. I want to remove every string that contains any of certain substrings, remove duplicates, and keep only strings longer than 4 characters. The queries below currently work very well, but I believe their performance is not the best. Can they be written to perform more efficiently?

C# 5, .NET 4.5.2, WPF application

// lstFoundUrls holds the scraped URLs (let's say 1000 of them)
List<string> lstFoundUrls = new List<string> { "filled with strings" };

public static List<string> lstBannedUrlExtensions = new List<string> { ".png", ".jpg", ".gif", ".pdf", ".jpeg", ".txt", ".doc", ".docx", ".ppt", ".pptx", ".css", ".js", ".ico" };

// Remove every URL that contains any of the banned extensions (case-insensitive)
lstFoundUrls = lstFoundUrls.Where(pr => lstBannedUrlExtensions.Where(ar => pr.ToLowerInvariant().Contains(ar) == true ).Count<string>() == 0).ToList<string>();

// Remove "null" entries and strings of 4 characters or fewer, then remove duplicates
lstFoundUrls = lstFoundUrls.Where(pr => pr != "null").Where(pr => pr.Length > 4).Distinct<string>().ToList<string>();

A few things.

Firstly, you are counting how many banned extensions each URL contains. So if a URL contains .png, you don't stop there; you go on to check that URL for .jpg, .gif, etc.
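To make the difference concrete, here is a small sketch that counts how often the `Contains` check runs for a single URL. The class `ShortCircuitDemo` and its methods are hypothetical, written only for illustration: `Where(...).Count() == 0` is the pattern from the question, and `!Any(...)` is a replacement that stops at the first banned extension it finds.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class: both methods answer "does url contain none of the
// banned extensions?" and report how many Contains checks were needed.
public static class ShortCircuitDemo
{
    // The question's pattern: Count() enumerates the whole filtered sequence,
    // so the predicate runs once for every banned extension.
    public static bool IsCleanUsingCount(string url, IList<string> banned, out int checks)
    {
        int n = 0;
        bool clean = banned.Where(ext => { n++; return url.Contains(ext); }).Count() == 0;
        checks = n;
        return clean;
    }

    // Any() returns as soon as the predicate is true for one element.
    public static bool IsCleanUsingAny(string url, IList<string> banned, out int checks)
    {
        int n = 0;
        bool clean = !banned.Any(ext => { n++; return url.Contains(ext); });
        checks = n;
        return clean;
    }
}
```

With the question's extension list (which starts with `.png`), a URL ending in `.png` needs 13 checks with `Count()` but only 1 with `Any()`. For a URL that contains no banned extension, both versions still have to test all 13.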

Secondly, you call ToLowerInvariant on each string repeatedly (once per banned extension) rather than just once per string. With X URLs and Y banned extensions, that is X * Y calls to ToLowerInvariant rather than just X.
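One way to lower-case each URL only once is a `let` clause in query syntax, which computes the value a single time per element and makes it available to the rest of the query. `LowerOnceFilter` and `RemoveBanned` are hypothetical names used for this sketch; it also uses `Any()` so the scan stops at the first match.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: removes URLs that contain any banned extension.
public static class LowerOnceFilter
{
    public static List<string> RemoveBanned(IEnumerable<string> urls, IList<string> bannedExtensions)
    {
        return (from url in urls
                let lower = url.ToLowerInvariant()   // one call per URL, X calls in total
                where !bannedExtensions.Any(ext => lower.Contains(ext))
                select url)                          // the result keeps the original casing
               .ToList();
    }
}
```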

Thirdly, you throw out the strings equal to "null" and the strings below the minimum length only after performing the costly checks. These cheap checks should come first: throw away as many URLs as possible, as quickly as possible. (The "null" check is also redundant, since "null" has only 4 characters and you already throw out every string shorter than 5 characters.)
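A sketch of the reordered pipeline, with the cheap filters running before the substring scan. `CheapChecksFirst.Filter` is a hypothetical name; the banned list is passed in with leading dots as in the question, and the substring test itself is kept as in the question so that mainly the order of the steps changes.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper showing the order of the steps: cheapest filters first.
public static class CheapChecksFirst
{
    public static List<string> Filter(IEnumerable<string> urls, IList<string> bannedExtensions)
    {
        return urls
            .Where(url => url.Length > 4)   // O(1) per URL; also rejects "null", which has 4 characters
            .Distinct()                     // duplicates never reach the costly check below
            .Where(url =>
            {
                string lower = url.ToLowerInvariant();
                return !bannedExtensions.Any(ext => lower.Contains(ext));
            })
            .ToList();
    }
}
```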

Fourthly, you are calling .Distinct() rather than putting the URLs into a HashSet to begin with.
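If the URLs are collected into a `HashSet<string>` from the start, duplicates are never stored and the final `Distinct()` call is no longer needed. This sketch assumes the URLs arrive one at a time from some scraping loop; `UrlCollector.Collect` is a hypothetical name.

```csharp
using System.Collections.Generic;

// Hypothetical collector: adds URLs straight into a HashSet so duplicates
// are never stored in the first place.
public static class UrlCollector
{
    public static HashSet<string> Collect(IEnumerable<string> scrapedUrls)
    {
        var found = new HashSet<string>();   // default string comparer: ordinal, case-sensitive
        foreach (string url in scrapedUrls)
        {
            found.Add(url);                  // returns false and changes nothing if url is already present
        }
        return found;
    }
}
```

Because `Add` returns `false` for an item that is already in the set, the scraping loop can also use it to skip URLs it has already seen.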

Lastly, for each URL you iterate over all banned extensions, rather than extracting the URL's file extension once and looking it up in a collection with O(1) lookups (e.g. a HashSet) to see whether it is banned.
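A minimal sketch of the extension lookup, assuming a hypothetical `ExtensionLookupFilter` class. `Path.GetExtension` returns the extension including its leading dot (or an empty string when there is none), so the set stores ".png" rather than "png", and `StringComparer.OrdinalIgnoreCase` makes `ToLowerInvariant` unnecessary. Note that this is not exactly the same test as the question's: only the last extension is checked, so `app.json` is no longer rejected by ".js", and a URL with a query string such as `a.png?x=1` yields ".png?x=1" and is kept. On .NET Framework, `Path.GetExtension` can also throw `ArgumentException` for characters returned by `Path.GetInvalidPathChars()`, such as `|`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: one hash lookup per URL instead of a scan over all extensions.
public static class ExtensionLookupFilter
{
    // Entries keep the leading dot because Path.GetExtension includes it.
    private static readonly HashSet<string> BannedExtensions = new HashSet<string>(
        new[] { ".png", ".jpg", ".gif", ".pdf", ".jpeg", ".txt", ".doc", ".docx",
                ".ppt", ".pptx", ".css", ".js", ".ico" },
        StringComparer.OrdinalIgnoreCase);   // case-insensitive, so no ToLowerInvariant call

    public static List<string> Filter(IEnumerable<string> urls)
    {
        return urls
            .Where(url => url.Length > 4)                                      // cheap check first
            .Where(url => !BannedExtensions.Contains(Path.GetExtension(url)))  // O(1) lookup
            .ToList();
    }
}
```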

Something like this would be better:

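        // Needs: using System.Collections.Generic; using System.IO; using System.Linq;
        var foundUrls = new HashSet<string>(lstFoundUrls); // removes duplicates up front, so no Distinct() is needed
        // Path.GetExtension returns the extension including its leading dot, so store ".png", not "png"
        var bannedExtensions = new HashSet<string> { ".png", ".jpg", ".gif", ".pdf", ".jpeg", ".txt", ".doc", ".docx", ".ppt", ".pptx", ".css", ".js", ".ico" };

        var filteredUrls = foundUrls.Where(s => s.Length > 4); // cheap check first; also drops "null"
        // Lower-case each URL exactly once and pair it with its extension
        var foundUrlsWithExtension = filteredUrls.ToDictionary(url => url, url => Path.GetExtension(url.ToLowerInvariant()));

        var filteredUrls2 = foundUrlsWithExtension.Where(kvp => !bannedExtensions.Contains(kvp.Value))
                                                  .Select(kvp => kvp.Key) // keep the URL itself, not the key/value pair
                                                  .ToList();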
