Performance issue with LINQ query

I have a function that gets a list of files from a directory, and then searches the file names for matches from a List. The performance sucks.

Here is the function:

public List<fileStatus> checkFilesStatus(List<string> permitNumbers, string serverDirectory, fileType type)
    {
        XmlConfigurator.Configure();
        log.Debug(string.Format("Beginning checkFilesStatus with following parameters > permitNumbers: {0} > serverDirectory: {1} > type: {2}", string.Join(",", permitNumbers.ToArray()), serverDirectory, type.ToString()));
        List<fileStatus> results = new List<fileStatus>();
        DirectoryInfo dirInfo = new DirectoryInfo(serverDirectory);
        if (dirInfo.Exists)
        {
            // GET LIST OF ALL FILES IN DIRECTORY
            string[] files = System.IO.Directory.GetFiles(serverDirectory, "*", System.IO.SearchOption.AllDirectories);

            log.Debug(string.Format("List of all files in directory: {0}", string.Join(",", files)));


            if (files.Length > 0 && permitNumbers.Count > 0)
            {
                log.Debug("Checking for matching files");
                // CHECK FOR MATCHING FILES
                switch (type)
                {
                    case fileType.Well:

                        var matchingFiles = (from f in files
                                             where f.Substring(f.LastIndexOf("\\") + 1).Length > 4
                                             where permitNumbers.Contains(f.Substring(f.LastIndexOf("\\") + 1, 5))
                                             select new fileStatus(fileType.Well, f.Substring(f.LastIndexOf("\\") + 1, 5), 1, f.Substring(f.LastIndexOf("\\") + 1)));


                        var permitNumbersWithMatches = (from x in matchingFiles
                                                       select x.PermitNumber);

                        var nonMatchingFiles = (from p in permitNumbers
                                                where !permitNumbersWithMatches.Contains(p)
                                                select new fileStatus(fileType.Well, p, 0, string.Empty));

                        results.AddRange(matchingFiles);
                        results.AddRange(nonMatchingFiles);

                        break;
                    case fileType.DrillerLog:
                        matchingFiles = (from f in files
                                         where f.Substring(f.LastIndexOf("\\") + 1).Length > 4
                                         where permitNumbers.Contains(f.Substring(f.LastIndexOf("\\") + 1, 5))
                                         select new fileStatus(fileType.DrillerLog, f.Substring(f.LastIndexOf("\\") + 1, 5), 1, f.Substring(f.LastIndexOf("\\") + 1)));

                        permitNumbersWithMatches = (from x in matchingFiles
                                                       select x.PermitNumber);

                        nonMatchingFiles = (from p in permitNumbers
                                                where !permitNumbersWithMatches.Contains(p)
                                            select new fileStatus(fileType.DrillerLog, p, 0, string.Empty));


                        results.AddRange(matchingFiles);
                        results.AddRange(nonMatchingFiles);

                        break;
                    case fileType.RasterLog:

                        matchingFiles = (from f in files
                                         where f.Substring(f.LastIndexOf("\\") + 1).Length > 13
                                         where permitNumbers.Contains(f.Substring(f.LastIndexOf("\\") + 1, 14))
                                         select new fileStatus(fileType.RasterLog, f.Substring(f.LastIndexOf("\\") + 1, 14), 1, f.Substring(f.LastIndexOf("\\") + 1)));

                        permitNumbersWithMatches = (from x in matchingFiles
                                                       select x.PermitNumber);

                        nonMatchingFiles = (from p in permitNumbers
                                                where !permitNumbersWithMatches.Contains(p)
                                            select new fileStatus(fileType.RasterLog, p, 0, string.Empty));



                        results.AddRange(matchingFiles);
                        results.AddRange(nonMatchingFiles);
                        break;
                    default:
                        break;
                }
                log.Debug("Done checking for matching files");
            }
        }
        return results;

    }

As soon as it gets to the LINQ query that provides the value for "matchingFiles", it just hangs. This happens with a large-ish set of "permitNumbers" (around 5000) and also a large set of "files".

Is there anything I can do to speed this up?

Taking into account the suggestions provided below, I modified the function as follows, and performance is now as expected. Thank you all very much! =)

public List<fileStatus> checkFilesStatus(List<string> permitNumbers, string serverDirectory, fileType type)
    {
        HashSet<string> numbers = new HashSet<string>(permitNumbers);
        XmlConfigurator.Configure();
        log.Debug(string.Format("Beginning checkFilesStatus with following parameters > permitNumbers: {0} > serverDirectory: {1} > type: {2}", string.Join(",", permitNumbers.ToArray()), serverDirectory, type.ToString()));
        List<fileStatus> results = new List<fileStatus>();
        DirectoryInfo dirInfo = new DirectoryInfo(serverDirectory);
        if (dirInfo.Exists)
        {
            // GET LIST OF ALL FILES IN DIRECTORY
            string[] files = System.IO.Directory.GetFiles(serverDirectory, "*", System.IO.SearchOption.AllDirectories);
            HashSet<string> fileNames = new HashSet<string>(files.Select(f => Path.GetFileName(f)));

            log.Debug(string.Format("List of all files in directory: {0}", string.Join(",", files)));


            if (fileNames.Count > 0 && numbers.Count > 0)
            {
                log.Debug("Checking for matching files");
                // CHECK FOR MATCHING FILES
                switch (type)
                {
                    case fileType.Well:
                        var matchingFiles = (from f in fileNames
                                             where f.Length > 4
                                             where numbers.Contains(f.Substring(0, 5))
                                             select new fileStatus(fileType.Well, f.Substring(0, 5), 1, f));


                        var permitNumbersWithMatches = (from x in matchingFiles
                                                       select x.PermitNumber);

                        var nonMatchingFiles = numbers.Except(permitNumbersWithMatches)
                            .Select(p => new fileStatus(fileType.Well, p, 0, string.Empty));

                        results.AddRange(matchingFiles);
                        results.AddRange(nonMatchingFiles);

                        break;
                    case fileType.DrillerLog:
                        matchingFiles = (from f in fileNames
                                         where f.Length > 4
                                         where numbers.Contains(f.Substring(0, 5))
                                         select new fileStatus(fileType.DrillerLog, f.Substring(0, 5), 1, f));


                        permitNumbersWithMatches = (from x in matchingFiles
                                                       select x.PermitNumber);

                        nonMatchingFiles = numbers.Except(permitNumbersWithMatches)
                            .Select(p => new fileStatus(fileType.DrillerLog, p, 0, string.Empty));


                        results.AddRange(matchingFiles);
                        results.AddRange(nonMatchingFiles);

                        break;
                    case fileType.RasterLog:

                        matchingFiles = (from f in fileNames
                                         where f.Length > 13
                                         where numbers.Contains(f.Substring(0, 14))
                                         select new fileStatus(fileType.RasterLog, f.Substring(0, 14), 1, f));

                        permitNumbersWithMatches = (from x in matchingFiles
                                                       select x.PermitNumber);

                        nonMatchingFiles = numbers.Except(permitNumbersWithMatches)
                            .Select(p => new fileStatus(fileType.RasterLog, p, 0, string.Empty));


                        results.AddRange(matchingFiles);
                        results.AddRange(nonMatchingFiles);
                        break;
                    default:
                        break;
                }
                log.Debug("Done checking for matching files");
            }
        }
        return results;

    }

You're creating a query, matchingFiles, which, when iterated, goes through all of the files that you have, manipulating each path in several ways and doing a linear search of your list of permit numbers for each one. Because the query is lazy, its results are not stored: your third query re-runs it from the start, and linearly searches its output, once for each of the permit numbers. The files array is already in memory at that point, so the cost is repeated CPU work rather than disk access. This results in an asymptotic complexity of O(N^2 * M), where N is the number of permit numbers and M is the number of files. That's very bad.

The key here is to avoid 1) doing linear searches and 2) iterating complex queries more than once, and in particular avoiding iterating them for each item in some other sequence.
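To make point 2 concrete, here is a small sketch, separate from the question's code, showing that a LINQ query is re-run every time it is enumerated unless it is materialized with ToList(). The class name, the call counter, and the sample file names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Counts how often the filter runs when the sequence is searched `passes` times.
    public static int CountPredicateCalls(bool materialize, int passes)
    {
        int calls = 0;
        string[] files = { "12345_a.pdf", "67890_b.pdf", "abc.txt" };

        // Nothing runs yet: this line only describes the query.
        IEnumerable<string> query = files.Where(f =>
        {
            calls++;
            return f.Length > 4;
        });

        // ToList() runs the query once and stores the results.
        IEnumerable<string> source = materialize ? query.ToList() : query;

        for (int i = 0; i < passes; i++)
        {
            // Searching for an absent item walks the whole sequence.
            source.Contains("no-such-file");
        }
        return calls;
    }

    public static void Main()
    {
        Console.WriteLine(CountPredicateCalls(materialize: false, passes: 3)); // 9
        Console.WriteLine(CountPredicateCalls(materialize: true, passes: 3));  // 3
    }
}
```

With three files and three searches, the lazy query runs its filter nine times, while the materialized list runs it three times. In the question, the same effect is multiplied by thousands of permit numbers.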

For #1, just make permitNumbers a HashSet<string> rather than a list; checking whether an item is contained in it then becomes an O(1) operation.
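A minimal sketch of that change, using made-up permit numbers and file names rather than the question's data. HashSet<string>.Contains hashes the key instead of scanning every element, so the cost per lookup stays roughly constant as the set grows. The class and method names here are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HashSetLookupDemo
{
    // Returns the file names whose first five characters are a known permit number.
    public static List<string> FindMatches(IEnumerable<string> permitNumbers, IEnumerable<string> fileNames)
    {
        // Build the set once, up front; duplicate permit numbers are dropped.
        var numbers = new HashSet<string>(permitNumbers);

        return fileNames
            .Where(f => f.Length > 4)
            .Where(f => numbers.Contains(f.Substring(0, 5))) // hash lookup, not a scan
            .ToList();
    }

    public static void Main()
    {
        var permits = new List<string> { "12345", "67890" };
        var names = new[] { "12345_well.pdf", "99999_well.pdf", "abc" };
        Console.WriteLine(string.Join(",", FindMatches(permits, names))); // 12345_well.pdf
    }
}
```

The default comparer for HashSet<string> is ordinal and case-sensitive, which matches what List<string>.Contains was doing, so the set of matches should not change.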

For #2, replace your third query with an operation that only needs to iterate the source sequence once:

var nonMatchingFiles = permitNumbers.Except(permitNumbersWithMatches)
    .Select(p => new fileStatus(fileType.Well, p, 0, string.Empty));
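Put together, the matching and non-matching steps might look like the sketch below. It uses a hypothetical FileStatus class in place of the question's fileStatus, whose definition is not shown, and it goes one step further than the snippet above by materializing the matches with ToList(), so that Except and the final result share a single pass over the files.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's fileStatus class.
public sealed class FileStatus
{
    public string PermitNumber { get; }
    public int Found { get; }
    public string FileName { get; }

    public FileStatus(string permitNumber, int found, string fileName)
    {
        PermitNumber = permitNumber;
        Found = found;
        FileName = fileName;
    }
}

public static class ExceptDemo
{
    public static List<FileStatus> Check(IEnumerable<string> permitNumbers, IEnumerable<string> fileNames)
    {
        var numbers = new HashSet<string>(permitNumbers);

        // Materialize once so the steps below do not re-run the query.
        List<FileStatus> matching = fileNames
            .Where(f => f.Length > 4 && numbers.Contains(f.Substring(0, 5)))
            .Select(f => new FileStatus(f.Substring(0, 5), 1, f))
            .ToList();

        // Except reads its argument once into a set, then streams the source.
        IEnumerable<FileStatus> missing = numbers
            .Except(matching.Select(m => m.PermitNumber))
            .Select(p => new FileStatus(p, 0, string.Empty));

        var results = new List<FileStatus>(matching);
        results.AddRange(missing);
        return results;
    }

    public static void Main()
    {
        var results = Check(new[] { "12345", "67890" }, new[] { "12345_a.pdf", "abc" });
        foreach (var r in results)
            Console.WriteLine($"{r.PermitNumber} {r.Found} {r.FileName}");
    }
}
```

Each sequence is walked about once here, so the work grows with N + M rather than with N^2 * M.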

I would replace all the repeated calls to f.Substring(f.LastIndexOf("\\") + 1) with a single call to Path.GetFileName(f).

For example:

var fileNames = files.Select(f => Path.GetFileName(f));

var matchingFiles = (from fname in fileNames
                     where fname.Length > 4
                     where permitNumbers.Contains(fname.Substring(0, 5))
                     select new fileStatus(fileType.Well, fname.Substring(0, 5), 1, fname));
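The same idea can be pulled into a small helper, sketched here under the assumption that the only difference between the file types is the prefix length (5 for Well and DrillerLog, 14 for RasterLog, as in the question). GetPermitPrefix and the sample path are hypothetical names, not part of the original code.

```csharp
using System;
using System.IO;

public static class FileNamePrefixDemo
{
    // Returns the first `prefixLength` characters of the file name,
    // or null when the name is too short to contain a permit number.
    public static string GetPermitPrefix(string path, int prefixLength)
    {
        string name = Path.GetFileName(path); // strips the directory part
        return name.Length >= prefixLength ? name.Substring(0, prefixLength) : null;
    }

    public static void Main()
    {
        // Path.Combine uses the current platform's directory separator.
        string path = Path.Combine("server", "wells", "12345_log.pdf");
        Console.WriteLine(GetPermitPrefix(path, 5));          // 12345
        Console.WriteLine(GetPermitPrefix(path, 14) == null); // True
    }
}
```

Passing the prefix length as a parameter would let the three switch cases share one query instead of repeating it with different constants.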
