Compare two lists that contain a lot of objects (part 3): the objects have different types

How could I speed up this LINQ query?

It takes a long time, and when I put a lot of objects in the lists I get an out-of-memory exception.

List<DirectoryInfo> directoriesThatWillBeCreated = new List<DirectoryInfo>();
// some code to fill the list
// ..
// ..

List<FileInfo> FilesThatWillBeCopied = new List<FileInfo>();
// some code to fill the list
//....

directoriesThatWillBeCreated = (from a in FilesThatWillBeCopied
                                from b in directoriesThatWillBeCreated
                                where a.FullName.Contains(b.FullName)
                                select b).ToList();

I hope I can do something like the solution to my previous question, but I don't know how to do that when the two lists hold different types of objects. Do I have to create a new class, convert all the FileInfo and DirectoryInfo objects to that class, and then perform the query? FileInfo and DirectoryInfo are both sealed, so I cannot inherit from them; I would have to create a new class, which will not be very efficient. Even so, it would be more efficient than this query, which takes forever.

It's slow because the code does a linear search of the directory list for each file. Try this:

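// Collect the parent directory of every file, one entry per distinct path.
var dirlist = FilesThatWillBeCopied
    .Select(f => Directory.GetParent(f.FullName))
    .GroupBy(d => d.FullName)
    .Select(g => g.First())
    .ToList();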

You may need to play with the syntax a little bit but hopefully you see the point.
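The point of the snippet above is to derive each file's directory directly instead of scanning the directory list for every file. A minimal sketch of that idea, applied to the question's two lists, might look like the following. The class and method names are hypothetical, the case-insensitive comparer is an assumption about the file system, and the sketch assumes the directory paths carry no trailing separator. Note that it only matches a file's immediate parent, whereas the original Contains query also matches more distant ancestors.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class DirectoryFilter
{
    // Hypothetical helper: keeps only the directories that are the immediate
    // parent of at least one file in the list.
    public static List<DirectoryInfo> DirectoriesWithFiles(
        IEnumerable<DirectoryInfo> directories,
        IEnumerable<FileInfo> files)
    {
        // One pass over the files builds a set of parent paths.
        // Assumption: paths should be compared case-insensitively.
        var parentPaths = new HashSet<string>(
            files.Select(f => f.DirectoryName).Where(p => p != null),
            StringComparer.OrdinalIgnoreCase);

        // One pass over the directories; each lookup is a hash probe
        // rather than a scan of the whole file list.
        return directories
            .Where(d => parentPaths.Contains(d.FullName))
            .ToList();
    }
}
```

With M files and N directories this does roughly M + N hash operations instead of M × N string comparisons.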

One thing you could do is change Contains to StartsWith. A directory's full path can only appear at the start of the path of a file inside it, and StartsWith fails faster when there is no match.

directoriesThatWillBeCreated = (from a in FilesThatWillBeCopied
                                from b in directoriesThatWillBeCreated
                                where a.FullName.StartsWith(b.FullName)
                                select b).ToList();

This isn't a complete solution, though. If FilesThatWillBeCopied has M items and directoriesThatWillBeCreated has N items, your query still performs M × N string comparisons.
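One caveat with a plain StartsWith check: a directory such as C:\foo is also a prefix of C:\foobar\x.txt, and the single-argument string overload of StartsWith compares using the current culture, which tends to be slower than an ordinal comparison. A small sketch of a stricter check, using a hypothetical helper name and assuming case-insensitive paths, could be:

```csharp
using System;
using System.IO;

public static class PathPrefix
{
    // Hypothetical helper: true only when filePath lies inside directoryPath.
    public static bool IsUnder(string filePath, string directoryPath)
    {
        string separator = Path.DirectorySeparatorChar.ToString();

        // Make sure the prefix ends with a separator so that "foo"
        // cannot match a sibling directory named "foobar".
        string prefix = directoryPath.EndsWith(separator, StringComparison.Ordinal)
            ? directoryPath
            : directoryPath + separator;

        // Assumption: the file system is case-insensitive.
        return filePath.StartsWith(prefix, StringComparison.OrdinalIgnoreCase);
    }
}
```

In the query, the where clause would then read where PathPrefix.IsUnder(a.FullName, b.FullName).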

Another Option

Another optimization to try: iterate through directoriesThatWillBeCreated first, then select the directories that match any FileInfo in FilesThatWillBeCopied. Because Any only checks whether a match exists, it stops testing files as soon as the first match is found. That could be done like this (warning, notepad code follows):

directoriesThatWillBeCreated = directoriesThatWillBeCreated
    .Where(b => FilesThatWillBeCopied
        .Any(a => a.FullName.StartsWith(b.FullName)))
    .ToList();

I would suggest using HashSet<DirectoryInfo> for comparisons, but unfortunately DirectoryInfo doesn't implement value equality, so strings will have to do. (Another option would be to implement your own IEqualityComparer<DirectoryInfo>.) Also, you should use StringComparer.InvariantCultureIgnoreCase on the names unless you are sure that both collections use the same casing.

var dirs = new HashSet<string>(StringComparer.InvariantCultureIgnoreCase);
// fill dirs

var files = new List<FileInfo>();
// fill files

var result = new HashSet<string>(StringComparer.InvariantCultureIgnoreCase);

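foreach (var file in files)
{
    // Walk from the file's directory up towards the root.
    var dir = file.Directory;
    // Stop early once we reach a directory that was already recorded.
    while (dir != null && !result.Contains(dir.FullName))
    {
        if (dirs.Contains(dir.FullName))
            result.Add(dir.FullName);
        dir = dir.Parent;
    }
}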

This solution doesn't use LINQ at all, but that's often the case when you're after performance and the most straightforward LINQ solution is too slow.
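For completeness, the custom-comparer option mentioned above could be sketched roughly as follows. The class name is hypothetical, and the sketch simply reuses the StringComparer.InvariantCultureIgnoreCase choice from the code above; whether that is the right comparer depends on the file system.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical comparer: treats two DirectoryInfo objects as equal
// when their full paths match, ignoring case.
public sealed class DirectoryInfoPathComparer : IEqualityComparer<DirectoryInfo>
{
    private static readonly StringComparer PathComparer =
        StringComparer.InvariantCultureIgnoreCase;

    public bool Equals(DirectoryInfo x, DirectoryInfo y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return PathComparer.Equals(x.FullName, y.FullName);
    }

    public int GetHashCode(DirectoryInfo obj)
    {
        // Must agree with Equals: hash the same string with the same comparer.
        return PathComparer.GetHashCode(obj.FullName);
    }
}
```

It would be passed to the set's constructor, for example new HashSet<DirectoryInfo>(new DirectoryInfoPathComparer()), after which Contains and Add work on DirectoryInfo objects directly.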
