How could I speed up this LINQ query?
It takes a long time, and when I put a lot of objects in the lists I get a memory exception.
List<DirectoryInfo> directoriesThatWillBeCreated = new List<DirectoryInfo>();
// some code to fill the list
// ..
// ..
List<FileInfo> FilesThatWillBeCopied = new List<FileInfo>();
// some code to fill the list
//....
directoriesThatWillBeCreated = (from a in FilesThatWillBeCopied
                                from b in directoriesThatWillBeCreated
                                where a.FullName.Contains(b.FullName)
                                select b).ToList();
I hope I can do something like the previous solution, but I don't know how to do that when dealing with different types of objects. Do I have to create a new class, convert all the FileInfo and DirectoryInfo objects to that class, and then perform the query? Moreover, the FileInfo and DirectoryInfo classes are sealed and I cannot inherit from them, so I would have to create a new class, and that would not be too efficient. At least it would be more efficient than this query, which takes forever.
It's slow because the code does a linear search of the directory list for each file. Try this:
var dirlist = FilesThatWillBeCopied
    .Select(f => Directory.GetParent(f.FullName))
    .GroupBy(d => d.FullName)
    .Select(g => g.First())   // keep one DirectoryInfo per distinct path
    .ToList();
You may need to play with the syntax a little bit but hopefully you see the point.
One thing you could do is change the Contains to a StartsWith. StartsWith will fail faster in the event of a failed match.
directoriesThatWillBeCreated = (from a in FilesThatWillBeCopied
                                from b in directoriesThatWillBeCreated
                                where a.FullName.StartsWith(b.FullName)
                                select b).ToList();
This isn't a complete solution, though. If FilesThatWillBeCopied has M items and directoriesThatWillBeCreated has N elements, then your query is going to process M×N string comparisons.
Another optimization to try: iterate through directoriesThatWillBeCreated first, then select those that match any FileInfo in FilesThatWillBeCopied. By checking whether any file matches, you can stop testing files as soon as a match is found. That could be done like this: (warning, notepad code follows)
directoriesThatWillBeCreated = directoriesThatWillBeCreated
    .Where(b => FilesThatWillBeCopied
        .Any(a => a.FullName.StartsWith(b.FullName)))
    .ToList();
I would suggest using HashSet<DirectoryInfo> for comparisons, but unfortunately, DirectoryInfo doesn't have proper equality comparisons implemented, so strings will have to do. (Another option would be to implement your own IEqualityComparer<DirectoryInfo>, which is the interface HashSet accepts.) Also, you should use StringComparer.InvariantCultureIgnoreCase on the names unless you are sure that both collections have the same case.
var dirs = new HashSet<string>(StringComparer.InvariantCultureIgnoreCase);
// fill dirs
var files = new List<FileInfo>();
// fill files
var result = new HashSet<string>(StringComparer.InvariantCultureIgnoreCase);
foreach (var file in files)
{
    var dir = file.Directory;
    while (dir != null && !result.Contains(dir.FullName))
    {
        if (dirs.Contains(dir.FullName))
            result.Add(dir.FullName);
        dir = dir.Parent;
    }
}
This solution doesn't use LINQ at all, but that's often the case when you're after performance and the most straightforward LINQ solution is too slow.
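For the custom-comparer option mentioned above, a minimal sketch might look like the following. The class name DirectoryFullNameComparer is hypothetical (it is not from the original answer), and the sketch assumes that two DirectoryInfo objects should count as equal whenever their FullName values match ignoring case. On a case-sensitive file system that assumption can merge distinct directories, so treat it as a starting point only.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the original answer:
// compares DirectoryInfo objects by FullName, ignoring case.
sealed class DirectoryFullNameComparer : IEqualityComparer<DirectoryInfo>
{
    // Same comparer the answer uses for its string-based sets.
    private static readonly StringComparer Paths =
        StringComparer.InvariantCultureIgnoreCase;

    public bool Equals(DirectoryInfo x, DirectoryInfo y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return Paths.Equals(x.FullName, y.FullName);
    }

    // Must agree with Equals: equal paths produce equal hash codes.
    public int GetHashCode(DirectoryInfo d) => Paths.GetHashCode(d.FullName);
}
```

With such a comparer, the sets in the loop above could hold DirectoryInfo objects directly, for example `new HashSet<DirectoryInfo>(new DirectoryFullNameComparer())`, and the lookups would become `dirs.Contains(dir)` instead of `dirs.Contains(dir.FullName)`.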