Returning Dictionary<FileHash, string[]> from Linq Query
Thanks in advance for any assistance. I'm not even sure if this is possible, but I'm trying to get a list of duplicate files, using their hashes to identify the list of files associated with each hash.
I have this below:
Dictionary<FileHash, string[]> FindDuplicateFiles(string searchFolder)
{
    Directory.GetFiles(searchFolder, "*.*")
        .Select(
            f => new
            {
                FileName = f,
                FileHash = Encoding.UTF8.GetString(new SHA1Managed()
                    .ComputeHash(new FileStream(f,
                        FileMode.OpenOrCreate,
                        FileAccess.Read)))
            })
        .GroupBy(f => f.FileHash)
        .Select(g => new
        {
            FileHash = g.Key,
            Files = g.Select(z => z.FileName).ToList()
        })
        .GroupBy(f => f.FileHash)
        .Select(g => new {FileHash = g.Key, Files = g.Select(z => z.Files).ToArray()});
}
It compiles fine, but I'm just curious whether there's even a way to manipulate the results to return a Dictionary.
Any suggestions, alternatives, critiques would be greatly appreciated.
Create an extension method on IEnumerable<_> called toDictionary which converts a sequence of key-value pairs to a dictionary. It might raise an exception on duplicate keys.
Why do you need the second GroupBy?
You can use Enumerable.ToDictionary to collect a LINQ query into a dictionary:
var sha1 = new SHA1Managed();
Dictionary<string, string[]> result =
    Directory
        .EnumerateFiles(searchFolder)
        .GroupBy(file => Convert.ToBase64String(sha1.ComputeHash(...)))
        .ToDictionary(g => g.Key, g => g.ToArray());
Some remarks:
There's already an extension method which will do this. Just stick this at the end of your existing query:
.ToDictionary(x => x.FileHash, x => x.Files);
However: using Encoding.UTF8.GetString to convert arbitrary binary data into a string is a really bad idea. Use Convert.ToBase64String instead. The hash is not a UTF-8 encoded string, so don't treat it as one.
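To make that concrete, here is a minimal sketch (the class and method names are made up for illustration, they are not from the question). Decoding arbitrary bytes as UTF-8 replaces each invalid byte with U+FFFD, so different byte arrays can collapse into the same string, whereas Base64 gives every distinct byte array a distinct string:

```csharp
using System;
using System.Text;

static class HashTextDemo
{
    // Hypothetical helper: treats raw bytes as UTF-8 text.
    // Lossy: bytes that are not valid UTF-8 become U+FFFD.
    public static string AsUtf8(byte[] data)
    {
        return Encoding.UTF8.GetString(data);
    }

    // Hypothetical helper: Base64 is reversible, so distinct
    // byte arrays always produce distinct strings.
    public static string AsBase64(byte[] data)
    {
        return Convert.ToBase64String(data);
    }
}
```

For example, the single-byte arrays { 0xFF } and { 0xFE } come out as the same string from AsUtf8, while AsBase64 returns "/w==" and "/g==" for them. Used as a grouping key, the UTF-8 version could in principle lump different hashes together.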
You're also grouping by hash twice, which I suspect isn't really what you want to do.
Alternatively, remove the previous GroupBy calls and use a Lookup instead:
var query = Directory.GetFiles(searchFolder, "*.*")
    .Select(f => new {
        FileName = f,
        FileHash = Convert.ToBase64String(
            new SHA1Managed().ComputeHash(...))
    })
    .ToLookup(x => x.FileHash, x => x.FileName);
That will give you an ILookup<string, string> (the interface that ToLookup returns), which is basically the files grouped by hash.
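If you still want the dictionary shape from the question, a lookup converts easily. Each entry of a lookup is an IGrouping, so one possible sketch (LookupDemo and OnlyDuplicates are names I've invented here) filters down to hashes shared by more than one file and then calls ToDictionary:

```csharp
using System.Collections.Generic;
using System.Linq;

static class LookupDemo
{
    // Keeps only the hashes that map to more than one file and
    // materializes them as a Dictionary<string, string[]>.
    public static Dictionary<string, string[]> OnlyDuplicates(
        ILookup<string, string> filesByHash)
    {
        return filesByHash
            .Where(g => g.Count() > 1)                      // real duplicates only
            .ToDictionary(g => g.Key, g => g.ToArray());    // hash -> file names
    }
}
```

Keys are unique within a lookup, so ToDictionary won't throw a duplicate-key exception here. Note that the key type is string (the Base64 hash) rather than the FileHash type in the question's signature, which isn't shown anywhere in the post.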
One further thing to note: I suspect you'll be leaving file streams open with this method. I suggest you write a small separate method to compute the hash of a file based on its name, but making sure you close the stream (with a using statement in the normal way). This will also end up making your query simpler - something along the lines of:
var query = Directory.GetFiles(searchFolder)
    .ToLookup(x => ComputeHash(x));
It's hard to simplify it much further than that :)
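A ComputeHash helper along those lines might look roughly like this. It's only a sketch under a few assumptions of mine: the DuplicateFinder class name is invented, it uses SHA1.Create() in place of new SHA1Managed(), and it opens the file with File.OpenRead rather than FileMode.OpenOrCreate, since there's no reason to create a file you only want to hash:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

static class DuplicateFinder
{
    // Hashes one file. The using statements dispose both the hash
    // object and the stream, even if ComputeHash throws.
    public static string ComputeHash(string path)
    {
        using (var sha1 = SHA1.Create())
        using (var stream = File.OpenRead(path))
        {
            return Convert.ToBase64String(sha1.ComputeHash(stream));
        }
    }

    // Groups every file directly inside searchFolder by content hash.
    public static ILookup<string, string> GroupByHash(string searchFolder)
    {
        return Directory.GetFiles(searchFolder)
            .ToLookup(path => ComputeHash(path));
    }
}
```

Any group in the result with more than one element is a set of files with identical contents (barring a SHA-1 collision).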