
Better way to detect file differences between 2 directories?

I made some C# functions to roughly "diff" 2 directories, similar to KDiff3.

First, this function compares file names between the directories. Any file name that exists in dir1 but not in dir2 implies a file has been added to dir1:

// Requires: using System.Collections.Generic; using System.IO; using System.Linq;
public static List<string> diffFileNamesInDirs(string dir1, string dir2)
{
    List<string> dir1FileNames = Directory
       .EnumerateFiles(dir1, "*", SearchOption.AllDirectories)
       .Select(Path.GetFullPath)
       .Select(entry => entry.Replace(dir1 + "\\", ""))
       .ToList();
    List<string> dir2FileNames = Directory
        .EnumerateFiles(dir2, "*", SearchOption.AllDirectories)
        .Select(Path.GetFullPath)
        .Select(entry => entry.Replace(dir2 + "\\", ""))
        .ToList();
    List<string> diffs = dir1FileNames.Except(dir2FileNames).Distinct().ToList();

    return diffs;
}
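
For comparison, here is a minimal sketch of the same name check that reports both directions (files only in dir1 and files only in dir2). It is not the code from the question: the class name DirNameDiff and its methods are hypothetical, it assumes a runtime where Path.GetRelativePath is available (.NET Core 2.0 or later), and it assumes case-insensitive file names as on Windows.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of the original question.
public static class DirNameDiff
{
    // Assumption: file names are compared case-insensitively (Windows-style).
    private static readonly StringComparer NameComparer = StringComparer.OrdinalIgnoreCase;

    // Returns every file under root as a path relative to root.
    public static HashSet<string> RelativeFiles(string root)
    {
        string fullRoot = Path.GetFullPath(root);
        return Directory
            .EnumerateFiles(fullRoot, "*", SearchOption.AllDirectories)
            .Select(file => Path.GetRelativePath(fullRoot, file))
            .ToHashSet(NameComparer);
    }

    // Returns the relative paths found only under dir1 and only under dir2.
    public static (List<string> OnlyInDir1, List<string> OnlyInDir2) Compare(string dir1, string dir2)
    {
        HashSet<string> files1 = RelativeFiles(dir1);
        HashSet<string> files2 = RelativeFiles(dir2);
        List<string> onlyIn1 = files1.Except(files2, NameComparer).OrderBy(p => p, NameComparer).ToList();
        List<string> onlyIn2 = files2.Except(files1, NameComparer).OrderBy(p => p, NameComparer).ToList();
        return (onlyIn1, onlyIn2);
    }
}
```

Using Path.GetRelativePath instead of string replacement avoids depending on how dir1 and dir2 were spelled by the caller (relative path, trailing separator, and so on).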

Second, this function compares file sizes for file names that exist in both directories. Any difference in file size implies some edit has been made:

public static List<string> diffFileSizesInDirs(string dir1, string dir2)
{
    //Get list of file paths, relative to the base dir1/dir2 directories
    List<string> dir1FileNames = Directory
       .EnumerateFiles(dir1, "*", SearchOption.AllDirectories)
       .Select(Path.GetFullPath)
       .Select(entry => entry.Replace(dir1 + "\\", ""))
       .ToList();
    List<string> dir2FileNames = Directory
        .EnumerateFiles(dir2, "*", SearchOption.AllDirectories)
        .Select(Path.GetFullPath)
        .Select(entry => entry.Replace(dir2 + "\\", ""))
        .ToList();
    List<string> sharedFileNames = dir1FileNames.Intersect(dir2FileNames).Distinct().ToList();

    //Get list of file sizes corresponding to file paths
    List<long> dir1FileSizes = sharedFileNames
        .Select(s => 
        new FileInfo(dir1 + "\\" + s) //Create the full file path as required for FileInfo objects
        .Length).ToList();
    List<long> dir2FileSizes = sharedFileNames
        .Select(s =>
        new FileInfo(dir2 + "\\" + s) //Create the full file path as required for FileInfo objects
        .Length).ToList();

    List<string> changedFiles = new List<string>();
    for (int i = 0; i < sharedFileNames.Count; i++)
    {
        //If file sizes are different, there must have been a change made to one of the files. 
        if (dir1FileSizes[i] != dir2FileSizes[i])
        {
            changedFiles.Add(sharedFileNames[i]);
        }
    }

    return changedFiles;
}
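
The size check can also be done in a single pass, without building parallel lists of sizes. The following is only a sketch under assumptions: DirSizeDiff and ChangedBySize are hypothetical names, and Path.GetRelativePath requires .NET Core 2.0 or later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the original question.
public static class DirSizeDiff
{
    // Returns relative paths of files that exist in both directories
    // but have different lengths.
    public static List<string> ChangedBySize(string dir1, string dir2)
    {
        string root1 = Path.GetFullPath(dir1);
        string root2 = Path.GetFullPath(dir2);
        var changed = new List<string>();

        foreach (string file1 in Directory.EnumerateFiles(root1, "*", SearchOption.AllDirectories))
        {
            string relative = Path.GetRelativePath(root1, file1);
            var info2 = new FileInfo(Path.Combine(root2, relative));

            // Files missing from dir2 are skipped here; the name diff reports them.
            if (info2.Exists && info2.Length != new FileInfo(file1).Length)
            {
                changed.Add(relative);
            }
        }

        return changed;
    }
}
```

Like the original function, this still misses files whose content changed while the length stayed the same.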

Lastly, combining the results gives a list of all files that have been added or edited between the directories:

List<string> nameDiffs = FileIO.diffFileNamesInDirs(dir1, dir2);
List<string> sizeDiffs = FileIO.diffFileSizesInDirs(dir1, dir2);
List<string> allDiffs = nameDiffs.Concat(sizeDiffs).ToList();

This approach generally works but feels sloppy, and it would also fail when a file is modified but still has the same size (same length, different bytes). Any suggestions on a better way?

You could use System.Security.Cryptography.MD5 to calculate an MD5 hash for each file and compare the hashes.

E.g. using this method:

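// Requires: using System; using System.IO; using System.Security.Cryptography;
// Returns the MD5 hash of the file at path as a lowercase hex string.
public static string GetMd5Hash(string path)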
{
    using (var md5 = MD5.Create())
    {
        using (var stream = File.OpenRead(path))
        {
            var hash = md5.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
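
To show how a per-file hash could be applied across two directories, here is a rough sketch. The names DirHashDiff, HashOf and ChangedByHash are hypothetical, the hashing helper is a variation on the method above that returns raw bytes, and Path.GetRelativePath assumes .NET Core 2.0 or later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper illustrating the hash comparison suggested above.
public static class DirHashDiff
{
    // Computes the MD5 hash of a file as raw bytes.
    private static byte[] HashOf(string path)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(path))
        {
            return md5.ComputeHash(stream);
        }
    }

    // Returns relative paths of files that exist in both directories
    // but whose MD5 hashes differ.
    public static List<string> ChangedByHash(string dir1, string dir2)
    {
        string root1 = Path.GetFullPath(dir1);
        string root2 = Path.GetFullPath(dir2);
        var changed = new List<string>();

        foreach (string file1 in Directory.EnumerateFiles(root1, "*", SearchOption.AllDirectories))
        {
            string relative = Path.GetRelativePath(root1, file1);
            string file2 = Path.Combine(root2, relative);

            if (File.Exists(file2) && !HashOf(file1).SequenceEqual(HashOf(file2)))
            {
                changed.Add(relative);
            }
        }

        return changed;
    }
}
```

This reads every shared file in full, so it catches same-size edits that the size check misses.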

This may take a little more time than reading values from FileInfo (depending on the number and size of the files to compare), but it lets you be confident about whether the files are binary identical.
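
If each pair of files is only compared once, one way to reduce that cost is to check the lengths first and then compare the bytes directly, stopping at the first difference. Note that this is a different technique from hashing, offered only as a sketch; FileContentCompare, FilesAreEqual and FillBuffer are hypothetical names and the 64 KB buffer size is an arbitrary assumption.

```csharp
using System;
using System.IO;

// Hypothetical helper: direct byte comparison instead of hashing.
public static class FileContentCompare
{
    private const int BufferSize = 64 * 1024; // Arbitrary assumed buffer size.

    // Returns true only if both files have the same length and the same bytes.
    public static bool FilesAreEqual(string path1, string path2)
    {
        var info1 = new FileInfo(path1);
        var info2 = new FileInfo(path2);

        // Different lengths can never be binary identical, so skip reading.
        if (info1.Length != info2.Length)
        {
            return false;
        }

        var buffer1 = new byte[BufferSize];
        var buffer2 = new byte[BufferSize];

        using (FileStream stream1 = info1.OpenRead())
        using (FileStream stream2 = info2.OpenRead())
        {
            while (true)
            {
                int count1 = FillBuffer(stream1, buffer1);
                int count2 = FillBuffer(stream2, buffer2);

                if (count1 != count2) return false; // One file ended early.
                if (count1 == 0) return true;       // Both ended with no difference.

                for (int i = 0; i < count1; i++)
                {
                    if (buffer1[i] != buffer2[i]) return false;
                }
            }
        }
    }

    // Reads until the buffer is full or the stream ends, because a single
    // Stream.Read call may return fewer bytes than requested.
    private static int FillBuffer(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int read = stream.Read(buffer, total, buffer.Length - total);
            if (read == 0) break;
            total += read;
        }
        return total;
    }
}
```

Hashing remains the better fit when the same file has to be compared against many others, since each hash can be computed once and reused.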
