Calculate MD5 checksum for a file

I'm using iTextSharp to read the text from a PDF file. However, sometimes I cannot extract any text because the PDF contains only images. I download the same PDF files every day, and I want to see whether a PDF has been modified. If the text and modification date cannot be obtained, is an MD5 checksum the most reliable way to tell if the file has changed?

If it is, some code samples would be appreciated, because I don't have much experience with cryptography.

It's very simple using System.Security.Cryptography.MD5:

using (var md5 = MD5.Create())
{
    using (var stream = File.OpenRead(filename))
    {
        return md5.ComputeHash(stream);
    }
}

(I believe that the MD5 implementation used doesn't actually need to be disposed, but I'd probably still do so anyway.)

How you compare the results afterwards is up to you; you can convert the byte array to base64, for example, or compare the bytes directly. (Just be aware that arrays don't override Equals. Using base64 is simpler to get right, but slightly less efficient if you're really only interested in comparing the hashes.)
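
To make the Equals caveat concrete, here is a minimal sketch of both comparison styles. The class and method names (HashCompare, ComputeMd5, SameHash, ToBase64) are made up for illustration and are not part of any library; only MD5, File, Convert and LINQ's SequenceEqual come from .NET.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper class, named for this example only.
public static class HashCompare
{
    // Streams the file through MD5 and returns the raw 16-byte hash.
    public static byte[] ComputeMd5(string path)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(path))
        {
            return md5.ComputeHash(stream);
        }
    }

    // Compares the hashes element by element.
    // hashA.Equals(hashB) would only check whether both are the same array instance.
    public static bool SameHash(byte[] hashA, byte[] hashB)
    {
        return hashA.SequenceEqual(hashB);
    }

    // Base64 text form; two hashes are equal exactly when these strings are equal.
    public static string ToBase64(byte[] hash)
    {
        return Convert.ToBase64String(hash);
    }
}
```

With this, SameHash(ComputeMd5(oldPath), ComputeMd5(newPath)) compares two files directly, while the base64 string is probably the more convenient form if the hash has to be stored somewhere between runs.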

If you need to represent the hash as a string, you could convert it to hex using BitConverter:

static string CalculateMD5(string filename)
{
    using (var md5 = MD5.Create())
    {
        using (var stream = File.OpenRead(filename))
        {
            var hash = md5.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
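
For the daily-download scenario in the question, one possible way to use that hex string is to keep yesterday's hash in a small text file next to the PDF and compare against it. This is only a sketch under assumptions: ChangeDetector, HasChanged and the hashFile parameter are hypothetical names, and where the previous hash is stored is entirely up to you.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper, named for this example only.
public static class ChangeDetector
{
    // Same hex conversion as in the answer above.
    static string CalculateMD5(string filename)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(filename))
        {
            var hash = md5.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    // Returns true when the file's hash differs from the one recorded in hashFile,
    // or when nothing has been recorded yet. The new hash is recorded either way.
    public static bool HasChanged(string filename, string hashFile)
    {
        string current = CalculateMD5(filename);
        string previous = File.Exists(hashFile) ? File.ReadAllText(hashFile).Trim() : null;
        File.WriteAllText(hashFile, current);
        return !string.Equals(current, previous, StringComparison.OrdinalIgnoreCase);
    }
}
```

Note that the very first call reports a change because there is no earlier hash to compare with; whether that is the right behaviour depends on your workflow.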

This is how I do it:

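using System;
using System.IO;
using System.Security.Cryptography;

public string checkMD5(string filename)
{
    using (var md5 = MD5.Create())
    {
        using (var stream = File.OpenRead(filename))
        {
            // Hex-encode the hash. Decoding the raw hash bytes as text with
            // Encoding.Default.GetString is lossy and can yield unprintable characters.
            return BitConverter.ToString(md5.ComputeHash(stream)).Replace("-", "");
        }
    }
}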

I know this question was already answered, but this is what I use:

using (FileStream fStream = File.OpenRead(filename)) {
    return GetHash<MD5>(fStream);
}

Where GetHash is:

public static String GetHash<T>(Stream stream) where T : HashAlgorithm {
    StringBuilder sb = new StringBuilder();

    MethodInfo create = typeof(T).GetMethod("Create", new Type[] {});
    using (T crypt = (T) create.Invoke(null, null)) {
        byte[] hashBytes = crypt.ComputeHash(stream);
        foreach (byte bt in hashBytes) {
            sb.Append(bt.ToString("x2"));
        }
    }
    return sb.ToString();
}

Probably not the best way, but it can be handy.

I know I am late to the party, but I ran a test before actually implementing the solution.

I tested the built-in MD5 class against md5sum.exe. In my case the built-in class took 13 seconds, whereas md5sum.exe took around 16-18 seconds on every run.

    DateTime current = DateTime.Now;
    string file = @"C:\text.iso"; // a 2.5 GB file
    string output;
    using (var md5 = MD5.Create())
    {
        using (var stream = File.OpenRead(file))
        {
            byte[] checksum = md5.ComputeHash(stream);
            output = BitConverter.ToString(checksum).Replace("-", String.Empty).ToLower();
            Console.WriteLine("Total seconds : " + (DateTime.Now - current).TotalSeconds.ToString() + " " + output);
        }
    }

Here is a slightly simpler version that I found. It reads the entire file into memory in one go and only requires a single using statement.

byte[] ComputeHash(string filePath)
{
    using (var md5 = MD5.Create())
    {
        return md5.ComputeHash(File.ReadAllBytes(filePath));
    }
}

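If you need to calculate the MD5 to see whether it matches the MD5 of an Azure blob, then this question and answer might be helpful: MD5 hash of blob uploaded on Azure doesn't match with same file on local machine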

For dynamically generated PDFs, the creation and modification dates will always be different.

You have to remove them or set them to a constant value.

Then generate the MD5 hashes and compare them.

You can use PDFStamper to remove or update the dates.

In addition to the methods answered above, if you're comparing PDFs you need to amend the creation and modified dates, or the hashes won't match.

For PDFs generated with QuestPDF, you'll need to override the CreationDate and ModifiedDate in the Document Metadata.

public class PdfDocument : IDocument
{
    ...

    public DocumentMetadata GetMetadata()
    {
        return new()
        {
            CreationDate = DateTime.MinValue,
            ModifiedDate = DateTime.MinValue,
        };
    }
    
    ...
}

https://www.questpdf.com/concepts/document-metadata.html
