MD5 implementation in C for an XML file

Question

I need to implement an MD5 checksum check for an XML file that we receive from our client. The checksum covers the whole file, including all XML tags. The MD5 checksum we receive is 32 hexadecimal digits long.

Before calculating the checksum, we need to set the MD5 checksum field in the received XML file to 0. We then have to independently calculate the MD5 checksum of the file and verify it against the received value.

Our application is implemented in C. Please advise on how to implement this.

Thanks

Answer 1

This depends directly on the library used for XML parsing. It is tricky, however, because you can't embed the MD5 in the XML file itself: embedding the checksum changes the file, and therefore its checksum, unless the checksum is calculated only from specific elements. As I understand it, you receive the MD5 independently? Is it calculated from the whole file, or only the tags/content?

Public-domain MD5 code: http://www.fourmilab.ch/md5/
XML library for C: http://xmlsoft.org/

The exact solution depends on the code used.

Based on your comment, you need to do the following steps:

load the XML file (possibly even as plain text) and read the MD5
substitute the MD5 in the file with zero and write the file out (or, better, keep it in memory)
run MD5 on the raw file data and compare it with the value stored before
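
The first two steps can be sketched in C roughly as follows, assuming the file has already been read into a NUL-terminated buffer. The function name extract_and_zero_md5 and the marker string are hypothetical and not part of any library, and the sketch assumes the field is zeroed with 32 '0' characters; the real marker and zero format depend on what the client's XML looks like.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: finds `marker` (for example "MD5='") in the
 * NUL-terminated buffer `xml`, copies the 32 hex digits that follow it
 * into `saved` (33 bytes, NUL-terminated), and overwrites them in place
 * with '0' characters. Returns 0 on success, -1 if no well-formed
 * checksum follows the first occurrence of the marker. */
int extract_and_zero_md5(char *xml, const char *marker, char saved[33])
{
    char *p = strstr(xml, marker);
    if (p == NULL)
        return -1;
    p += strlen(marker);

    /* Validate before modifying anything, so a failure leaves xml intact.
     * The loop stops at the terminating NUL because it is not a hex digit. */
    for (size_t i = 0; i < 32; i++) {
        if (!isxdigit((unsigned char)p[i]))
            return -1;
    }

    memcpy(saved, p, 32);   /* remember the received checksum */
    saved[32] = '\0';
    memset(p, '0', 32);     /* zero the field in the in-memory copy */
    return 0;
}
```

After this call, the buffer can be passed unchanged to whichever MD5 library you pick, and the resulting digest compared with the saved string.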

Answer 2

There are public-domain implementations of MD5 that you should use instead of writing your own. I hear that Colin Plumb's version is widely used.

Answer 3

Don't reinvent the wheel; use a proven existing solution: http://userpages.umbc.edu/~mabzug1/cs/md5/md5.html

Incidentally, that was the first link that came up when I googled "md5 c implementation".

Answer 4

This is rather nasty. The approach suggested seems to imply that you need to parse the XML document into something like a DOM tree, find the MD5 checksum and store it for future reference. Then you would replace the checksum with 0 before re-serializing the document and calculating its MD5 hash. This all sounds doable but potentially tricky. The major difficulty I see is that your new serialization of the document may not be the same as the original one, and differences that are irrelevant to XML, like the use of single or double quotes around attribute values, added line breaks or even a different encoding, will cause the hashes to differ. If you go down this route, you'll need to make sure that your app and the procedure used to create the document in the first place make the same choices. For this sort of problem, canonical XML is the standard solution (http://www.w3.org/TR/xml-c14n).

However, I would do something different. With any luck it should be quite easy to write a regular expression that locates the MD5 hash in the file. You can then use it to grab the hash and to replace it with 0 in the XML file before recalculating the hash. This sidesteps all the possible issues with parsing, changing and re-serializing the XML document. To illustrate, I'm going to assume that the hash '33d4046bea07e89134aecfcaf7e73015' lives in the XML file like this:

<docRoot xmlns='some-irrelevant-uri'>
  <myData>Blar blar</myData>
  <myExtraData number='1'/>
  <docHash MD5='33d4046bea07e89134aecfcaf7e73015' />
  <evenMoreOfMyData number='34'/>
</docRoot>

I've called this file hash.xml. I'll also assume that the MD5 should be replaced by 32 zeros (so that the stored hash is correct), and I'll illustrate the procedure on a shell command line using perl, md5 and bash. (Hopefully translating this into C won't be too hard, given the existence of regular expression and hashing libraries.)

Breaking down the problem, you first need to be able to find the hash that is in the file:

perl -p -e'if (m#<docHash.+MD5=["\x27]([a-fA-F0-9]{32})#) {$_ = "$1\n"} else {$_ = ""}' hash.xml

(This works by looking for the start of the MD5 attribute of the docHash element, allowing for possible other attributes, and then grabbing the next 32 hex characters. If it finds them, it puts them in the special $_ variable; if not, it sets $_ to be empty. The value of $_ then gets printed for each line. This results in the string "33d4046bea07e89134aecfcaf7e73015" being printed.)
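
For the C side, a rough equivalent of this search can be written with the POSIX regex functions (regcomp, regexec and regfree from <regex.h>). This is only a sketch under assumptions: the helper name find_md5_attr is made up, the pattern assumes the docHash element layout from the example above, and it matches over the whole buffer rather than line by line as the perl one-liner does.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: looks for <docHash ... MD5='...'> (either quote
 * style) in the NUL-terminated string `xml` and copies the 32 hex digits
 * into `out` (33 bytes). On success, *offset receives the position of
 * the first hex digit within `xml`.
 * Returns 0 on success, -1 if nothing matched or the pattern is invalid. */
int find_md5_attr(const char *xml, char out[33], size_t *offset)
{
    static const char *pattern =
        "<docHash[^>]*MD5=[\"']([0-9a-fA-F]{32})[\"']";
    regex_t re;
    regmatch_t m[2];   /* m[0]: whole match, m[1]: the hex digits */
    int rc = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, xml, 2, m, 0) == 0) {
        memcpy(out, xml + m[1].rm_so, 32);
        out[32] = '\0';
        *offset = (size_t)m[1].rm_so;
        rc = 0;
    }
    regfree(&re);
    return rc;
}
```

The returned offset is there so that the same 32 bytes can later be overwritten in a writable copy of the buffer without searching a second time.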

Then you need to calculate the hash of the file with the hash replaced with zeros:

perl -p -e's#(<docHash.+MD5=["\x27])[a-fA-F0-9]{32}#$1 . "0" x 32#e' hash.xml | md5

(Here the regular expression is almost the same, but this time the hex characters are replaced by zeros and the whole file is printed. The MD5 of this is then calculated by piping the result through an md5 hashing program.) Putting this together with a bit of bash gives:

if [ "$(perl -p -e'if (m#<docHash.+MD5=["\x27]([a-fA-F0-9]{32})#) {$_ = "$1\n"} else {$_ = ""}' hash.xml)" = "$(perl -p -e's#(<docHash.+MD5=["\x27])[a-fA-F0-9]{32}#$1 . "0" x 32#e' hash.xml | md5)" ] ; then echo OK; else echo ERROR; fi

This executes those two small commands, compares their output, and prints "OK" if the outputs match or "ERROR" if they don't. Obviously this is just a simple prototype, and it is in the wrong language, but I think it illustrates the most straightforward solution.
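
In C, the final comparison could look something like the sketch below. It assumes your MD5 library hands back the digest as 16 raw bytes, which is the usual convention, and both helper names are invented for illustration. The comparison ignores letter case because the search pattern accepts both a-f and A-F, whereas a digest formatted here is always lowercase.

```c
#include <ctype.h>
#include <stdio.h>

/* Formats a 16-byte MD5 digest as 32 lowercase hex digits plus NUL. */
void digest_to_hex(const unsigned char digest[16], char hex[33])
{
    for (int i = 0; i < 16; i++)
        snprintf(hex + 2 * i, 3, "%02x", digest[i]);
}

/* Compares two 32-digit hex strings, ignoring letter case.
 * Returns 1 if they match, 0 otherwise (including wrong lengths). */
int md5_hex_equal(const char *a, const char *b)
{
    for (int i = 0; i < 32; i++) {
        if (a[i] == '\0' || b[i] == '\0')
            return 0;
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    }
    return a[32] == '\0' && b[32] == '\0';
}
```

A caller would format the freshly computed digest with digest_to_hex and pass it, together with the checksum string saved from the file, to md5_hex_equal.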

Incidentally, why do you put the hash inside the XML document? As far as I can see, it has no advantage over passing the hash along on a side channel (even something as simple as a second file called documentname.md5), and it makes the hash validation more difficult.

Answer 5

Check out these examples for how to use the XMLDSIG standard with .NET.

You should maybe consider changing the setting for preserving whitespace.

5 answers

Answer 1: score 4, accepted, 2010-01-28 14:05:16
Answer 2: score 1, 2010-01-28 14:04:48
Answer 3: score 1, 2010-01-28 14:06:54
Answer 4: score 0, 2010-01-30 15:49:56
Answer 5: score 0, 2011-09-08 13:26:26