
Is there a way to save an object's hash code during binary serialization?

Originally posted 2011-03-31 18:58:24. Tags: c#, serialization, binary-serialization

I want to be able to compare objects by their hash codes.

For example, one is the object itself and the other is a version of it that was serialized (binary) and then deserialized.

How can I save the hash code in the binary-serialized object?

3 answers

Why would you need to serialize the hash code? Instead, you should provide proper implementations of GetHashCode() and Equals() in your class that compare two objects by their values; if two objects are equal, their hash codes must match. Once you have deserialized the object, you can call GetHashCode() on it and compare the result with the other object's hash code. Note that matching hash codes are not enough to establish equality, because two different objects can still share a hash code; you have to call a proper implementation of Equals() to determine equality.
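A minimal sketch of value-based equality, assuming a hypothetical `Point` class. Because `BinaryFormatter` is obsolete in current .NET, the round trip here is a hand-written stand-in using `BinaryWriter`/`BinaryReader`; `HashCode.Combine` assumes .NET Core 2.1 or later. The original and the deserialized copy are distinct instances, yet they compare equal and produce the same hash code within the same process.

```csharp
using System;
using System.IO;

// Hypothetical type for illustration: equality is defined by its data,
// not by object identity, so a deserialized copy equals the original.
public sealed class Point : IEquatable<Point>
{
    public int X { get; }
    public int Y { get; }
    public string Label { get; }

    public Point(int x, int y, string label)
    {
        X = x;
        Y = y;
        Label = label ?? throw new ArgumentNullException(nameof(label));
    }

    public bool Equals(Point? other) =>
        other is not null && X == other.X && Y == other.Y && Label == other.Label;

    public override bool Equals(object? obj) => Equals(obj as Point);

    // Equal objects must produce equal hash codes; the converse is not required.
    public override int GetHashCode() => HashCode.Combine(X, Y, Label);

    // Stand-in for binary serialization (assumption: BinaryFormatter is not used).
    public byte[] Serialize()
    {
        var ms = new MemoryStream();
        using (var w = new BinaryWriter(ms))
        {
            w.Write(X);
            w.Write(Y);
            w.Write(Label);
        }
        return ms.ToArray(); // ToArray still works after the stream is closed
    }

    public static Point Deserialize(byte[] data)
    {
        using var r = new BinaryReader(new MemoryStream(data));
        // C# evaluates arguments left to right, matching the write order.
        return new Point(r.ReadInt32(), r.ReadInt32(), r.ReadString());
    }
}
```

Comparing hash codes first is a cheap filter; `Equals` still makes the final decision.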

If you only want to compare particular fields of an object and a full comparison would be too expensive (e.g. a large byte array), it can make sense to compute an MD5 hash of that field (e.g. with MD5CryptoServiceProvider.ComputeHash()) and store it in the object itself; it will then be serialized like any other property.
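A minimal sketch of storing a digest alongside a large field, assuming a hypothetical `Blob` class. It uses `MD5.Create()` rather than `MD5CryptoServiceProvider`, which is marked obsolete in current .NET, and a hand-written `BinaryWriter` round trip as a stand-in for binary serialization. MD5 is adequate for detecting accidental changes but should not be relied on for security.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical type: Payload may be large, so its digest is stored with it
// and serialized like any other field.
public sealed class Blob
{
    public byte[] Payload { get; }
    public byte[] PayloadDigest { get; } // 16-byte MD5 digest of Payload

    public Blob(byte[] payload) : this(payload, ComputeDigest(payload)) { }

    // Used when loading: reuses the stored digest instead of recomputing it.
    private Blob(byte[] payload, byte[] digest)
    {
        Payload = payload;
        PayloadDigest = digest;
    }

    private static byte[] ComputeDigest(byte[] data)
    {
        using var md5 = MD5.Create();
        return md5.ComputeHash(data);
    }

    // Compares 16-byte digests instead of full payloads. Matching digests
    // strongly suggest, but do not prove, that the payloads are equal.
    public bool ProbablySamePayload(Blob other) =>
        PayloadDigest.SequenceEqual(other.PayloadDigest);

    public byte[] Serialize()
    {
        var ms = new MemoryStream();
        using (var w = new BinaryWriter(ms))
        {
            w.Write(Payload.Length);
            w.Write(Payload);
            w.Write(PayloadDigest);
        }
        return ms.ToArray();
    }

    public static Blob Deserialize(byte[] data)
    {
        using var r = new BinaryReader(new MemoryStream(data));
        byte[] payload = r.ReadBytes(r.ReadInt32());
        byte[] digest = r.ReadBytes(16);
        return new Blob(payload, digest);
    }
}
```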

Be wary!

The default hash code of a .NET object (the one inherited from Object.GetHashCode()) typically changes between runs of a program.

In other words, if your program serializes object A to disk along with its hash code, then terminates, is later restarted, and deserializes A from disk (or creates an identical A at run time), the new object will have a different hash code from the one that was stored.

This is because the default hash code is assigned by the runtime to each individual object instance rather than computed from the object's data. In a new process the recreated object is a new instance, so it gets a different hash code.

If you write your own GetHashCode(), you can produce a hash code that is consistent across processes. But there is a pitfall here that you need to be aware of.
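One thing to watch when writing a process-stable `GetHashCode()` in modern .NET: `string.GetHashCode()` and `System.HashCode` are randomly seeded per process in .NET Core and later, so building on them reintroduces the instability. A minimal sketch, assuming a hypothetical `Customer` class, that instead hashes the field bytes with 32-bit FNV-1a (a swapped-in, well-known non-cryptographic hash, not something from the answer above).

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

// Hypothetical type whose hash code must be the same in every process,
// e.g. because it is persisted alongside serialized data.
public sealed class Customer : IEquatable<Customer>
{
    public int Id { get; }
    public string Name { get; }

    public Customer(int id, string name)
    {
        Id = id;
        Name = name ?? throw new ArgumentNullException(nameof(name));
    }

    public bool Equals(Customer? other) =>
        other is not null && Id == other.Id && Name == other.Name;

    public override bool Equals(object? obj) => Equals(obj as Customer);

    // 32-bit FNV-1a over the field bytes: no per-process random seed involved.
    public override int GetHashCode()
    {
        const uint OffsetBasis = 2166136261;
        const uint Prime = 16777619;
        uint hash = OffsetBasis;

        // FNV-1a step: XOR each byte into the hash, then multiply by the prime.
        void Mix(ReadOnlySpan<byte> bytes)
        {
            foreach (byte b in bytes)
            {
                hash ^= b;
                hash = unchecked(hash * Prime);
            }
        }

        // Fixed little-endian layout so the result does not depend on the machine.
        Span<byte> idBytes = stackalloc byte[4];
        BinaryPrimitives.WriteInt32LittleEndian(idBytes, Id);
        Mix(idBytes);
        Mix(Encoding.UTF8.GetBytes(Name));

        return unchecked((int)hash);
    }
}
```

The properties are get-only here, so the hash cannot drift after the object has been stored in a hash-based collection.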

Is there any information you can use to tell which deserialized objects came from which originals? If so, you can override GetHashCode() to compute a hash code from that information.

If not, you might be able to generate such information synthetically by assigning a UUID to each newly created object. Serialize that value along with the other data so that the reconstructed objects have the same UUID. You can then simply override GetHashCode() to return that UUID's hash code. (That should do the job if what you are looking for is a modified form of referential equality.)
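A minimal sketch of the UUID approach, assuming a hypothetical `TrackedItem` class and using `System.Guid` as .NET's UUID type. The sketch also overrides `Equals` so that it agrees with `GetHashCode()`, and it uses a hand-written `BinaryWriter` round trip as a stand-in for binary serialization.

```csharp
using System;
using System.IO;

// Hypothetical type: each newly created instance gets a Guid, and that
// Guid travels with the serialized data.
public sealed class TrackedItem : IEquatable<TrackedItem>
{
    public Guid Id { get; }
    public string Data { get; set; }

    // For genuinely new objects: assigns a fresh identity.
    public TrackedItem(string data) : this(Guid.NewGuid(), data) { }

    // For reconstruction: reuses the stored identity.
    private TrackedItem(Guid id, string data)
    {
        Id = id;
        Data = data ?? throw new ArgumentNullException(nameof(data));
    }

    // "Same logical object" equality: based only on Id, not on Data or instance.
    public bool Equals(TrackedItem? other) => other is not null && Id == other.Id;
    public override bool Equals(object? obj) => Equals(obj as TrackedItem);
    public override int GetHashCode() => Id.GetHashCode();

    public byte[] Serialize()
    {
        var ms = new MemoryStream();
        using (var w = new BinaryWriter(ms))
        {
            w.Write(Id.ToByteArray()); // 16 bytes
            w.Write(Data);
        }
        return ms.ToArray();
    }

    public static TrackedItem Deserialize(byte[] bytes)
    {
        using var r = new BinaryReader(new MemoryStream(bytes));
        var id = new Guid(r.ReadBytes(16));
        return new TrackedItem(id, r.ReadString());
    }
}
```

Two objects created independently with the same `Data` are not equal, while an object and its deserialized copy are, which is the "modified referential equality" described above.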