Compare two objects using serialization in C#

Why is it not good practice to compare two objects by serializing them and then comparing the strings, as in the following example?

public class Obj
{
    public int Prop1 { get; set; }
    public string Prop2 { get; set; }
}

public class Comparator<T> : IEqualityComparer<T>
{
    public bool Equals(T x, T y)
    {
        return JsonConvert.SerializeObject(x) == JsonConvert.SerializeObject(y);
    }

    public int GetHashCode(T obj)
    {
        return JsonConvert.SerializeObject(obj).GetHashCode();
    }
}

Obj o1 = new Obj { Prop1 = 1, Prop2 = "1" };
Obj o2 = new Obj { Prop1 = 1, Prop2 = "2" };

bool result = new Comparator<Obj>().Equals(o1, o2);

I have tested it and it works. It is generic, so it could handle a wide variety of objects. What I am asking is: what are the downsides of this approach to comparing objects?

I have seen it suggested in this question, where it received some upvotes, but I can't figure out why it is not considered the best way if somebody wants to compare just the property values of two objects.

EDIT: I am strictly talking about JSON serialization, not XML.

I am asking this because I want to create a simple and generic Comparator for a unit test project, so the performance of the comparison does not bother me much, although I know it may be one of the biggest downsides. Also, with Newtonsoft.Json the type-less problem can be handled by setting the TypeNameHandling property to All.

The primary problem is that it is inefficient.

As an example, imagine this Equals function:

public bool Equals(T x, T y)
{
    return x.Prop1 == y.Prop1
        && x.Prop2 == y.Prop2
        && x.Prop3 == y.Prop3
        && x.Prop4 == y.Prop4
        && x.Prop5 == y.Prop5
        && x.Prop6 == y.Prop6;
}

If Prop1 is not the same, the other five comparisons never need to be checked. With JSON you would have to convert both entire objects into JSON strings and then compare the strings every time, and this is on top of serialization being an expensive task all on its own.

The next problem is that serialization is designed for communication, e.g. from memory to a file, across a network, etc. If you have leveraged serialization for comparison, you can degrade your ability to use it for its normal purpose; i.e. you can't ignore fields that are not required for transmission, because ignoring them might break your comparer.

Next, JSON specifically is type-less, which means that values that are not in any way, shape or form equal may be mistaken for being equal, and on the flip side, values that are equal may not compare as equal if they are formatted differently when serialized. This is again unsafe and unstable.

The only upside to this technique is that it requires little effort for the programmer to implement.

You are probably going to keep adding a bounty to the question until somebody tells you that it is just fine to do this. So here you have it: don't hesitate to take advantage of the Newtonsoft.Json library to keep the code simple. You just need some good arguments to defend your decision if your code is ever reviewed or if somebody else takes over its maintenance.

Some of the objections they may raise, and their counter-arguments:

This is very inefficient code!

It certainly is. GetHashCode() in particular can make your code brutally slow if you ever use the object in a Dictionary or HashSet.

The best counter-argument is to note that efficiency is of little concern in a unit test. The most typical unit test takes longer to get started than to actually execute, and whether it takes 1 millisecond or 1 second is not relevant. It is also a problem you are likely to discover very early.

You are unit-testing a library you did not write!

That is certainly a valid concern; you are in effect testing Newtonsoft.Json's ability to generate a consistent string representation of an object. There is cause to be alarmed about this; in particular, floating point values (float and double) are always a problem. There is also some evidence that the library author is unsure how to do it correctly.

The best counter-argument is that the library is widely used and well maintained; the author has released many updates over the years. Floating point consistency concerns can be reasoned away when you make sure that the exact same program with the exact same runtime environment generates both strings (i.e. don't store one of them) and you make sure the unit test is built with optimization disabled.

You are not unit-testing the code that needs to be tested!

Yes, you would only write this code if the class itself provides no way to compare objects. In other words, it does not itself override Equals/GetHashCode and does not expose a comparator. So testing for equality in your unit test exercises a feature that the to-be-tested code does not actually support. That is something a unit test should never do; you can't write a bug report when the test fails.

The counter-argument is to reason that you need to test for equality in order to test another feature of the class, like the constructor or property setters. A simple comment in the code is enough to document this.

By serializing your objects to JSON, you are basically changing all of your objects to another data type, so everything that applies to your JSON library will have an impact on your results.

So if there is an attribute like [ScriptIgnore] on a property of one of the objects, your comparison will simply ignore that property, since it is omitted from the serialized data.
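A minimal sketch of that effect. It uses System.Text.Json and its [JsonIgnore] attribute only so that the example needs no extra package; that is a substitution, since the question uses Newtonsoft.Json (which has its own [JsonIgnore]). The Account class and the IgnoreDemo helper are hypothetical names.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical type: Secret is excluded from serialization.
public class Account
{
    public int Id { get; set; }

    [JsonIgnore]
    public string Secret { get; set; } = "";
}

public static class IgnoreDemo
{
    // Serialization-based equality, in the style of the question's Comparator<T>.
    public static bool JsonEquals(Account x, Account y)
    {
        return JsonSerializer.Serialize(x) == JsonSerializer.Serialize(y);
    }
}
```

Two accounts that differ only in Secret compare as equal here, because the ignored property never reaches the serialized string.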

Also, the string results can be the same for objects that are not the same, as in this example:

class Program
{
    static void Main(string[] args)
    {
        Xb x1 = new X1()
        {
            y1 = 1,
            y2 = 2
        };
        Xb x2 = new X2()
        {
            y1 = 1,
            y2 = 2
        };
        // Different runtime types, but identical JSON text.
        bool result = new Comparator<Xb>().Equals(x1, x2);
    }
}

class Xb
{
    public int y1 { get; set; }
}

class X1 : Xb
{
    public short y2 { get; set; }
}
class X2 : Xb
{
    public long y2 { get; set; }
}

So, as you can see, x1 has a different type from x2, and even the data type of y2 differs between the two, but the JSON results will be the same.

Other than that, since both x1 and x2 are of type Xb, I could call your comparator without any problems.
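One possible guard against this, sketched under some assumptions: compare the runtime types before comparing the serialized text. The TypeAwareComparator name is hypothetical, and System.Text.Json stands in for Newtonsoft.Json purely to keep the sketch free of package dependencies. The Xb, X1 and X2 classes mirror the ones in the example above.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public class Xb { public int y1 { get; set; } }
public class X1 : Xb { public short y2 { get; set; } }
public class X2 : Xb { public long y2 { get; set; } }

public class TypeAwareComparator<T> : IEqualityComparer<T>
{
    public bool Equals(T x, T y)
    {
        if (x == null || y == null) return x == null && y == null;
        // Reject different runtime types before looking at the JSON text.
        if (x.GetType() != y.GetType()) return false;
        return Serialize(x) == Serialize(y);
    }

    public int GetHashCode(T obj)
    {
        return obj == null ? 0 : Serialize(obj).GetHashCode();
    }

    // Serialize by runtime type so that derived-class properties are included.
    private static string Serialize(T value)
    {
        return JsonSerializer.Serialize(value, value.GetType());
    }
}
```

With this check, an X1 and an X2 holding the same numbers no longer compare as equal, while two X1 instances with the same values still do. It does not address the other objections (cost, formatting, ignored members).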

First, I would like to correct the GetHashCode from the beginning of the question.

public class Comparator<T> : IEqualityComparer<T>
{
    public bool Equals(T x, T y)
    {
        return JsonConvert.SerializeObject(x) == JsonConvert.SerializeObject(y);
    }
    public int GetHashCode(T obj)
    {
        return JsonConvert.SerializeObject(obj).GetHashCode();
    }
}

Okay, next, let's discuss the problems with this method.


First, it just won't work for types with circular references.

If you have a property linkage as simple as A -> B -> A, it fails.

Unfortunately, this is very common in lists or maps that link to each other.

Worse, there is hardly an efficient generic loop detection mechanism.
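To make the loop problem concrete, here is a rough sketch of one common way to detect cycles during a reflection-based comparison: remember every pair of objects already being compared. Node and CycleSafeComparer are hypothetical names, collections are deliberately not handled, and the visited set costs extra memory, so this illustrates the idea rather than providing a general solution.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.CompilerServices;

// Hypothetical type that can form an A -> B -> A loop.
public class Node
{
    public int Value { get; set; }
    public Node Next { get; set; }
}

public static class CycleSafeComparer
{
    // Compares pairs by reference identity, not by value.
    private sealed class PairComparer : IEqualityComparer<(object, object)>
    {
        public bool Equals((object, object) a, (object, object) b)
            => ReferenceEquals(a.Item1, b.Item1) && ReferenceEquals(a.Item2, b.Item2);

        public int GetHashCode((object, object) p)
            => RuntimeHelpers.GetHashCode(p.Item1) ^ RuntimeHelpers.GetHashCode(p.Item2);
    }

    public static bool DeepEquals(object x, object y)
    {
        return DeepEquals(x, y, new HashSet<(object, object)>(new PairComparer()));
    }

    private static bool DeepEquals(object x, object y, HashSet<(object, object)> visited)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        Type type = x.GetType();
        if (type != y.GetType()) return false;
        // Value types and strings: rely on their own Equals.
        if (type.IsValueType || x is string) return x.Equals(y);
        // Pair already under comparison: we followed a loop, so stop descending.
        if (!visited.Add((x, y))) return true;
        foreach (PropertyInfo p in type.GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            if (p.GetIndexParameters().Length > 0) continue; // skip indexers
            if (!DeepEquals(p.GetValue(x), p.GetValue(y), visited)) return false; // early exit
        }
        return true;
    }
}
```

The design choice here is to treat a revisited pair as "equal so far" and let the remaining properties decide, which terminates on cyclic graphs where plain recursion or default serialization would not.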


Second, comparison by serialization is just inefficient.

JSON serialization needs reflection and lots of type checking before it can produce its result.

Therefore, your comparer will become a serious bottleneck in any algorithm.

Usually, even with only thousands of records, JSON serialization is considered slow.


Third, JSON serialization has to go over every property.

It will become a disaster if your object links to any big object.

What if your object links to a big file?


As a result, C# simply leaves the implementation to the user.

One has to know one's class thoroughly before creating a comparator.

Comparison requires good loop detection, early termination and efficiency considerations.

A generic solution simply does not exist.

First, I notice that you say "serialize them and then compare the strings." In general, ordinary string comparison will not work for comparing XML or JSON strings; you have to be a little more sophisticated than that. As a counterexample to string comparison, consider the following XML strings:

<abc></abc>
<abc/>

They are clearly not string-equal, but they definitely "mean" the same thing. While this example might seem contrived, it turns out that there are quite a few cases where string comparison doesn't work. For example, whitespace and indentation are significant in string comparison but may not be significant in XML.

The situation isn't all that much better for JSON. You can construct similar counterexamples for it.

{ "abc" : "def" }
{
   "abc" : "def"
}

Again, clearly these mean the same thing, but they're not string-equal.

Essentially, if you're doing string comparison you're trusting the serializer to always serialize a particular object in exactly the same way (without any added whitespace, etc.), which ends up being remarkably fragile, especially given that most libraries do not, to my knowledge, provide any such guarantee. This is particularly problematic if you update the serialization library at some point and there's a subtle difference in how it does the serialization; in that case, if you try to compare a saved object that was serialized with the previous version of the library with one that was serialized with the current version, it won't work.
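A small sketch of how much the comparison depends on serializer configuration. It assumes System.Text.Json (a stand-in for whichever serializer is in use) and a hypothetical Point class: the same object produces two different strings depending only on the WriteIndented option.

```csharp
using System.Text.Json;

// Hypothetical type used only for this illustration.
public class Point
{
    public int X { get; set; }
    public int Y { get; set; }
}

public static class FormattingDemo
{
    // Default settings: no insignificant whitespace.
    public static string Compact(Point p) => JsonSerializer.Serialize(p);

    // Same object, same library, but indented output.
    public static string Indented(Point p) =>
        JsonSerializer.Serialize(p, new JsonSerializerOptions { WriteIndented = true });
}
```

Both strings deserialize to the same Point, yet they are not string-equal, so a comparer built on string equality is only as stable as the settings and library version that produced both sides.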

Also, as a quick note on your code itself: be careful with the "==" operator. For reference types in general, "==" tests reference equality, not object equality, unless the type overloads it. System.String does overload it to compare values, which is why the comparison in your code behaves as a value comparison.

One more quick digression on hash algorithms: how reliable they are as a means of equality testing depends on how collision resistant they are. In other words, given two different, non-equal objects, what's the probability that they'll hash to the same value? Conversely, if two objects hash to the same value, what are the odds that they're actually equal? A lot of people take it for granted that their hash algorithms are 100% collision resistant (i.e. two objects will hash to the same value if, and only if, they're equal), but this isn't necessarily true. (A particularly well-known example of this is the MD5 cryptographic hash function, whose relatively poor collision resistance has rendered it unsuitable for security-sensitive use.) For a properly implemented hash function, in most cases the probability that two objects that hash to the same value are actually equal is high enough to be suitable as a means of equality testing, but it's not guaranteed.
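A tiny illustration of the point that equal hash codes do not imply equal values. It relies on how Int64.GetHashCode folds 64 bits into 32 in current .NET implementations (XOR of the upper and lower halves), which is an implementation detail rather than a documented guarantee; the HashCollisionDemo name is hypothetical.

```csharp
public static class HashCollisionDemo
{
    // Two different long values chosen so that folding 64 bits into 32 collides.
    public const long A = 0L;
    public const long B = -1L;

    // Assumption: lower 32 bits XOR upper 32 bits, which is 0 for both values.
    public static bool SameHash => A.GetHashCode() == B.GetHashCode();

    public static bool SameValue => A == B;
}
```

Any 32-bit hash over a larger value space must collide somewhere, which is why GetHashCode can only ever be a pre-filter for Equals and never a replacement for it.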

These are some of the downsides:

a) Performance will be increasingly bad the deeper your object tree is.

b) new Obj { Prop1 = 1 } Equals new Obj { Prop1 = "1" } Equals new Obj { Prop1 = 1.0 }

c) new Obj { Prop1 = 1.0, Prop2 = 2.0 } Not Equals new Obj { Prop2 = 2.0, Prop1 = 1.0 }

Object comparison by serializing and then comparing the string representations is not effective in the following cases:

When a property of type DateTime exists in the types that need to be compared:

public class Obj
{
    public DateTime Date { get; set; }
}

Obj o1 = new Obj { Date = DateTime.Now };
Obj o2 = new Obj { Date = DateTime.Now };

bool result = new Comparator<Obj>().Equals(o1, o2);

It will result in false even for objects created very close together in time, unless they happen to share exactly the same value.
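For comparison, a hand-written comparer can accept timestamps that fall within a tolerance. This is only a sketch; DateTimeToleranceComparer is a hypothetical name and the tolerance is whatever the test considers "close enough".

```csharp
using System;
using System.Collections.Generic;

public class DateTimeToleranceComparer : IEqualityComparer<DateTime>
{
    private readonly TimeSpan _tolerance;

    public DateTimeToleranceComparer(TimeSpan tolerance)
    {
        _tolerance = tolerance;
    }

    // Equal when the absolute difference is within the tolerance.
    public bool Equals(DateTime x, DateTime y) => (x - y).Duration() <= _tolerance;

    // Tolerance-based equality is not transitive, so no useful hash is
    // consistent with it; a constant keeps the contract but defeats hashing.
    public int GetHashCode(DateTime obj) => 0;
}
```

A serialized string offers no place to express such a tolerance, which is the limitation described above.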


For objects that have double or decimal values which need to be compared with an epsilon to verify whether they are very close to each other:

public class Obj
{
    public double Double { get; set; }
}

// 0.1 + 0.2 evaluates to 0.30000000000000004, which is close to but not the same double as 0.3.
Obj o1 = new Obj { Double = 0.1 + 0.2 };
Obj o2 = new Obj { Double = 0.3 };

bool result = new Comparator<Obj>().Equals(o1, o2);

This will also return false even if the double values are really close to each other. In programs that involve calculation this becomes a real problem, because of the loss of precision after multiple divide and multiply operations, and serialization does not offer the flexibility to handle these cases.
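A minimal sketch of the epsilon comparison that serialization cannot express. DoubleTolerance and NearlyEqual are hypothetical names, and the default epsilon of 1e-9 is an arbitrary assumption that a real test would tune to its domain.

```csharp
using System;

public static class DoubleTolerance
{
    // Relative tolerance for large magnitudes, absolute tolerance near zero.
    public static bool NearlyEqual(double a, double b, double epsilon = 1e-9)
    {
        if (a == b) return true; // exact match, including equal infinities
        double scale = Math.Max(Math.Abs(a), Math.Abs(b));
        return Math.Abs(a - b) <= epsilon * Math.Max(scale, 1.0);
    }
}
```

For example, NearlyEqual(0.1 + 0.2, 0.3) accepts two values that differ only by accumulated rounding error, whereas their serialized strings differ.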


Also, considering the above cases, if one does not want to compare a property, one faces the problem of introducing a serialization attribute into the actual class even though it is not otherwise necessary. That leads to code pollution, or to problems when the type actually has to be serialized.

Note: These are some of the actual problems with this approach, but I am looking forward to finding others.

For unit tests you don't need to write your own comparer. :)

Just use a modern framework. For example, try the FluentAssertions library:

o1.ShouldBeEquivalentTo(o2); // FluentAssertions 5 and later: o1.Should().BeEquivalentTo(o2);

Serialization was made for storing an object or sending it over a pipe (network) that is outside of the current execution context, not for doing something inside the execution context.

Some serialized values might not be considered equal when in fact they are: decimal "1.0" and integer "1", for instance.

Sure, you can do it, just as you can eat with a shovel, but you don't, because you might break a tooth!

You can use the System.Reflection namespace to get all the properties of the instance, as in this answer. With reflection you can compare not only public properties or fields (as JSON serialization does), but also private, protected, etc. ones, and you can speed up the comparison: obviously you don't have to compare all properties or fields of the instances once two objects are found to differ (except in the case where only the last property or field differs).
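A rough sketch of that reflection-based idea, under some assumptions: it compares properties one level deep using their own Equals, includes non-public instance properties declared on the runtime type, and stops at the first mismatch. ReflectionComparer and Sample are hypothetical names; nested objects and collections would need extra handling.

```csharp
using System;
using System.Reflection;

// Hypothetical type with one public and one private property.
public class Sample
{
    public int Id { get; set; }
    private string Tag { get; set; }

    public Sample(int id, string tag)
    {
        Id = id;
        Tag = tag;
    }
}

public static class ReflectionComparer
{
    public static bool PropertiesEqual<T>(T x, T y)
    {
        if (object.ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        if (x.GetType() != y.GetType()) return false;

        const BindingFlags flags =
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;

        foreach (PropertyInfo p in x.GetType().GetProperties(flags))
        {
            // Skip indexers and write-only properties.
            if (p.GetIndexParameters().Length > 0 || !p.CanRead) continue;
            // Early exit: the remaining properties are never read.
            if (!object.Equals(p.GetValue(x), p.GetValue(y))) return false;
        }
        return true;
    }
}
```

Unlike the serialization approach, this sees the private Tag property and never builds an intermediate string, though it still pays the cost of reflection on every call.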
