Most efficient way to build a serializer in C#

I am currently building my own serializer, and I have reached the point where I no longer want to misuse the System.Xml.Linq classes, so I'm building my own classes.

Let's say I have these classes:

  • XmlElement
  • XmlAttribute
  • XmlText

And let's assume these only have a Name property that returns a string and a Value property that returns an IReadOnlyList<IXmlNode>.

The question I have is: would it be more efficient to make the classes themselves responsible for writing out their own serialized value, or to have a separate class that uses pattern matching?

So for example, Option A:

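public class XmlElement : IXmlNode {

    public void Write(StringBuilder stringBuilder) {

        // StringBuilder exposes AppendLine; it has no WriteLine method
        stringBuilder.AppendLine($"<{Name}>");

        // Each child writes itself into the same shared buffer
        foreach (var child in Children) {
            child.Write(stringBuilder);
        }

        stringBuilder.AppendLine($"</{Name}>");

    }

}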
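To make Option A concrete, here is a minimal self-contained sketch. The shape of `IXmlNode`, the `XmlText.Value` property, the `Children` property type and the constructors are assumptions made for illustration; the question does not show them. Escaping and attributes are left out. The chained `Append` calls are one way to avoid building an interpolated string per tag, though whether that matters in practice has not been measured here.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical node contract: every node knows how to write itself.
public interface IXmlNode
{
    void Write(StringBuilder stringBuilder);
}

public sealed class XmlText : IXmlNode
{
    public string Value { get; }
    public XmlText(string value) => Value = value;

    // Escaping of special characters is deliberately omitted in this sketch.
    public void Write(StringBuilder stringBuilder) => stringBuilder.AppendLine(Value);
}

public sealed class XmlElement : IXmlNode
{
    public string Name { get; }
    public IReadOnlyList<IXmlNode> Children { get; }

    public XmlElement(string name, params IXmlNode[] children)
    {
        Name = name;
        Children = children; // arrays implement IReadOnlyList<T>
    }

    public void Write(StringBuilder stringBuilder)
    {
        // Append the tag piece by piece instead of formatting a temporary string.
        stringBuilder.Append('<').Append(Name).AppendLine(">");
        foreach (var child in Children)
        {
            child.Write(stringBuilder);
        }
        stringBuilder.Append("</").Append(Name).AppendLine(">");
    }
}
```

A caller creates one `StringBuilder`, passes it to the root node's `Write`, and calls `ToString()` once at the end.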

Or Option B:

public class XmlWriter {

    public string Write(IXmlNode node, StringBuilder passedStringBuilder) {

        var stringBuilder = passedStringBuilder ?? new StringBuilder();

        // Match on the node instance, not on the IXmlNode type name
        if (node is XmlElement xmlElement) WriteElement(xmlElement, stringBuilder);
        if (node is XmlAttribute xmlAttribute) WriteAttribute(xmlAttribute, stringBuilder);
        if (node is XmlText xmlText) WriteText(xmlText, stringBuilder);

        return stringBuilder.ToString();
    }

    public void WriteElement(XmlElement element, StringBuilder stringBuilder) {

        // StringBuilder exposes AppendLine; it has no WriteLine method
        stringBuilder.AppendLine($"<{element.Name}>");

        foreach (var child in element.Children) {
            Write(child, stringBuilder);
        }

        stringBuilder.AppendLine($"</{element.Name}>");

    }

    // WriteAttribute and WriteText are left out here

}
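One detail of the Option B listing is worth spelling out: `Write` ends with `stringBuilder.ToString()`, and `WriteElement` calls `Write` for every child, so code of that exact shape would copy the buffer into a new string at every node and then discard it. The sketch below is one possible way to avoid that, assuming hypothetical data-only node types and a hypothetical private `WriteNode` helper (neither is in the original post). Whether the benchmarked code had this shape, and whether it explains any measured difference, is not something the post establishes.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical data-only node types; in Option B they carry no Write method.
public interface IXmlNode { }

public sealed class XmlText : IXmlNode
{
    public string Value { get; }
    public XmlText(string value) => Value = value;
}

public sealed class XmlElement : IXmlNode
{
    public string Name { get; }
    public IReadOnlyList<IXmlNode> Children { get; }

    public XmlElement(string name, params IXmlNode[] children)
    {
        Name = name;
        Children = children;
    }
}

public sealed class XmlWriter
{
    // Public entry point: the only place where the buffer becomes a string.
    public string Write(IXmlNode node)
    {
        var stringBuilder = new StringBuilder();
        WriteNode(node, stringBuilder);
        return stringBuilder.ToString();
    }

    // Recursive dispatch on the runtime type; returns nothing,
    // so no intermediate strings are created per node.
    private void WriteNode(IXmlNode node, StringBuilder stringBuilder)
    {
        switch (node)
        {
            case XmlElement element:
                stringBuilder.Append('<').Append(element.Name).AppendLine(">");
                foreach (var child in element.Children)
                {
                    WriteNode(child, stringBuilder);
                }
                stringBuilder.Append("</").Append(element.Name).AppendLine(">");
                break;
            case XmlText text:
                stringBuilder.AppendLine(text.Value);
                break;
            default:
                throw new NotSupportedException(
                    "Unknown node type: " + (node?.GetType().Name ?? "null"));
        }
    }
}
```

The `switch` on type patterns does the same job as the chain of `is` checks, but stops at the first match and makes the unsupported case explicit.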

Obviously there is more to writing an XML serializer; this isn't great code and I left out some parts here and there.
I'm mostly concerned with which concept would be most efficient.

Additional information附加信息

Since there have been some requests to define what I mean by efficiency, here are the factors I'd like to score the implementation on:

  • Speed
  • Garbage collection
  • Memory allocation

Of course, code readability is also a factor. However, at this point I've written both implementations, and with regard to readability Option A has my preference. Option B resulted in a lot of code in a single file, and arguably it's doing too much for one class.

So in short:
Unless Option B greatly outperforms Option A, my preference will go towards Option A.

In this case I'd suppose the answer would be Option A.

What I did to test this:

  • I created basic data types for JSON and XML which would write out their own serialized value when ToString was called
  • I wrote some tests that compare the output against an expected string representation, to make sure the output was what I expected
  • I created a basic implementation of a serializer for both the JSON and the XML data types
  • I added test cases to the existing tests to ensure the output of both solutions was the same
  • I wrote a basic BenchmarkDotNet test for both options and both data types and ran it with Bogus-generated input data of about 200 complex items as an initial sample

This is the result of the benchmark test:

// * Summary *

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1288 (21H1/May2021Update)
Intel Core i7-8750H CPU 2.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK=5.0.402
  [Host]                                                                              : .NET Core 3.1.20 (CoreCLR 4.700.21.47003, CoreFX 4.700.21.47101), X64 RyuJIT
  BenchmarkDotNet, Version=0.13.1.0, Culture=neutral, PublicKeyToken=aa0ca2f9092cefc4 : .NET Core 3.1.20 (CoreCLR 4.700.21.47003, CoreFX 4.700.21.47101), X64 RyuJIT

Job=BenchmarkDotNet, Version=0.13.1.0, Culture=neutral, PublicKeyToken=aa0ca2f9092cefc4  MaxRelativeError=0.01  IterationCount=1
LaunchCount=5  RunStrategy=ColdStart  UnrollFactor=1
WarmupCount=1

|           Method |       Mean |     Error |   StdDev |       Gen 0 | Completed Work Items | Lock Contentions |      Gen 1 | Allocated |
|----------------- |-----------:|----------:|---------:|------------:|---------------------:|-----------------:|-----------:|----------:|
| JsonDataToSTring |   420.3 ms |  42.31 ms | 10.99 ms |  19000.0000 |               4.0000 |                - |  6000.0000 |    232 MB |
| SerialJsonWriter |   420.7 ms |  58.03 ms | 15.07 ms |  19000.0000 |               4.0000 |                - |  6000.0000 |    232 MB |
|  XmlDataToSTring | 1,012.1 ms | 342.95 ms | 89.06 ms | 145000.0000 |               4.0000 |                - | 35000.0000 |  1,036 MB |
|  SerialXmlWriter | 1,128.5 ms |  70.71 ms | 18.36 ms | 203000.0000 |               4.0000 |                - | 40000.0000 |  1,384 MB |

The benchmark test was not extensive by any means, and I did not do a very in-depth evaluation because to me these results are pretty clear.

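For XML, Option A scores best on time, garbage collection and memory allocation, even on a pretty small dataset (1,036 MB allocated versus 1,384 MB). For JSON the two options are practically tied (420.3 ms versus 420.7 ms, with identical garbage collection and allocation figures), so Option A is at least no worse there.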

In retrospect, I could've seen this coming because C# doesn't guarantee tail-call optimization.
In fact, when I increased the data size, I managed to crash my computer by profiling code that overflowed the stack.
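Both options as written recurse once per nesting level, so a deep enough tree can exhaust the call stack with either of them. If deep nesting was the cause of the overflow (the post does not say), one way around it is to replace the recursion with an explicit stack of pending work. The following is a minimal sketch of that idea; `IterativeXmlWriter` and the node types are hypothetical names, not code from the post.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical data-only node types for this sketch.
public interface IXmlNode { }

public sealed class XmlText : IXmlNode
{
    public string Value { get; }
    public XmlText(string value) => Value = value;
}

public sealed class XmlElement : IXmlNode
{
    public string Name { get; }
    public IReadOnlyList<IXmlNode> Children { get; }

    public XmlElement(string name, params IXmlNode[] children)
    {
        Name = name;
        Children = children;
    }
}

public static class IterativeXmlWriter
{
    public static string Write(IXmlNode root)
    {
        var stringBuilder = new StringBuilder();

        // Each entry is either a node still to be written (ClosingName == null)
        // or the name of an element whose closing tag is still due.
        var pending = new Stack<(IXmlNode Node, string ClosingName)>();
        pending.Push((root, null));

        while (pending.Count > 0)
        {
            var (node, closingName) = pending.Pop();

            if (closingName != null)
            {
                stringBuilder.Append("</").Append(closingName).AppendLine(">");
                continue;
            }

            switch (node)
            {
                case XmlElement element:
                    stringBuilder.Append('<').Append(element.Name).AppendLine(">");
                    // The closing tag goes on first so it is popped after all children.
                    pending.Push((null, element.Name));
                    // Push children in reverse so they are popped in document order.
                    for (var i = element.Children.Count - 1; i >= 0; i--)
                    {
                        pending.Push((element.Children[i], null));
                    }
                    break;
                case XmlText text:
                    stringBuilder.AppendLine(text.Value);
                    break;
                default:
                    throw new NotSupportedException("Unknown node type");
            }
        }

        return stringBuilder.ToString();
    }
}
```

The pending work now lives on the heap in a `Stack<T>`, so nesting depth is limited by available memory rather than by the thread's call stack. The trade-off is that the traversal logic is less direct than the recursive versions.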

PS: I don't think it's valuable to go into too much detail about how the data is generated, because even such a small dataset already made a clear difference.
