XML vs Binary performance for Serialization/Deserialization

I'm working on a .NET Compact Framework application and need to boost performance. The app currently works offline by serializing objects to XML and storing them in a database. Using a profiling tool I could see this was quite a big overhead, slowing the app. I thought that switching to binary serialization would improve performance, but because this is not supported in the Compact Framework I looked at protobuf-net. Serialization seems quicker, but deserialization is much slower, and the app does more deserializing than serializing.

Should binary serialization be faster, and if so, what can I do to speed it up? Here's a snippet of how I'm using both XML and binary:

XML serialization:

public string Serialize(T obj)
{
  UTF8Encoding encoding = new UTF8Encoding();
  XmlSerializer serializer = new XmlSerializer(typeof(T));
  MemoryStream stream = new MemoryStream();
  XmlTextWriter writer = new XmlTextWriter(stream, Encoding.UTF8);
  serializer.Serialize(stream, obj);
  stream = (MemoryStream)writer.BaseStream;
  return encoding.GetString(stream.ToArray(), 0, Convert.ToInt32(stream.Length));
}
public T Deserialize(string xml)
{
  UTF8Encoding encoding = new UTF8Encoding();
  XmlSerializer serializer = new XmlSerializer(typeof(T));
  MemoryStream stream = new MemoryStream(encoding.GetBytes(xml));            
  return (T)serializer.Deserialize(stream);
}

Protobuf-net binary serialization:

public byte[] Serialize(T obj)
{
  byte[] raw;
  using (MemoryStream memoryStream = new MemoryStream())
  {
    Serializer.Serialize(memoryStream, obj);
    raw = memoryStream.ToArray();
  }

  return raw;            
}

public T Deserialize(byte[] serializedType)
{
  T obj;
  using (MemoryStream memoryStream = new MemoryStream(serializedType))
  {
    obj = Serializer.Deserialize<T>(memoryStream);
  }
  return obj;
}

I'm going to correct myself on this: Marc Gravell pointed out that the first iteration carries the overhead of building the model, so I ran some tests taking the average of 1000 iterations of serialization and deserialization for both XML and binary. I tried the tests with the v2 Compact Framework DLL first, and then with the v3.5 DLL. Here's what I got; times are in ms:

.NET 2.0
================================ XML ====== Binary ===
Serialization 1st Iteration      3236       5508
Deserialization 1st Iteration    1501       318
Serialization Average            9.826      5.525
Deserialization Average          5.525      0.771

.NET 3.5
================================ XML ====== Binary ===
Serialization 1st Iteration      3307       5598
Deserialization 1st Iteration    1386       200
Serialization Average            10.923     5.605
Deserialization Average          5.605      0.279
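
For anyone wanting to reproduce this kind of measurement, here is a rough sketch of timing the first call separately from the average of the later calls. TimingHarness and Measure are hypothetical names, and the sketch assumes Stopwatch and Action are available on the target framework (they are on full .NET 3.5; on older Compact Framework versions Environment.TickCount may be the fallback). It is not the code used for the numbers above.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: separates the one-off first-call cost from the steady-state cost.
public static class TimingHarness
{
    // Result of one measurement run, in milliseconds.
    public struct Result
    {
        public double FirstMs;    // first call, includes any one-off setup cost
        public double AverageMs;  // mean of the subsequent calls
    }

    // Times the first call on its own, then averages `iterations` further calls.
    public static Result Measure(Action action, int iterations)
    {
        if (iterations <= 0) throw new ArgumentOutOfRangeException("iterations");

        Stopwatch watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        double first = watch.Elapsed.TotalMilliseconds;

        watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        watch.Stop();

        Result result;
        result.FirstMs = first;
        result.AverageMs = watch.Elapsed.TotalMilliseconds / iterations;
        return result;
    }
}
```

Usage would look like TimingHarness.Measure(() => Serialize(obj), 1000), run once for each serializer and each direction.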

The main expense in your method is the actual generation of the XmlSerializer class. Creating the serialiser is a time-consuming process which you should only do once for each object type. Try caching the serialisers and see if that improves performance at all.

Following this advice I saw a large performance improvement in my app, which allowed me to continue to use XML serialisation.
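
A minimal sketch of what caching one serialiser per type might look like. XmlSerializerCache and its method names are made up for illustration, and the sketch goes through StringWriter/StringReader instead of the UTF-8 MemoryStream used in the question, so the XML declaration will say utf-16; adapt as needed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical helper: builds each XmlSerializer once and reuses it afterwards.
public static class XmlSerializerCache
{
    private static readonly Dictionary<Type, XmlSerializer> cache =
        new Dictionary<Type, XmlSerializer>();
    private static readonly object sync = new object();

    // Returns the single XmlSerializer for this type, creating it on first use.
    public static XmlSerializer For(Type type)
    {
        lock (sync)
        {
            XmlSerializer serializer;
            if (!cache.TryGetValue(type, out serializer))
            {
                serializer = new XmlSerializer(type);  // the expensive step
                cache[type] = serializer;
            }
            return serializer;
        }
    }

    public static string Serialize<T>(T obj)
    {
        using (StringWriter writer = new StringWriter())
        {
            For(typeof(T)).Serialize(writer, obj);
            return writer.ToString();
        }
    }

    public static T Deserialize<T>(string xml)
    {
        using (StringReader reader = new StringReader(xml))
        {
            return (T)For(typeof(T)).Deserialize(reader);
        }
    }
}
```

The lock only guards the dictionary lookup; the expensive constructor runs once per type, on the first call.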

Hope this helps.

Interesting... thoughts:

  • What version of CF is this: 2.0? 3.5? In particular, CF 3.5 has Delegate.CreateDelegate, which allows protobuf-net to access properties much faster than it can in CF 2.0.
  • Are you annotating fields or properties? Again, in CF the reflection optimisations are limited; you can get better performance in CF 3.5 with properties, as with a field the only option I have available is FieldInfo.SetValue.
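
To illustrate the second point, a small sketch of a contract annotated on properties rather than fields. Customer, CustomerCodec and the member numbers are invented for the example; the attributes and Serializer calls are the standard protobuf-net ones, and the sketch assumes the protobuf-net assembly is referenced.

```csharp
using System.IO;
using ProtoBuf;

// Hypothetical model type; the member numbers are arbitrary but must stay stable.
[ProtoContract]
public class Customer
{
    // The attributes sit on properties, not on backing fields.
    [ProtoMember(1)]
    public int Id { get; set; }

    [ProtoMember(2)]
    public string Name { get; set; }
}

// Hypothetical wrapper mirroring the Serialize/Deserialize pair from the question.
public static class CustomerCodec
{
    public static byte[] ToBytes(Customer customer)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            Serializer.Serialize(stream, customer);
            return stream.ToArray();
        }
    }

    public static Customer FromBytes(byte[] data)
    {
        using (MemoryStream stream = new MemoryStream(data))
        {
            return Serializer.Deserialize<Customer>(stream);
        }
    }
}
```

Whether this actually helps depends on the CF version, as noted above, so it is worth measuring both layouts.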

There are a number of other things that simply don't exist in CF, so it has to make compromises in a few places. For overly complex models there is also a known issue with the generics limitations of CF. A fix is underway, but it is a big change, and is taking "a while".

For info, some metrics on regular (full) .NET comparing various formats (including XmlSerializer and protobuf-net) are here.

Have you tried creating custom serialization classes for your classes, instead of using XmlSerializer, which is a general-purpose serializer (it creates a bunch of classes at runtime)? There's a tool for doing this (sgen). You run it during your build process and it generates a custom assembly that can be used in place of XmlSerializer.

If you have Visual Studio, the option is available under the Build tab of your project's properties.

Is the performance hit in serializing the objects, or in writing them to the database? Since writing them is likely hitting some kind of slow storage, I'd imagine that to be a much bigger perf hit than the serialization step.

Keep in mind that the perf measurements posted by Marc Gravell are testing the performance over 1,000,000 iterations.

What kind of database are you storing them in? Are the objects serialized in memory or straight to storage? How are they being sent to the db? How big are the objects? When one is updated, do you send all of the objects to the database, or just the one that has changed? Are you caching anything in memory at all, or re-reading from storage each time?

XML is often slow to process and takes up a lot of space. There have been a number of different attempts to tackle this, and the most popular today seems to be to just drop the lot in a gzip file, like with the Open Packaging Convention.
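
As a rough illustration of the gzip approach, here is a sketch that compresses an XML string before storage and expands it again. XmlGzip is a hypothetical helper name, and the sketch assumes System.IO.Compression.GZipStream is available on the target framework. It only addresses size; the XML still has to be parsed after decompression.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: gzip-compresses serialized XML and restores it.
public static class XmlGzip
{
    public static byte[] Compress(string xml)
    {
        byte[] raw = Encoding.UTF8.GetBytes(xml);
        using (MemoryStream output = new MemoryStream())
        {
            // The GZipStream must be closed before the buffer is read,
            // otherwise the final compressed block may not be flushed.
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return output.ToArray();
        }
    }

    public static string Decompress(byte[] compressed)
    {
        using (MemoryStream input = new MemoryStream(compressed))
        using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = gzip.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }
            byte[] raw = output.ToArray();
            return Encoding.UTF8.GetString(raw, 0, raw.Length);
        }
    }
}
```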

The W3C has shown the gzip approach to be less than optimal, and they and various other groups have been working on a better binary serialisation suitable for fast processing and compression, for transmission.
