
Improve Binary Serialization Performance for large List of structs

I have a structure holding 3D coordinates in 3 ints. In a test I've put together a List<> of 1 million random points and then used binary serialization to a memory stream.

The memory stream is coming in at ~21 MB, which seems very inefficient: 1,000,000 points * 3 coords * 4 bytes is 12,000,000 bytes, so the payload should come out at roughly 11.4 MB minimum.

It's also taking ~3 seconds on my test rig.

Any ideas for improving performance and/or size?

(I don't have to keep the ISerializable interface if it helps; I could write out directly to a memory stream.)

EDIT - From the answers below I've put together a serialization showdown comparing BinaryFormatter, 'raw' BinaryWriter and Protobuf:

using System;
using System.Text;
using System.Collections.Generic;
using System.Linq;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;
using System.IO;
using ProtoBuf;

namespace asp_heatmap.test
{
    [Serializable()] // For .NET BinaryFormatter
    [ProtoContract] // For Protobuf
    public class Coordinates : ISerializable
    {
        [Serializable()]
        [ProtoContract]
        public struct CoOrd
        {
            public CoOrd(int x, int y, int z)
            {
                this.x = x;
                this.y = y;
                this.z = z;
            }
            [ProtoMember(1)]            
            public int x;
            [ProtoMember(2)]
            public int y;
            [ProtoMember(3)]
            public int z;
        }

        internal Coordinates()
        {
        }

        [ProtoMember(1)]
        public List<CoOrd> Coords = new List<CoOrd>();

        public void SetupTestArray()
        {
            Random r = new Random();
            for (int i = 0; i < 1000000; i++)
            {
                Coords.Add(new CoOrd(r.Next(), r.Next(), r.Next()));
            }
        }

        #region Using Framework Binary Formatter Serialization

        void ISerializable.GetObjectData(SerializationInfo info, StreamingContext context)
        {
            info.AddValue("Coords", this.Coords);
        }

        internal Coordinates(SerializationInfo info, StreamingContext context)
        {
            this.Coords = (List<CoOrd>)info.GetValue("Coords", typeof(List<CoOrd>));
        }

        #endregion

        # region 'Raw' Binary Writer serialization

        public MemoryStream RawSerializeToStream()
        {
            MemoryStream stream = new MemoryStream(Coords.Count * 3 * 4 + 4);
            BinaryWriter writer = new BinaryWriter(stream);
            writer.Write(Coords.Count);
            foreach (CoOrd point in Coords)
            {
                writer.Write(point.x);
                writer.Write(point.y);
                writer.Write(point.z);
            }
            return stream;
        }

        public Coordinates(MemoryStream stream)
        {
            using (BinaryReader reader = new BinaryReader(stream))
            {
                int count = reader.ReadInt32();
                Coords = new List<CoOrd>(count);
                for (int i = 0; i < count; i++)                
                {
                    Coords.Add(new CoOrd(reader.ReadInt32(),reader.ReadInt32(),reader.ReadInt32()));
                }
            }        
        }
        #endregion
    }

    [TestClass]
    public class SerializationTest
    {
        [TestMethod]
        public void TestBinaryFormatter()
        {
            Coordinates c = new Coordinates();
            c.SetupTestArray();

            // Serialize to memory stream
            MemoryStream mStream = new MemoryStream();
            BinaryFormatter bformatter = new BinaryFormatter();
            bformatter.Serialize(mStream, c);
            Console.WriteLine("Length : {0}", mStream.Length);

            // Now Deserialize
            mStream.Position = 0;
            Coordinates c2 = (Coordinates)bformatter.Deserialize(mStream);
            Console.Write(c2.Coords.Count);

            mStream.Close();
        }

        [TestMethod]
        public void TestBinaryWriter()
        {
            Coordinates c = new Coordinates();
            c.SetupTestArray();

            MemoryStream mStream = c.RawSerializeToStream();
            Console.WriteLine("Length : {0}", mStream.Length);

            // Now Deserialize
            mStream.Position = 0;
            Coordinates c2 = new Coordinates(mStream);
            Console.Write(c2.Coords.Count);
        }

        [TestMethod]
        public void TestProtoBufV2()
        {
            Coordinates c = new Coordinates();
            c.SetupTestArray();

            MemoryStream mStream = new MemoryStream();
            ProtoBuf.Serializer.Serialize(mStream,c);
            Console.WriteLine("Length : {0}", mStream.Length);

            mStream.Position = 0;
            Coordinates c2 = ProtoBuf.Serializer.Deserialize<Coordinates>(mStream);
            Console.Write(c2.Coords.Count);
        }
    }
}

Results (note: protobuf-net v2.0.0.423 beta)

Serializer       | Serialize | Ser + Deserialize | Size
-----------------|-----------|-------------------|--------
BinaryFormatter  | 2.89s     | 26.00s !!!        | 21.0 MB
ProtoBuf v2      | 0.52s     | 0.83s             | 18.7 MB
Raw BinaryWriter | 0.27s     | 0.36s             | 11.4 MB

Obviously this only looks at speed and size and doesn't take anything else into account.
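The test methods above do not show how the timings were taken. A minimal sketch of one way to time a single serialization call is below; the `Timing` class and `Measure` method are hypothetical names for illustration, not part of the original test project, and a single run like this includes JIT warm-up.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Runs the action once and returns the elapsed wall-clock time.
    // Hypothetical helper: the original post does not say how it measured.
    public static TimeSpan Measure(Action action)
    {
        Stopwatch sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed;
    }
}
```

It could be used as `TimeSpan t = Timing.Measure(() => bformatter.Serialize(mStream, c));` inside one of the test methods.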

Binary serialisation using BinaryFormatter includes type information in the bytes it generates. This takes up additional space. It's useful in cases where you don't know what structure of data to expect at the other end, for example.

In your case, you know what format the data has at both ends, and that doesn't sound like it will change. So you can write a simple encode and decode method. Your CoOrd struct no longer needs to be serializable either.

I would use System.IO.BinaryReader and System.IO.BinaryWriter, then loop through each of your CoOrd instances and read/write the x, y and z values to the stream. Note that BinaryWriter.Write(int) always emits 4 bytes per value, so this gives the 12,000,000-byte payload plus a 4-byte count. Getting below that requires a variable-length encoding, which only pays off if many of your numbers are small (below 0x80 or 0x4000, for example).

Something like this:

using (var writer = new BinaryWriter(stream)) {
    // write the number of items so we know how many to read out
    writer.Write(points.Count);
    // write three ints per point
    foreach (var point in points) {
        writer.Write(point.x); // field names match the CoOrd struct (lowercase)
        writer.Write(point.y);
        writer.Write(point.z);
    }
}

To read from the stream:

List<CoOrd> points;
using (var reader = new BinaryReader(stream)) {
    var count = reader.ReadInt32();
    points = new List<CoOrd>(count);
    for (int i = 0; i < count; i++) {
        var x = reader.ReadInt32();
        var y = reader.ReadInt32();
        var z = reader.ReadInt32();
        points.Add(new CoOrd(x, y, z));
    }
}
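If many of the values are small, a variable-length encoding can shrink the output further. The following is a minimal sketch of a base-128 "varint" written by hand on top of BinaryWriter and BinaryReader; `VarintIO`, `WriteVarint` and `ReadVarint` are hypothetical helper names, not framework APIs. Values below 128 take one byte, values below 16,384 take two, and so on up to five bytes for a full 32-bit value, so this only helps when most coordinates are small.

```csharp
using System.IO;

public static class VarintIO
{
    // Writes an int using 7 payload bits per byte; the high bit of each
    // byte signals that more bytes follow. Hypothetical helper.
    public static void WriteVarint(BinaryWriter writer, int value)
    {
        uint v = (uint)value;
        while (v >= 0x80)
        {
            writer.Write((byte)((v & 0x7F) | 0x80));
            v >>= 7;
        }
        writer.Write((byte)v);
    }

    // Reads a value written by WriteVarint. No validation of malformed
    // input is attempted in this sketch.
    public static int ReadVarint(BinaryReader reader)
    {
        uint result = 0;
        int shift = 0;
        byte b;
        do
        {
            b = reader.ReadByte();
            result |= (uint)(b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (int)result;
    }
}
```

In the loops above, `writer.Write(point.x)` would become `VarintIO.WriteVarint(writer, point.x)` and `reader.ReadInt32()` would become `VarintIO.ReadVarint(reader)`. For the full-range values produced by `r.Next()` in the question, most values need five bytes, so the fixed 4-byte form is smaller there.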

For the simplicity of using a pre-built serializer, I recommend protobuf-net; here is protobuf-net v2, with just some attributes added:

using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

[DataContract]
public class Coordinates
{
    [DataContract]
    public struct CoOrd
    {
        public CoOrd(int x, int y, int z)
        {
            this.x = x;
            this.y = y;
            this.z = z;
        }
        [DataMember(Order = 1)]
        int x;
        [DataMember(Order = 2)]
        int y;
        [DataMember(Order = 3)]
        int z;
    }
    [DataMember(Order = 1)]
    public List<CoOrd> Coords = new List<CoOrd>();

    public void SetupTestArray()
    {
        Random r = new Random(123456);
        for (int i = 0; i < 1000000; i++)
        {
            Coords.Add(new CoOrd(r.Next(10000), r.Next(10000), r.Next(10000)));
        }
    }
}

using:

ProtoBuf.Serializer.Serialize(mStream, c);

to serialize. This takes 10,960,823 bytes, but note that I tweaked SetupTestArray to limit the values to below 10,000, since by default protobuf-net uses "varint" encoding on the integers, whose length depends on the magnitude of the value. The 10k limit isn't important here (in fact I didn't check what the "steps" are). If you prefer a fixed size (which will allow any range):

        [ProtoMember(1, DataFormat = DataFormat.FixedSize)]
        int x;
        [ProtoMember(2, DataFormat = DataFormat.FixedSize)]
        int y;
        [ProtoMember(3, DataFormat = DataFormat.FixedSize)]
        int z;

This takes 16,998,640 bytes.
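The two sizes can be roughly sanity-checked by hand. The sketch below assumes the standard protobuf wire layout: each list element is a length-prefixed sub-message (1-byte tag plus a varint length), and each int field is a 1-byte tag followed by either a varint or 4 fixed bytes. `ProtoSizeEstimate` and its methods are hypothetical names for illustration, and the estimate only covers non-negative values such as those returned by `Random.Next`.

```csharp
using System;

public static class ProtoSizeEstimate
{
    // Number of bytes a value occupies as a base-128 varint.
    public static int VarintLength(uint value)
    {
        int length = 1;
        while (value >= 0x80)
        {
            value >>= 7;
            length++;
        }
        return length;
    }

    // Estimated bytes for one point: three tagged int fields (field
    // numbers 1-3, so each tag is 1 byte) wrapped in a sub-message.
    public static int PointSize(int x, int y, int z, bool fixedSize)
    {
        int body = 0;
        foreach (int v in new[] { x, y, z })
        {
            body += 1 + (fixedSize ? 4 : VarintLength((uint)v));
        }
        // 1 byte for the list element's tag plus its varint length prefix.
        return 1 + VarintLength((uint)body) + body;
    }
}
```

Under these assumptions a point costs 17 bytes with fixed-size fields, which for a million points is close to the 16,998,640 bytes reported above. With varints, values below 128 take one byte and values below 16,384 take two, so points with coordinates under 10,000 cost about 11 bytes each, in line with the 10,960,823-byte figure.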
