
Optimizing JSON Serialization Performance of .NET POCOs

I have been trying to optimize the JSON serialization of over 500K POCOs to be imported into MongoDB, and have been running into nothing but headaches. I originally tried the Newtonsoft JsonConvert.SerializeObject() method, but that was taking too long. Then, based on the advice of several posts here on SO, on Newtonsoft's own site, and elsewhere, I attempted to serialize the objects manually, but have not noticed much performance gain, if any.

This is the code that I use to kick off the serialization process. Above each line, in the comments, is the amount of time that each individual operation took to complete, given a dataset of 1000 objects.

//
// Get reference to the MongoDB Collection
var collection = _database.GetCollection<BsonDocument>("sessions");
//
// 8ms - Get the number of records already in the MongoDB. We will skip this many when retrieving more records from the RDBMS
Int32 skipCount = collection.AsQueryable().Count();
//
// 74ms - Get the records as POCOs that will be imported into the MongoDB (using Telerik OpenAccess ORM)
List<Session> sessions = uow.DbContext.Sessions.Skip(skipCount).Take(1000).ToList();

//
// The duration times displayed in the foreach loop are cumulative totals across
// ALL the items, not just a single one.
foreach (Session item in sessions)
{
    StringWriter sw       = new StringWriter();         
    JsonTextWriter writer = new JsonTextWriter(sw);     
    //
    // 585,934ms (yes - 9.75 MINUTES) - Serialization of 1000 POCOs into a JSON string. Total duration of ALL 1000 objects 
    item.ToJSON(ref writer);
    //
    // 16ms - Parse the StringWriter into a String. Total duration of ALL 1000 objects.
    String json = sw.ToString();
    //
    // 376ms - Deserialize the json into MongoDB BsonDocument instances. Total duration of ALL 1000 objects.
    BsonDocument doc = MongoDB.Bson.Serialization.BsonSerializer.Deserialize<BsonDocument>(json); // 376ms

    //
    // 8ms - Insert the BsonDocument into the MongoDB dataStore. Total duration of ALL 1000 objects.
    collection.InsertOne(doc);

}

Currently it takes about 0.5-0.75 sec for each individual object to be serialized to a JSON document, which equals about 10 minutes for 1,000 documents, 100 minutes for 10,000 documents, and so on. I find that the durations are fairly consistent, but ultimately this means that loading the 600K records would take about 125 straight hours of processing. This is for a messaging system that could eventually be adding 20K-100K new documents per day, so performance is a REAL issue for us.

The objects I am serializing contain a couple of layers of "navigation" properties or "nested documents" (depending on whether you view them through an ORM or a MongoDB lens), but are not otherwise particularly complex or noteworthy.

The serialization code I constructed passes the JsonTextWriter instance created in the previous code sample into the ToJSON methods of the POCOs, so we are not creating a new writer for each model to use when serializing itself.

The following code is a truncated example of a few of the objects, intended to illustrate the implementation technique (how the writer is passed and how the JSON is manually constructed). There are many more properties and a few more related/nested objects, but this is an example of the "deepest" traversal I have to make.

It begins with the Session object and recursively calls its dependent properties to serialize themselves as well.

public class Session
{

    #region properties

    public Guid SessionUID { get; set; }

    public String AssetNumber { get; set; }

    public Int64? UTCOffset { get; set; }

    public DateTime? StartUTCTimestamp { get; set; }

    public DateTime? StartTimestamp { get; set; }

    public DateTime? EndTimestamp { get; set; }

    public String Language { get; set; }

    // ... many more properties 

    #endregion properties 

    #region navigation properties

    public virtual IList<SessionItem> Items { get; set; }

    #endregion navigation properties

    #region methods
    public void ToJSON(ref JsonTextWriter writer)
    {
        Session session = this;     
        // {
        writer.WriteStartObject();

        #region write out the properties

        writer.WritePropertyName("SessionUID");
        writer.WriteValue(session.SessionUID);

        writer.WritePropertyName("AssetNumber");
        writer.WriteValue(session.AssetNumber);

        writer.WritePropertyName("UTCOffset");
        writer.WriteValue(session.UTCOffset);

        writer.WritePropertyName("StartUTCTimestamp");
        writer.WriteValue(session.StartUTCTimestamp);

        writer.WritePropertyName("StartTimestamp");
        writer.WriteValue(session.StartTimestamp);

        writer.WritePropertyName("EndTimestamp");
        writer.WriteValue(session.EndTimestamp);

        writer.WritePropertyName("Language");
        writer.WriteValue(session.Language);

        // continues adding remaining instance properties

        #endregion write out the properties

        #region include the navigation properties

        // "Items": [ {}, {}, {} ]
        writer.WritePropertyName("Items");
        writer.WriteStartArray();
        foreach (SessionItem item in this.Items)
        {
            item.ToJSON(ref writer);
        }
        writer.WriteEndArray();

        #endregion include the navigation properties

        // }
        writer.WriteEndObject();
        //return sw.ToString();
    }

    #endregion methods 
}

public class SessionItem
{
    #region properties

    public Int64 ID { get; set; }

    public Int64 SessionID { get; set; }

    public Int32 Quantity { get; set; }

    public Decimal UnitPrice { get; set; }

    #endregion properties

    #region navigation properties

    public virtual Session Session { get; set; }

    public virtual IList<SessionItemAttribute> Attributes { get; set; }

    #endregion navigation properties

    #region public methods
    public void ToJSON(ref JsonTextWriter writer)
    {
        // {
        writer.WriteStartObject();

        #region write out the properties

        writer.WritePropertyName("ID");
        writer.WriteValue(this.ID);

        writer.WritePropertyName("SessionID");
        writer.WriteValue(this.SessionID);

        writer.WritePropertyName("Quantity");
        writer.WriteValue(this.Quantity);

        writer.WritePropertyName("UnitPrice");
        writer.WriteValue(this.UnitPrice);

        #endregion write out the properties

        #region include the navigation properties
        //
        // "Attributes": [ {}, {}, {} ]
        writer.WritePropertyName("Attributes");
        writer.WriteStartArray();
        foreach (SessionItemAttribute item in this.Attributes)
        {
            item.ToJSON(ref writer);
        }
        writer.WriteEndArray();

        #endregion include the navigation properties

        // }
        writer.WriteEndObject();
        //return sw.ToString();
    }
    #endregion public methods
}

public class SessionItemAttribute : BModelBase, ISingleID
{
    public Int64 ID { get; set; }

    public String Name { get; set; }

    public String Datatype { get; set; }

    public String StringValue { get; set; }

    public Decimal? NumberValue { get; set; }

    public DateTime? DateValue { get; set; }

    public Boolean? BooleanValue { get; set; }

    #region navigation properties

    public Int64 ItemID { get; set; }
    public virtual SessionItem Item { get; set; }

    public Int64 ItemAttributeID { get; set; }
    public virtual ItemAttribute ItemAttribute { get; set; }

    #endregion navigation properties

    #region public methods
    public void ToJSON(ref JsonTextWriter writer)
    {
        // {
        writer.WriteStartObject();

        #region write out the properties

        writer.WritePropertyName("ID");
        writer.WriteValue(this.ID);

        writer.WritePropertyName("Name");
        writer.WriteValue(this.Name);

        writer.WritePropertyName("Datatype");
        writer.WriteValue(this.Datatype);

        writer.WritePropertyName("StringValue");
        writer.WriteValue(this.StringValue);

        writer.WritePropertyName("NumberValue");
        writer.WriteValue(this.NumberValue);

        writer.WritePropertyName("DateValue");
        writer.WriteValue(this.DateValue);

        writer.WritePropertyName("BooleanValue");
        writer.WriteValue(this.BooleanValue);

        writer.WritePropertyName("ItemID");
        writer.WriteValue(this.ItemID);

        writer.WritePropertyName("ItemAttributeID");
        writer.WriteValue(this.ItemAttributeID);

        #endregion write out the properties

        // }
        writer.WriteEndObject();
        //return sw.ToString();
    }
    #endregion public methods
}

I suspect that I am overlooking something, or that the problem lies in the manner in which I am implementing the serialization. One SO poster claimed to have reduced his load time from 28 seconds to 31 milliseconds by manually serializing the data, so I was expecting somewhat more dramatic results. In fact, this is nearly the exact same performance I observed using the Newtonsoft JsonConvert.SerializeObject() method.
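For reference, the JsonConvert baseline looked roughly like the following. This is a sketch rather than my exact code; the ReferenceLoopHandling setting is an assumption here, needed because the navigation properties are cyclic (SessionItem.Session points back at its parent Session):

```csharp
using Newtonsoft.Json;

// Baseline: reflection-based serialization of each POCO.
// ReferenceLoopHandling.Ignore stops the serializer from throwing
// (or looping) when it walks the Session <-> SessionItem cycle.
var settings = new JsonSerializerSettings
{
    ReferenceLoopHandling = ReferenceLoopHandling.Ignore
};

foreach (Session item in sessions)
{
    string json = JsonConvert.SerializeObject(item, settings);
    // ... deserialize into a BsonDocument and insert, as in the loop above
}
```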

Any help diagnosing the source of latency in the serialization would be most appreciated. Thank you!

UPDATE

While I have not yet extricated the data access from the ORM, I have been able to confirm that the latency is actually coming from the ORM (thank you, commenters). When I added the FetchStrategy as suggested, the latency was still there, but the time moved from being spent on serialization to being spent on the query (i.e., the loading of the navigation properties).

So the issue isn't serialization so much as it is optimizing the data retrieval.

In an effort to provide closure to this question, I wanted to post my solution.

After further research, the commenters on the original post had it correct: this was not a serialization issue but a data access issue. The ORM was lazily loading the navigation properties as they were requested during the serialization process. When I implemented a FetchStrategy to "greedily" fetch the associated objects, the source of the latency shifted from the counters I had in place around the serialization process to the counters I placed around data access.
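For anyone hitting the same wall, the eager-fetch setup looked roughly like this. Treat it as a sketch: Telerik Data Access exposes FetchStrategy.LoadWith, but verify the exact API surface against the version you are using:

```csharp
using Telerik.OpenAccess.FetchOptimization;

// Tell the ORM to load the navigation properties up front, instead of
// lazily issuing one query per object while the serializer walks the
// graph (the classic N+1 query problem).
var strategy = new FetchStrategy();
strategy.LoadWith<Session>(s => s.Items);
strategy.LoadWith<SessionItem>(i => i.Attributes);
uow.DbContext.FetchStrategy = strategy;

List<Session> sessions = uow.DbContext.Sessions
    .Skip(skipCount)
    .Take(1000)
    .ToList();
```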

I was able to resolve this by adding indexes on the foreign key fields in the database. Latency dropped by over 90%, and what was taking 100+ minutes to run is now completing in 10.

So thanks to the folks who commented and helped remove my blinders by reminding me of what else was going on.

Here's a benchmark comparison chart of different JSON serializers. Try protobuf-net or NetJSON, which rank among the fastest candidates for serializing simple POCOs.
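For example, NetJSON's static API is essentially a drop-in one-liner. This is a sketch; check the package for the exact namespace and overloads:

```csharp
// NetJSON emits specialized serialization code at runtime, which is why
// it benchmarks well against reflection-based serializers on simple POCOs.
string json = NetJSON.NetJSON.Serialize(session);
```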

