
Serialization takes too long

I serialize all the objects contained in a list each time a new object is added (a kind of history in case the application crashes).

Serialization completes within a millisecond while adding the first ~20 objects, but from then on each newly added object takes longer and longer to serialize (after 10 more additions, a single serialization can take 10 minutes).

As I said, it's a kind of history, so if I restart the app, every object in the JSON file is added back to the list. Say I had 20 objects before closing the app. Now I can add 20 more, and serializing all 40 still takes under a millisecond. But once I add even more objects, I have to close and restart the app again.

I don't understand why repeated serializations within the same instance of the app take that long.

Here's the serialization code:

public static void SerializeAll()
{
    string output = JsonConvert.SerializeObject(ListOfModelToSerialize, Formatting.Indented);

    // using ensures the writer is flushed and closed even if Write throws;
    // the path needs a verbatim string (@) so the backslash is not an escape
    using (IO.StreamWriter writer = new IO.StreamWriter(@"A:\history.json"))
    {
        writer.Write(output);
    }
}

Deserialization code:

public static List<ModelToSerialize> DeserializeAll()
{
    if (IO.File.Exists(@"A:\history.json"))
    {
        string input = IO.File.ReadAllText(@"A:\history.json");
        var result = JsonConvert.DeserializeObject<List<ModelToSerialize>>(input);
        return result;
    }
    else
    {
        // No history yet: start with an empty list
        return new List<ModelToSerialize>();
    }
}

I only serialize 4 properties from my model. Here's the serialized model output:

{
  "an_integer": 1,
  "a_string": "...",
  "a_list_of_string": [],
  "another_list_of_string": []
}
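
For illustration, a minimal sketch of what such a model might look like with Json.NET; the C# member names are my own assumptions, only the JSON property names come from the output above. Opt-in serialization ensures that only the 4 attributed properties are written:

using System.Collections.Generic;
using Newtonsoft.Json;

// Hypothetical sketch: with OptIn, only attributed members are serialized,
// which matches "I only serialize 4 properties from my model"
[JsonObject(MemberSerialization.OptIn)]
public class ModelToSerialize
{
    [JsonProperty("an_integer")]
    public int AnInteger { get; set; }

    [JsonProperty("a_string")]
    public string AString { get; set; }

    [JsonProperty("a_list_of_string")]
    public List<string> AListOfString { get; set; } = new List<string>();

    [JsonProperty("another_list_of_string")]
    public List<string> AnotherListOfString { get; set; } = new List<string>();
}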

UPDATE 1:

It seems like the integer ID I'm serializing is the problem. This ID is not assigned when the model is created, because it changes every time a new object is added to the list (the list is sorted alphabetically). So to get the ID, I do this:

[JsonProperty(PropertyName = "id")]
public int Id
{
    get 
    {
        if (_id > 0) 
        {
            return _id;
        }
        else
        {
            // Start the candidate ID one above the highest ID stored
            // by previous sessions
            int id = Properties.Settings.Default.PREVIOUS_MAX_ID + 1;

            // Walk the directory until this file is reached; note that
            // every "file.Id" access below can re-enter this same loop,
            // so the cost explodes as the list grows
            foreach (File file in _directory.Files)
            {
                if (file == this) 
                {
                    return id; 
                }
                else if (file.Id == id)
                {
                    id += 1;
                }
            }

            return 0;
        }
    }
}

So the reason it doesn't take longer to serialize the first 20 objects after an app restart is that their IDs are assigned directly during deserialization.

So I have to adapt the ID retrieval.

UPDATE 2:

As the ID retrieval was the performance issue, I fixed it by assigning an ID to each object every time a new object is added. That way, when an object's ID is read, there is no more iteration in the model.

The following block is my custom Add() method for my list of objects:

public new void Add(File file)
{
    if (!base.Contains(file))
    {
        base.Add(file);
        base.Sort(new Comparer());

        // Reassign consecutive IDs after every sort, starting one above the
        // stored maximum to match the getter above. Id now needs a setter
        // that writes _id, so reading it becomes a simple field access.
        for (int i = 0; i < Count; i++)
        {
            this[i].Id = Properties.Settings.Default.PREVIOUS_MAX_ID + i + 1;
        }
    }
}

I added ~100 new objects in the current instance of the app, and serialization no longer takes forever.

Serialization (JSON etc.) is at its best when used to communicate a discrete piece of data from one medium to another, such as in network API calls. It can also be used to store data on disk, as you are doing.

But it's not a great option when you update that data frequently, because the most popular formats (JSON, XML) are not structured so that new data can simply be appended to existing data. That's why the whole list has to be re-serialized every time, and of course serialization takes longer as the data grows.

For your purpose I would suggest a database like SQLite. You could still serialize each individual row to JSON if the data doesn't map well to columns, but you'll be able to append new rows without rewriting the old ones. And if you use transactions, you can ensure the data keeps its integrity even if your app crashes.
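
A minimal sketch of that approach, assuming the Microsoft.Data.Sqlite package (the table layout and database file name are my own choices); each history entry is inserted as a single JSON row inside a transaction:

using Microsoft.Data.Sqlite;
using Newtonsoft.Json;

public static class HistoryDb
{
    // Hypothetical database file next to the app
    const string ConnectionString = "Data Source=history.db";

    public static void Append(ModelToSerialize item)
    {
        using (var connection = new SqliteConnection(ConnectionString))
        {
            connection.Open();

            var create = connection.CreateCommand();
            create.CommandText = "CREATE TABLE IF NOT EXISTS history (json TEXT NOT NULL)";
            create.ExecuteNonQuery();

            using (var transaction = connection.BeginTransaction())
            {
                var insert = connection.CreateCommand();
                insert.Transaction = transaction;
                insert.CommandText = "INSERT INTO history (json) VALUES ($json)";
                insert.Parameters.AddWithValue("$json", JsonConvert.SerializeObject(item));
                insert.ExecuteNonQuery();

                // Committing keeps the file consistent even if the app crashes mid-write
                transaction.Commit();
            }
        }
    }
}

Each Append call only writes the new entry, so save time stays flat instead of growing with the size of the history.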

For any kind of performance issue I would recommend measuring first, ideally with a performance profiler that can tell you exactly which part of the code is slow.
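
Even without a full profiler, a Stopwatch around the suspected call narrows things down quickly; a minimal sketch:

using System;
using System.Diagnostics;

var stopwatch = Stopwatch.StartNew();
SerializeAll();   // the call under suspicion
stopwatch.Stop();
Console.WriteLine($"SerializeAll took {stopwatch.ElapsedMilliseconds} ms");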

Some general suggestions:

  • You probably want a limit on how much history you keep; otherwise there is a risk that your application gets slower and slower as objects accumulate. A limit also lets you test worst-case performance.
  • Use a faster serializer. While Json.NET is good, a binary serializer such as protobuf-net typically produces less data in less time (benchmark), at the cost of not being human-readable. You might also consider a fast compressor like LZ4 to reduce the file size and therefore the disk I/O.
  • Change your model so you do not have to overwrite all the data: either serialize to different files, or use a serialization method that lets you append records to the same file (see the sketch after this list).
  • Limit the save frequency. You might not need to save history for each and every change; saving every 5 s or so might be sufficient.
  • If you have not already done so, ensure that at least saving data is done asynchronously so it does not block the UI thread. This may require accessing your data in a thread-safe manner so it cannot change while it is being serialized.
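
As a sketch of the append idea from the third bullet: write one compact JSON document per line (newline-delimited JSON) so each save appends only the new entry instead of rewriting the whole list. The file path and class name here are my own inventions:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Newtonsoft.Json;

public static class HistoryLog
{
    // Hypothetical path; one JSON object per line ("NDJSON")
    const string FilePath = @"A:\history.ndjson";

    // Appending touches only the new entry, so save time stays constant
    // no matter how large the history grows
    public static void Append(ModelToSerialize item)
    {
        File.AppendAllText(FilePath, JsonConvert.SerializeObject(item) + Environment.NewLine);
    }

    // Rebuild the full history on startup by deserializing line by line
    public static List<ModelToSerialize> LoadAll()
    {
        if (!File.Exists(FilePath))
            return new List<ModelToSerialize>();

        return File.ReadLines(FilePath)
                   .Where(line => !string.IsNullOrWhiteSpace(line))
                   .Select(line => JsonConvert.DeserializeObject<ModelToSerialize>(line))
                   .ToList();
    }
}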
