C# - OutOfMemoryException saving a List on a JSON file

I'm trying to save the streaming data of a pressure map. Basically I have a pressure matrix defined as:

double[,] pressureMatrix = new double[e.Data.GetLength(0), e.Data.GetLength(1)];

Basically, I'm getting one of these pressureMatrix snapshots every 10 milliseconds, and I want to save all the information in a JSON file to be able to reproduce it later.

What I do is, first of all, write what I call the header, with all the settings used for the recording, like this:

recordedData.softwareVersion = Assembly.GetExecutingAssembly().GetName().Version.Major.ToString() + "." + Assembly.GetExecutingAssembly().GetName().Version.Minor.ToString();
recordedData.calibrationConfiguration = calibrationConfiguration;
recordedData.representationConfiguration = representationSettings;
recordedData.pressureData = new List<PressureMap>();

var json = JsonConvert.SerializeObject(csvRecordedData, Formatting.None);

File.WriteAllText(this.filePath, json);

Then, every time I get a new pressure map, I create a new Thread to add the new PressureMatrix and re-write the file:

var newPressureMatrix = new PressureMap(datos, DateTime.Now);
recordedData.pressureData.Add(newPressureMatrix);
var json = JsonConvert.SerializeObject(recordedData, Formatting.None);
File.WriteAllText(this.filePath, json);

After about 20-30 minutes I get an OutOfMemoryException, because the system cannot hold the recordedData variable: the List<PressureMap> inside it has become too big.

How can I handle this so that I can save the data? I would like to save 24-48 hours of information.

Your basic problem is that you are holding all of your pressure map samples in memory rather than writing each one individually and then allowing it to be garbage collected. What's worse, you are doing this in two different places:

  1. You serialize your entire list of samples to a JSON string json before writing the string to a file.

    Instead, as explained in Performance Tips: Optimize Memory Usage, you should serialize and deserialize directly to and from your file in such situations. For instructions on how to do this, see this answer to Can Json.NET serialize / deserialize to / from a stream? and also Serialize JSON to a file. (A minimal sketch of this change follows this list.)

  2. The recordedData.pressureData = new List<PressureMap>(); accumulates all pressure map samples, and then all of them are re-written every time a new sample is taken.

    A better solution would be to write each sample once and then forget it, but the requirement that each sample be nested inside some container object in the JSON makes it non-obvious how to do that.
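
As a minimal sketch of the fix for issue #1 (using the same Json.NET types that appear later in this answer, and assuming the recordedData object and this.filePath field from the question), serialize straight to the file instead of building the whole JSON string in memory first. Note that this alone does not fix issue #2, because recordedData still accumulates every sample:

using (var stream = new FileStream(this.filePath, FileMode.Create))
using (var textWriter = new StreamWriter(stream))
using (var jsonWriter = new JsonTextWriter(textWriter))
{
    // Stream the object graph straight to disk; no intermediate JSON string is created.
    JsonSerializer.CreateDefault().Serialize(jsonWriter, recordedData);
}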

So, how to attack issue #2?

First, let's modify your data model as follows, partitioning the header data into a separate class:

public class PressureMap
{
    public double[,] PressureMatrix { get; set; }
}

public class CalibrationConfiguration 
{
    // Data model not included in question
}

public class RepresentationConfiguration 
{
    // Data model not included in question
}

public class RecordedDataHeader
{
    public string SoftwareVersion { get; set; }
    public CalibrationConfiguration CalibrationConfiguration { get; set; }
    public RepresentationConfiguration RepresentationConfiguration { get; set; }
}

public class RecordedData
{
    // Ensure the header is serialized first.
    [JsonProperty(Order = 1)]
    public RecordedDataHeader RecordedDataHeader { get; set; }
    // Ensure the pressure data is serialized last.
    [JsonProperty(Order = 2)]
    public IEnumerable<PressureMap> PressureData { get; set; }
}

Option #1 is a version of the producer-consumer pattern. It involves spinning up two threads: one to generate PressureData samples, and one to serialize the RecordedData. The first thread will generate samples and add them to a BlockingCollection<PressureMap> collection that is passed to the second thread. The second thread will then serialize BlockingCollection<PressureMap>.GetConsumingEnumerable() as the value of RecordedData.PressureData.

The following code gives a skeleton for how to do this:

var sampleCount = 400;    // Or whatever stopping criterion you prefer
var sampleInterval = 10;  // in ms

using (var pressureData = new BlockingCollection<PressureMap>())
{
    // Adapted from
    // https://docs.microsoft.com/en-us/dotnet/standard/collections/thread-safe/blockingcollection-overview
    // https://docs.microsoft.com/en-us/dotnet/api/system.collections.concurrent.blockingcollection-1?view=netframework-4.7.2

    // Spin up a Task to sample the pressure maps
    using (Task t1 = Task.Factory.StartNew(() =>
    {
        for (int i = 0; i < sampleCount; i++)
        {
            var data = GetPressureMap(i);
            Console.WriteLine("Generated sample {0}", i);
            pressureData.Add(data);
            System.Threading.Thread.Sleep(sampleInterval);
        }
        pressureData.CompleteAdding();
    }))
    {
        // Spin up a Task to consume the BlockingCollection
        using (Task t2 = Task.Factory.StartNew(() =>
        {
            var recordedDataHeader = new RecordedDataHeader
            {
                SoftwareVersion = softwareVersion,
                CalibrationConfiguration = calibrationConfiguration,
                RepresentationConfiguration = representationConfiguration,
            };

            var settings = new JsonSerializerSettings
            {
                ContractResolver = new CamelCasePropertyNamesContractResolver(),
            };

            using (var stream = new FileStream(this.filePath, FileMode.Create))
            using (var textWriter = new StreamWriter(stream))
            using (var jsonWriter = new JsonTextWriter(textWriter))
            {
                int j = 0;

                var query = pressureData
                    .GetConsumingEnumerable()
                    .Select(p => 
                            { 
                                // Flush the writer periodically in case the process terminates abnormally
                                jsonWriter.Flush();
                                Console.WriteLine("Serializing item {0}", j++);
                                return p;
                            });

                var recordedData = new RecordedData
                {
                    RecordedDataHeader = recordedDataHeader,
                    // Since PressureData is declared as IEnumerable<PressureMap>, evaluation will be lazy.
                    PressureData = query,
                };                          

                Console.WriteLine("Beginning serialization of {0} to {1}:", recordedData, this.filePath);
                JsonSerializer.CreateDefault(settings).Serialize(textWriter, recordedData);
                Console.WriteLine("Finished serialization of {0} to {1}.", recordedData, this.filePath);
            }
        }))
        {
            Task.WaitAll(t1, t2);
        }
    }
}

Notes:

  • This solution uses the fact that, when serializing an IEnumerable<T>, Json.NET will not materialize the enumerable as a list. Instead it will take full advantage of lazy evaluation and simply enumerate through it, writing and then forgetting each individual item encountered.

  • The first thread samples PressureMap values and adds them to the blocking collection.

  • The second thread wraps the blocking collection in an IEnumerable<PressureMap> and then serializes that as RecordedData.PressureData.

    During serialization, the serializer will enumerate through the IEnumerable<PressureMap>, streaming each sample to the JSON file and then proceeding to the next -- effectively blocking until one becomes available.

  • You will need to do some experimentation to make sure that the serialization thread can "keep up" with the sampling thread, possibly by setting a BoundedCapacity during construction (see the short sketch after these notes). If not, you may need to adopt a different strategy.

  • PressureMap GetPressureMap(int count) should be some method of yours (not shown in the question) that returns the current pressure map sample.

  • In this technique the JSON file remains open for the duration of the sampling session. If sampling terminates abnormally, the file may be truncated. I make some attempt to ameliorate the problem by flushing the writer periodically.

  • While data serialization will no longer require unbounded amounts of memory, deserializing a RecordedData later will deserialize the PressureData sequence into a concrete List<PressureMap>. This may possibly cause memory issues during downstream processing.
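
As a short sketch of the BoundedCapacity suggestion above (the capacity of 1000 is purely illustrative, not a value from the original answer), you can cap the in-memory backlog so that the sampling thread blocks in Add() rather than letting memory grow without limit if the serializer falls behind:

// Hold at most 1000 pending samples (about 10 seconds at one sample every 10 ms).
// When the collection is full, Add() blocks the producer until the consumer catches up.
using (var pressureData = new BlockingCollection<PressureMap>(boundedCapacity: 1000))
{
    // ... spin up the same producer and consumer tasks as in the skeleton above ...
}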

Demo fiddle #1 here.

Option #2 would be to switch from a single JSON file to a Newline Delimited JSON file. Such a file consists of a sequence of JSON objects separated by newline characters. In your case, you would make the first object contain the RecordedDataHeader information, and the subsequent objects be of type PressureMap:

var sampleCount = 100; // Or whatever
var sampleInterval = 10;

var recordedDataHeader = new RecordedDataHeader
{
    SoftwareVersion = softwareVersion,
    CalibrationConfiguration = calibrationConfiguration,
    RepresentationConfiguration = representationConfiguration,
};

var settings = new JsonSerializerSettings
{
    ContractResolver = new CamelCasePropertyNamesContractResolver(),
};

// Write the header
Console.WriteLine("Beginning serialization of sample data to {0}.", this.filePath);

using (var stream = new FileStream(this.filePath, FileMode.Create))
{
    JsonExtensions.ToNewlineDelimitedJson(stream, new[] { recordedDataHeader });
}

// Write each sample incrementally

for (int i = 0; i < sampleCount; i++)
{
    Thread.Sleep(sampleInterval);
    Console.WriteLine("Performing sample {0} of {1}", i, sampleCount);
    var map = GetPressureMap(i);

    using (var stream = new FileStream(this.filePath, FileMode.Append))
    {
        JsonExtensions.ToNewlineDelimitedJson(stream, new[] { map });
    }
}

Console.WriteLine("Finished serialization of sample data to {0}.", this.filePath);

Using the extension methods:

public static partial class JsonExtensions
{
    // Adapted from the answer to
    // https://stackoverflow.com/questions/44787652/serialize-as-ndjson-using-json-net
    // by dbc https://stackoverflow.com/users/3744182/dbc
    public static void ToNewlineDelimitedJson<T>(Stream stream, IEnumerable<T> items)
    {
        // Let caller dispose the underlying stream 
        using (var textWriter = new StreamWriter(stream, new UTF8Encoding(false, true), 1024, true))
        {
            ToNewlineDelimitedJson(textWriter, items);
        }
    }

    public static void ToNewlineDelimitedJson<T>(TextWriter textWriter, IEnumerable<T> items)
    {
        var serializer = JsonSerializer.CreateDefault();

        foreach (var item in items)
        {
            // Formatting.None is the default; I set it here for clarity.
            using (var writer = new JsonTextWriter(textWriter) { Formatting = Formatting.None, CloseOutput = false })
            {
                serializer.Serialize(writer, item);
            }
            // http://specs.okfnlabs.org/ndjson/
            // Each JSON text MUST conform to the [RFC7159] standard and MUST be written to the stream followed by the newline character \n (0x0A). 
            // The newline character MAY be preceded by a carriage return \r (0x0D). The JSON texts MUST NOT contain newlines or carriage returns.
            textWriter.Write("\n");
        }
    }

    // Adapted from the answer to 
    // https://stackoverflow.com/questions/29729063/line-delimited-json-serializing-and-de-serializing
    // by Yuval Itzchakov https://stackoverflow.com/users/1870803/yuval-itzchakov
    public static IEnumerable<TBase> FromNewlineDelimitedJson<TBase, THeader, TRow>(TextReader reader)
        where THeader : TBase
        where TRow : TBase
    {
        bool first = true;

        using (var jsonReader = new JsonTextReader(reader) { CloseInput = false, SupportMultipleContent = true })
        {
            var serializer = JsonSerializer.CreateDefault();

            while (jsonReader.Read())
            {
                if (jsonReader.TokenType == JsonToken.Comment)
                    continue;
                if (first)
                {
                    yield return serializer.Deserialize<THeader>(jsonReader);
                    first = false;
                }
                else
                {
                    yield return serializer.Deserialize<TRow>(jsonReader);
                }
            }
        }
    }
}

Later, you can process the newline delimited JSON file as follows:

using (var stream = File.OpenRead(filePath))
using (var textReader = new StreamReader(stream))
{
    foreach (var obj in JsonExtensions.FromNewlineDelimitedJson<object, RecordedDataHeader, PressureMap>(textReader))
    {
        if (obj is RecordedDataHeader)
        {
            var header = (RecordedDataHeader)obj;
            // Process the header
            Console.WriteLine(JsonConvert.SerializeObject(header));
        }
        else
        {
            var row = (PressureMap)obj;
            // Process the row.
            Console.WriteLine(JsonConvert.SerializeObject(row));
        }
    }
}

Notes:

  • This approach looks simpler because the samples are appended incrementally to the end of the file, rather than inserted inside some overall JSON container.

  • With this approach both serialization and downstream processing can be done with bounded memory use (a small illustration follows these notes).

  • The sample file does not remain open for the duration of sampling, so it is less likely to be truncated.

  • Downstream applications may not have built-in tools for processing newline delimited JSON.

  • This strategy may integrate more simply with your current threading code.
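
To illustrate that bounded-memory downstream processing (the aggregation below is a hypothetical example, not something required by the original question), you can walk the newline delimited JSON file sample by sample and accumulate summary statistics without ever holding the full List<PressureMap> in memory:

using (var stream = File.OpenRead(filePath))
using (var textReader = new StreamReader(stream))
{
    int sampleCount = 0;
    double total = 0;

    foreach (var obj in JsonExtensions.FromNewlineDelimitedJson<object, RecordedDataHeader, PressureMap>(textReader))
    {
        // Only the current object is alive; earlier samples are eligible for garbage collection.
        if (obj is PressureMap)
        {
            var map = (PressureMap)obj;
            sampleCount++;
            foreach (double value in map.PressureMatrix)  // foreach over a double[,] visits every cell
                total += value;
        }
    }

    Console.WriteLine("Read {0} samples; sum of all pressure readings = {1}", sampleCount, total);
}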

Demo fiddle #2 here.
