
Out of memory exception while loading large json file from disk

I have a 1.2 GB json file which, when deserialized, ought to give me a list of 15 million objects.

The machine on which I'm trying to deserialize it is a Windows Server 2012 (64-bit) machine with 16 cores and 32 GB of RAM.

The application has been built targeting x64.

Despite this, when I try to read the json doc and convert it to a list of objects, I get an out of memory exception. When I look at Task Manager, I find that only 5 GB of memory has been used.

The code I tried is below:

a.

    string plays_json = File.ReadAllText("D:\\Hun\\enplays.json");
    plays = JsonConvert.DeserializeObject<List<playdata>>(plays_json);

b.

    string plays_json = "";
    using (var reader = new StreamReader("D:\\Hun\\enplays.json"))
    {
        plays_json = reader.ReadToEnd();
        plays = JsonConvert.DeserializeObject<List<playdata>>(plays_json);
    }

c.

    using (StreamReader sr = File.OpenText("D:\\Hun\\enplays.json"))
    {
        StringBuilder sb = new StringBuilder();
        sb.Append(sr.ReadToEnd());
        string plays_json = sb.ToString();
        plays = JsonConvert.DeserializeObject<List<playdata>>(plays_json);
    }

All help is sincerely appreciated.

The problem is that you are reading your entire huge file into memory and then trying to deserialize it all at once into a huge list. You should be using a StreamReader to process the file incrementally. Example (b) in your question doesn't cut it, even though you use a StreamReader there, because you still read the entire file via ReadToEnd(). You should be doing something like this instead:

using (StreamReader sr = new StreamReader("D:\\Hun\\enplays.json"))
using (JsonTextReader reader = new JsonTextReader(sr))
{
    var serializer = new JsonSerializer();

    while (reader.Read())
    {
        if (reader.TokenType == JsonToken.StartObject)
        {
            // Deserialize each object from the stream individually and process it
            var playdata = serializer.Deserialize<playdata>(reader);

            ProcessPlayData(playdata);
        }
    }
}

The ProcessPlayData method should process a single playdata object and then ideally write the result to a file or a database rather than to an in-memory list (otherwise you may find yourself right back in the same situation). If you must store the result of processing each item in an in-memory list, then consider using a linked list or a similar structure that does not try to allocate its memory in one contiguous block and does not need to reallocate and copy when it needs to expand.
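As a minimal sketch of that idea (the StreamWriter parameter and the use of JsonConvert.SerializeObject are illustrative assumptions, since the question doesn't show what processing playdata actually needs), each processed item can be flushed straight to disk:

    // requires: using System.IO; using Newtonsoft.Json;
    // Open the writer once, outside the read loop, and pass it in
    // (or hold it in a field) so each result goes straight to disk
    // instead of accumulating in a List.
    static void ProcessPlayData(playdata item, StreamWriter output)
    {
        // Illustrative placeholder: re-serialize the item as one JSON line.
        output.WriteLine(JsonConvert.SerializeObject(item));
    }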

In my opinion, your out of memory exception could be due to one of the reasons below.

The size of your plays object is over 2 GB, and by default the maximum size of a CLR object in .NET is 2 GB (even on x64). See here

Now, your object doesn't have to be 2 GB. Fragmentation in the Large Object Heap (LOH) can cause an object smaller than 2 GB to throw an out of memory exception as well. (Any object of 85,000 bytes or more resides in the Large Object Heap.)

Another case is when the OS can't allocate a contiguous block of virtual memory for your large object, but I don't think that applies here, since you've mentioned the machine has 32 GB of RAM.

I wouldn't just go and enable gcAllowVeryLargeObjects unless there aren't any other options. I have seen the memory consumption of one of my large data-handling APIs go from 3 GB up to 8 GB after turning that setting on (even though most of it was only reserved). I think this is because the setting allows your app to ask the OS for as much memory as it needs to hold a large object, which can be particularly problematic if you are hosting other apps on the same server. It's good to have an upper limit on how much memory a managed object can take.
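For reference, if you do end up needing it, the setting goes in the application's app.config or web.config (this is the documented .NET Framework 4.5+ element, and it only has an effect in 64-bit processes):

    <configuration>
      <runtime>
        <gcAllowVeryLargeObjects enabled="true" />
      </runtime>
    </configuration>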

Another thing to note is that by default the GC doesn't compact the LOH. This means the size of the working set will remain large unless a full garbage collection takes place. (From .NET 4.5.1 onwards, you can ask the GC to compact the LOH.) See here
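From .NET Framework 4.5.1 onwards, the one-shot compaction looks like this (GCSettings lives in the System.Runtime namespace):

    // requires: using System.Runtime;
    // Request that the next full, blocking collection also compacts the LOH.
    // The flag automatically resets to Default once that collection happens.
    GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
    GC.Collect();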

I would strongly recommend using a memory profiler like dotMemory to understand what's going on under the hood before making any decisions.

If you are targeting x64 and this is a web application, make sure IIS is set to use the 64-bit version as well. See here for local IIS Express and for IIS on a server.

If I were you, I would try to break this task into smaller batches, along the lines of the sketch below.
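A rough sketch of that, combined with the streaming reader from the other answer (the batch size of 10,000 and the ProcessBatch method are placeholders for whatever bulk IO step fits your case):

    // requires: using System.Collections.Generic; using System.IO;
    //           using Newtonsoft.Json;
    var batch = new List<playdata>(10000);

    using (var sr = new StreamReader("D:\\Hun\\enplays.json"))
    using (var reader = new JsonTextReader(sr))
    {
        var serializer = new JsonSerializer();
        while (reader.Read())
        {
            if (reader.TokenType == JsonToken.StartObject)
            {
                batch.Add(serializer.Deserialize<playdata>(reader));
                if (batch.Count == 10000)
                {
                    ProcessBatch(batch); // e.g. one bulk database insert
                    batch.Clear();       // at most one batch is in memory
                }
            }
        }
        if (batch.Count > 0)
            ProcessBatch(batch);         // flush the final partial batch
    }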

What is the purpose of loading the entire file in one go? Are you trying to do some IO operation with the loaded data, or a CPU-intensive task?

Here is a useful link on GC fundamentals.
