
Out of memory exception while loading a large JSON file from disk

I have a 1.2 GB JSON file which, when deserialized, ought to give me a list of 15 million objects.

The machine on which I'm trying to deserialize it is a Windows Server 2012 (64-bit) box with 16 cores and 32 GB of RAM.

The application has been built with target of x64.

In spite of this, when I try to read the JSON document and convert it to a list of objects, I get an OutOfMemoryException. When I look at Task Manager, I find that only 5 GB of memory has been used.

The code I tried is below.

a.

string plays_json = File.ReadAllText("D:\\Hun\\enplays.json");
plays = JsonConvert.DeserializeObject<List<playdata>>(plays_json);

b.

string plays_json = "";
using (var reader = new StreamReader("D:\\Hun\\enplays.json"))
{
    plays_json = reader.ReadToEnd();
    plays = JsonConvert.DeserializeObject<List<playdata>>(plays_json);
}

c.

using (StreamReader sr = File.OpenText("D:\\Hun\\enplays.json"))
{
    StringBuilder sb = new StringBuilder();
    sb.Append(sr.ReadToEnd());
    string plays_json = sb.ToString();
    plays = JsonConvert.DeserializeObject<List<playdata>>(plays_json);
}

All help is sincerely appreciated.

The problem is that you are reading your entire huge file into memory and then trying to deserialize it all at once into one huge list. Example (b) doesn't cut it, even though you use a StreamReader there, because you still pull the whole file in via ReadToEnd(). Instead, you should wrap the StreamReader in a JsonTextReader and deserialize the objects one at a time as they come off the stream, something like this:

using Newtonsoft.Json;
using System.IO;

using (StreamReader sr = new StreamReader("D:\\Hun\\enplays.json"))
using (JsonTextReader reader = new JsonTextReader(sr))
{
    var serializer = new JsonSerializer();

    while (reader.Read())
    {
        if (reader.TokenType == JsonToken.StartObject)
        {
            // Deserialize each object from the stream individually and process it
            var playdata = serializer.Deserialize<playdata>(reader);

            ProcessPlayData(playdata);
        }
    }
}

The ProcessPlayData method should process a single playdata object and then ideally write the result to a file or a database rather than to an in-memory list (otherwise you may find yourself back in the same situation). If you must store the results in memory, consider a linked list or a similar structure that does not allocate its storage as one contiguous block and does not need to reallocate and copy itself when it grows.
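As a rough illustration of both options (the output path, and reducing the "processing" to a single WriteLine, are placeholders of mine, not part of the question):

using System.Collections.Generic;
using System.IO;

// Option 1: stream each result straight to disk so nothing accumulates
// in memory. The path below is a placeholder.
using var output = new StreamWriter("D:\\Hun\\results.txt");

void ProcessPlayData(playdata item)
{
    // Each record is handled and released immediately.
    output.WriteLine(item.ToString()); // replace with your real processing
}

// Option 2: if the results genuinely must stay in memory,
// LinkedList<playdata> allocates one small node per entry instead of one
// giant contiguous backing array, so it stays off the Large Object Heap
// and never pays List<T>'s reallocate-and-copy cost when it grows.
var results = new LinkedList<playdata>();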

In my opinion, your out-of-memory exception can be due to one of the reasons below.

Your plays object is over 2 GB in size, and by default the maximum size of a single CLR object in .NET is 2 GB (even on x64). See here.

Your object doesn't have to reach 2 GB, though. Fragmentation in the Large Object Heap (LOH) can cause an allocation well under 2 GB to throw an out-of-memory exception as well. (Any object over 85 KB, i.e. 85,000 bytes, resides in the Large Object Heap.)

Another case is when the OS can't allocate a contiguous block of virtual memory for your large object, but I don't think that's likely here since you've mentioned you have 32 GB of RAM.

I wouldn't enable gcAllowVeryLargeObjects unless there are no other options. I have seen the memory consumption of one of my large data-handling APIs go up from 3 GB to 8 GB after turning that setting on (even though most of it was only reserved). I think this is because you are allowing your app to ask the OS for as much memory as it needs to hold a large object, which can be particularly problematic if you are hosting other apps on the same server. It's good to have an upper limit on how much memory a single managed object can take.
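For reference, this is the App.config switch in question; note it only lifts the 2 GB cap for arrays, and only on 64-bit platforms:

<configuration>
  <runtime>
    <!-- Allows arrays larger than 2 GB on 64-bit. Element-count limits
         still apply, and memory use can balloon; last resort only. -->
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>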

Another thing to note is that, by default, the GC doesn't compact the LOH, so the working set will stay large unless a full garbage collection takes place. (From .NET 4.5.1 onwards you can ask the GC to compact the LOH.) See here.
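For completeness, the opt-in looks like this; it applies to the next full blocking collection and then resets itself:

using System;
using System.Runtime;

// .NET 4.5.1+: compact the Large Object Heap on the next full collection.
GCSettings.LargeObjectHeapCompactionMode =
    GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect();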

I would strongly recommend using a memory profiler like dotMemory to first understand what's going on under the hood before making any decisions.

If you are targeting x64 and this is a web application, make sure IIS is set to use the 64-bit version as well. See here for local IIS Express and IIS on a server.
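On server IIS, the relevant switch is the application pool's "Enable 32-Bit Applications" setting, which must be False for a 64-bit process; one way to set it (the pool name below is a placeholder) is via appcmd:

%windir%\system32\inetsrv\appcmd set apppool "YourAppPool" /enable32BitAppOnWin64:false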

If I were you, I would try to break this task into smaller batches.

What is the purpose of loading this entire file in one go? Are you trying to do some I/O operation with the loaded data, or some CPU-intensive task?
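If batching fits your scenario, here is a minimal sketch of what I mean (SaveBatch and the batch size are placeholders of mine):

using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Read the array with a JsonTextReader, but flush every 10,000 records
// so only one batch is ever resident in memory at a time.
const int BatchSize = 10000;
var batch = new List<playdata>(BatchSize);

using (var sr = new StreamReader("D:\\Hun\\enplays.json"))
using (var reader = new JsonTextReader(sr))
{
    var serializer = new JsonSerializer();
    while (reader.Read())
    {
        if (reader.TokenType != JsonToken.StartObject) continue;
        batch.Add(serializer.Deserialize<playdata>(reader));
        if (batch.Count >= BatchSize)
        {
            SaveBatch(batch); // placeholder: write to your DB or file
            batch.Clear();
        }
    }
    if (batch.Count > 0) SaveBatch(batch); // flush the final partial batch
}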

Here is a useful link on GC fundamentals.
