
Read a very large, dynamically nested JSON file in Java

I have a huge JSON file (over 500 MB) with a dynamic, nested structure. It was written to disk with json.dump in Python. How can I read this file in Java using a buffered approach?

If I read the whole file into memory in one go, Java throws a heap space error. My idea is to read one JSON record, parse it, then move on to the next record, parse it, and so on. But how can I tell where one JSON record ends? I can't find a separator between the records.

Any suggestions? Please ask if something is unclear. Thanks.

Assuming that you can't simply increase the heap size with -Xmx, you can switch your JSON reading logic to a SAX-style (event-based) JSON parser, e.g. RapidJSON (a C++ library) or, for Java, the Jackson Streaming API. Instead of holding the entire JSON body in memory, these libraries emit an event for each JSON construct they encounter:

{
  "hello": "world",
  "t": true
  ...
}

will produce the following events when using RapidJSON:

StartObject()
Key("hello", 5, true)
String("world", 5, true)
Key("t", 1, true)
Bool(true)
...
EndObject()
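
As for where one record ends: in an event stream like the one above, a record is complete when the EndObject that matches its StartObject arrives, i.e. when the nesting depth returns to zero. Below is a minimal, dependency-free sketch of that idea in Java. It is not the Jackson API. It assumes the records are JSON objects that sit either in one top-level array or directly after one another, and that any single record fits in memory. The class name JsonRecordSplitter and the method forEachRecord are made up for this illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.function.Consumer;

// Hypothetical helper, not part of any library.
public class JsonRecordSplitter {

    /**
     * Reads JSON text from 'in' and hands each outermost {...} object to
     * 'handler' as a string. Only one record is buffered at a time.
     * Assumption: records are objects inside a top-level array, or
     * objects written one after another.
     */
    public static void forEachRecord(Reader in, Consumer<String> handler) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        StringBuilder record = new StringBuilder();
        int depth = 0;            // number of currently open '{'
        boolean inString = false; // true between two unescaped double quotes
        boolean escaped = false;  // true right after a backslash inside a string

        int c;
        while ((c = reader.read()) != -1) {
            char ch = (char) c;
            if (depth > 0) {
                record.append(ch); // we are inside a record, keep the character
            }
            if (inString) {
                // Braces inside string values must not change the depth.
                if (escaped) {
                    escaped = false;
                } else if (ch == '\\') {
                    escaped = true;
                } else if (ch == '"') {
                    inString = false;
                }
                continue;
            }
            if (ch == '"') {
                inString = true;
            } else if (ch == '{') {
                if (depth == 0) {
                    record.append(ch); // first character of a new record
                }
                depth++;
            } else if (ch == '}' && depth > 0) {
                depth--;
                if (depth == 0) {
                    // Matching close of the outermost '{': the record is complete.
                    handler.accept(record.toString());
                    record.setLength(0);
                }
            }
        }
    }
}
```

You could call it as JsonRecordSplitter.forEachRecord(Files.newBufferedReader(Paths.get("data.json")), record -> ...), where data.json is a placeholder file name, and parse each record string with whatever JSON library you already use. If the top level of your file is one big object rather than an array of objects, this sketch would treat the whole file as a single record, and a real streaming parser is the better choice.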
