How to read a large JSON file?

    {
    "Count": 361888,
    "Items":
    [
    {   "S3Url": {"S": "Grouper/1904/1/private/drafts/D1_2/siepon_D1_2/siepon_C11_D1_2_diff.pdf" },
        "JSONFile": {"S": "Grouper/1904/1/private/drafts/D1_2/siepon_D1_2/siepon_C11_D1_2_diff.pdf.json" },
        "ErrTs": {"N": "1488010286704"}
    },
    {   "S3Url": {"S": "Mentor/47200043/Public/07/11-07-1984-05-000s-june-2007-mesh-ad-hoc- agenda.ppt.pdf" },
        "JSONFile": {"S": "Mentor/47200043/Public/07/11-07-1984-05-000s-june-2007- mesh-ad-hoc-agenda.ppt.pdf.json"},
        "ErrTs": {"N": "1490497271699"}
    }
    ],
    "ScannedCount": 23
    }

This is the input JSON file format. The file is too large, so I cannot use:

    JSONParser parser = new JSONParser();
    Object obj = parser.parse(new FileReader(JSON_FILE_PATH));

The error is:

    java.lang.OutOfMemoryError: Java heap space

Increasing the maximum heap size with the JVM option "-Xmx512M" does not work.
I tried this code:

     jsonParser.parse(new FileReader(JSON_FILE_PATH), new ContentHandler() {
        private String key;
        private Object value;

        // A bunch of "default" methods
        @Override public void startJSON() { }
        @Override public void endJSON() { }
        @Override public boolean startObject() { return true; }
        @Override public boolean endObject() { return true; }
        @Override public boolean startArray() { return true; }
        @Override public boolean endArray() { return true; }

        @Override
        public boolean startObjectEntry(final String key) {
            this.key = key;
            return true;
        }

        @Override
        public boolean endObjectEntry() {
            System.out.println(key + " => " + value);
            return true;
        }

        @Override
        public boolean primitive(final Object value) {
            this.value = value;
            return true;
        }
    });

Expected output (in Excel): key: S3Url, value: Grouper/1904/1/private/drafts/D1_2/siepon_D1_2/siepon_C11_D1_2_diff.pdf

Actual output (in Excel): key: S, value: Grouper/1904/1/private/drafts/D1_2/siepon_D1_2/siepon_C11_D1_2_diff.pdf

The actual output shows the key as S instead of S3Url, and the same line is printed twice. Please help me read the large JSON file in the required format.

This error can be caused by a memory leak.

How to solve java.lang.OutOfMemoryError: Java heap space

1) An easy way to address OutOfMemoryError in Java is to increase the maximum heap size with a JVM option such as "-Xmx512M"; when the heap is simply too small, this resolves the error immediately. This is my preferred solution when I get an OutOfMemoryError in Eclipse, Maven, or Ant while building a project, because a large project can easily run out of memory. Here is an example of increasing the JVM's maximum heap size. If you set the heap size for your Java application, it is also better to keep the ratio of -Xmx to -Xms at either 1:1 or 1:1.5.

    export JVM_ARGS="-Xms1024m -Xmx1024m"

2) The second way to resolve OutOfMemoryError in Java is harder. It applies when you don't have much memory to spare, or when you still get java.lang.OutOfMemoryError even after increasing the maximum heap size. In that case, you probably want to profile your application and look for a memory leak. You can use Eclipse Memory Analyzer to examine a heap dump, or you can use a profiler such as NetBeans or JProbe. This is a tough solution and requires some time to analyze and find memory leaks.
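Before profiling, it can help to confirm that a setting such as -Xmx actually reached the JVM that runs your code. The following is a minimal sketch using only the standard Runtime API; the class name HeapInfo and its method names are made up for this illustration.

```java
// Hypothetical helper (not from the original answer): reports heap limits
// so you can verify that -Xms/-Xmx were picked up by the running JVM.
final class HeapInfo {
    private static final long MIB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use, in MiB (reflects -Xmx). */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** Heap currently in use (allocated minus free), in MiB. */
    public static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MIB;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB):  " + maxHeapMiB());
        System.out.println("used heap (MiB): " + usedHeapMiB());
    }
}
```

If the reported maximum does not match the value you passed, the option is probably being set for a different process (for example the build tool rather than the forked application).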

Tools to investigate and fix OutOfMemoryError in Java

1) visualgc

2) jmap

3) jhat

4) Eclipse Memory Analyzer

5) Books to learn profiling

You're getting this error because the JVM cannot allocate enough memory to store the resulting JSONObject instance, which is a subclass of HashMap (this is clear from the stack trace). Although you say you have a 400MB JSON document, it may be small compared to other JSON documents, and increasing the heap size won't help you much. You can parse the given JSON document at almost zero cost in JVM resources by using streaming, but you have to write more sophisticated code. com.googlecode.json-simple:json-simple supports streamed reading through ContentHandler implementations.

Example:

    {
        "foo": 1,
        "bar": 2
    }

    try ( final Reader reader = getPackageResourceReader(Q43446452.class, "document.json") ) {
        final JSONParser jsonParser = new JSONParser();
        jsonParser.parse(reader, new ContentHandler() {
            private String key;
            private Object value;

            // A bunch of "default" methods
            @Override public void startJSON() { }
            @Override public void endJSON() { }
            @Override public boolean startObject() { return true; }
            @Override public boolean endObject() { return true; }
            @Override public boolean startArray() { return true; }
            @Override public boolean endArray() { return true; }

            @Override
            public boolean startObjectEntry(final String key) {
                this.key = key;
                return true;
            }

            @Override
            public boolean endObjectEntry() {
                System.out.println(key + " => " + value);
                return true;
            }

            @Override
            public boolean primitive(final Object value) {
                this.value = value;
                return true;
            }
        });
    }

Sure, it's an extremely primitive example, and the cost falls on you rather than on the JVM, but you can parse even infinite JSON streams with this approach.

Output:

    foo => 1
    bar => 2
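With the nested format from the question, a handler that remembers only the most recent key prints "S" twice: once when the inner entry "S" ends and again when the outer entry "S3Url" ends, because the single key field was already overwritten. One possible fix is to keep a stack of keys and report the parent key when the innermost key is a type tag. The sketch below is library-independent so that it runs on its own: the class KeyPathTracker is hypothetical, its three methods only mirror the ContentHandler callbacks of the same names, and it assumes that "S" and "N" are the only type tags, as in the sample data. The value "a.pdf" stands in for the long paths.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a real ContentHandler could delegate its
// startObjectEntry/endObjectEntry/primitive callbacks to these methods.
final class KeyPathTracker {
    private final List<String> keyStack = new ArrayList<>();
    private final List<String> rows = new ArrayList<>();

    public boolean startObjectEntry(String key) {
        keyStack.add(key);                       // entering "key": ...
        return true;
    }

    public boolean endObjectEntry() {
        keyStack.remove(keyStack.size() - 1);    // leaving the current entry
        return true;
    }

    public boolean primitive(Object value) {
        int top = keyStack.size() - 1;
        if (top < 0) {
            return true;                         // value outside any object entry
        }
        String key = keyStack.get(top);
        // Assumption: "S" and "N" are type tags wrapping the real value,
        // so the meaningful name is the parent key (e.g. "S3Url").
        boolean isTypeTag = key.equals("S") || key.equals("N");
        if (isTypeTag && top >= 1) {
            key = keyStack.get(top - 1);
        }
        // For a huge file, write the row out here instead of collecting it.
        rows.add(key + " => " + value);
        return true;
    }

    public List<String> rows() {
        return rows;
    }

    public static void main(String[] args) {
        // Simulated events for:
        // {"Items":[{"S3Url":{"S":"a.pdf"},"ErrTs":{"N":"1488010286704"}}]}
        KeyPathTracker t = new KeyPathTracker();
        t.startObjectEntry("Items");
        t.startObjectEntry("S3Url");
        t.startObjectEntry("S");
        t.primitive("a.pdf");
        t.endObjectEntry();
        t.endObjectEntry();
        t.startObjectEntry("ErrTs");
        t.startObjectEntry("N");
        t.primitive("1488010286704");
        t.endObjectEntry();
        t.endObjectEntry();
        t.endObjectEntry();
        t.rows().forEach(System.out::println);
        // prints:
        // S3Url => a.pdf
        // ErrTs => 1488010286704
    }
}
```

Each value is reported once, at the moment it is seen, which also removes the repeated line. Collecting rows in a list is only for demonstration; streaming them straight to the output keeps memory use flat.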
