
Read a huge JSON array file of objects

I have a big file, about 40 GB in size. When I try to convert this JSON file (an array of objects) to a list of Java objects, it crashes. I've tried every maximum heap size (-Xmx), but without result!

public Set<Interlocutor> readJsonInterlocutorsToPersist() {
    String userHome = System.getProperty(USER_HOME);
    log.debug("Read file interlocutors " + userHome);
    try {
        ObjectMapper mapper = new ObjectMapper();
        // Bind the whole JSON file to a set of Java objects in one call
        Set<Interlocutor> interlocutorDeEntities = mapper.readValue(
                new File(userHome + INTERLOCUTORS_TO_PERSIST),
                new TypeReference<Set<Interlocutor>>() {
                });
        return interlocutorDeEntities;
    } catch (Exception e) {
        // Pass the exception itself so the stack trace is logged
        log.error("Exception while reading InterlocutorsToPersist file.", e);
        return null;
    }
}

Is there a way to read this file using a BufferedReader and then to push object by object?

Edit:

I found the solution, based on @Viacheslav's answer:

public Set<Interlocutor> readJsonInterlocutorsToPersist() throws IOException {
    String userHome = System.getProperty(USER_HOME);
    log.debug("readJsonInterlocutorsToPersist file");
    JsonReader reader = new JsonReader(new InputStreamReader(
            new FileInputStream(userHome + INTERLOCUTORS_TO_PERSIST), "UTF-8"));
    Set<Interlocutor> interlocutorDeEntities = new HashSet<Interlocutor>();
    reader.beginArray();
    Gson gson = new GsonBuilder()
            .registerTypeAdapter(Date.class, UnixEpochDateTypeAdapter.getUnixEpochDateTypeAdapter())
            .create();
    int i = 0;
    // Stream the array: deserialize one object at a time
    // instead of loading the whole file into memory at once
    while (reader.hasNext()) {
        Interlocutor message = gson.fromJson(reader, Interlocutor.class);
        log.debug((++i) + " add new interlocutor");
        interlocutorDeEntities.add(message);
    }
    reader.endArray();
    reader.close();
    return interlocutorDeEntities;
}

Thanks a lot!

You should definitely have a look at the Jackson Streaming API (https://www.baeldung.com/jackson-streaming-api). I used it myself for JSON files several GB in size. The great thing is that you can divide your JSON into several smaller JSON objects and then parse them with mapper.readTree(parser). That way you can combine the convenience of normal Jackson with the speed and scalability of the Streaming API.

Related to your problem:

I understand that you have one really large array (which is the reason for the file size) made up of much smaller, more manageable objects:

e.g.:

[ // 40 GB
{}, // only 400 MB
{}
]

What you can do now is parse the file with Jackson's Streaming API and walk through the array. Each individual object can then be parsed as a "regular" Jackson object and processed easily, as sketched below.
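Here is a minimal sketch of that combination. The Interlocutor class is the one from the question; everything else (class and method names) is illustrative:

import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;
import java.io.IOException;

public class StreamingArrayReader {

    public void readLargeArray(File jsonFile) throws IOException {
        ObjectMapper mapper = new ObjectMapper();
        // Low-level streaming parser: it never loads the whole 40 GB file
        try (JsonParser parser = mapper.getFactory().createParser(jsonFile)) {
            if (parser.nextToken() != JsonToken.START_ARRAY) {
                throw new IllegalStateException("Expected the file to start with a JSON array");
            }
            // Advance to each element in turn; the loop ends at the closing ']'
            while (parser.nextToken() == JsonToken.START_OBJECT) {
                // Parse only the current (small) object with regular Jackson
                JsonNode node = mapper.readTree(parser);
                Interlocutor interlocutor = mapper.treeToValue(node, Interlocutor.class);
                // ... process one interlocutor here, then let it be garbage collected
            }
        }
    }
}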

You may also have a look at "Use Jackson To Stream Parse an Array of Json Objects", which matches your problem pretty well.

Is there a way to read this file using a BufferedReader and then to push object by object?

Of course not. Even if you could open this file, how would you store 40 GB worth of Java objects in memory? You most likely don't have that much memory in your computer (and technically, using ObjectMapper you would need about twice that amount of working memory: 40 GB to hold the JSON plus roughly as much again for the resulting Java objects, i.e. around 80 GB).

I think you should use one of the approaches from that question, but store the information in a database or in files instead of in memory. For example, if the JSON contains millions of rows, you should parse and save each row to the database without keeping them all in memory. You can then fetch this data from the database step by step (for example, no more than 1 GB at a time).
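For illustration, a minimal sketch that builds on the Gson streaming approach from the accepted solution, but flushes objects to storage in batches instead of collecting them all in a Set. saveToDatabase is a hypothetical persistence method, and the batch size of 1000 is just an assumption to tune:

import com.google.gson.Gson;
import com.google.gson.stream.JsonReader;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public void persistInterlocutorsInBatches(String path) throws IOException {
    Gson gson = new Gson();
    int batchSize = 1000; // assumption: tune this to the available memory
    List<Interlocutor> batch = new ArrayList<>(batchSize);
    try (JsonReader reader = new JsonReader(new InputStreamReader(
            new FileInputStream(path), StandardCharsets.UTF_8))) {
        reader.beginArray();
        while (reader.hasNext()) {
            // Deserialize exactly one element of the array
            batch.add(gson.fromJson(reader, Interlocutor.class));
            if (batch.size() == batchSize) {
                saveToDatabase(batch); // hypothetical persistence method
                batch.clear();         // release the objects before reading more
            }
        }
        reader.endArray();
        if (!batch.isEmpty()) {
            saveToDatabase(batch); // persist the final partial batch
        }
    }
}

This way the heap only ever holds one batch of objects at a time, no matter how large the file is.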
