How to deserialise a big JSON file (~300 MB)

I want to parse a JSON file of about 300 MB. I am using the Jackson library with ObjectMapper. Is it normal to run into memory problems?

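My first attempt used a BufferedReader, and it crashed the application. After that I switched to this library. How long should parsing the file and saving it to an SQLite database take? Is a very long time to be expected?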

Jackson

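You can mix the Streaming API with a regular ObjectMapper. With the two together we can implement a nice Iterator class that reads one object at a time instead of loading the whole file. From the URL we can build a stream and pass it to our implementation. Example code could look like this: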

import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.math.BigDecimal;
import java.net.URL;
import java.util.Iterator;

public class JsonPathApp {

    public static void main(String[] args) throws Exception {
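        // Disables TLS hostname and certificate checks, just to make the demo work.
        // You should probably not do this! SSLUtilities is a helper class that is not shown here.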
        SSLUtilities.trustAllHostnames();
        SSLUtilities.trustAllHttpsCertificates();

        URL url = new URL("https://data.opendatasoft.com/explore/dataset/vehicules-commercialises@public/download/?format=json&timezone=Europe/Berlin");
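        // Decode explicitly as UTF-8 instead of relying on the platform default charset
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(url.openConnection().getInputStream(), java.nio.charset.StandardCharsets.UTF_8))) {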
            FieldsJsonIterator fieldsJsonIterator = new FieldsJsonIterator(reader);
            while (fieldsJsonIterator.hasNext()) {
                Fields fields = fieldsJsonIterator.next();
                System.out.println(fields);
                // Save object to DB
            }
        }
    }
}

class FieldsJsonIterator implements Iterator<Fields> {

    private final ObjectMapper mapper;
    private final JsonParser parser;

    public FieldsJsonIterator(Reader reader) throws IOException {
        mapper = new ObjectMapper();
        mapper.disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES);

        parser = mapper.getFactory().createParser(reader);
        skipStart();
    }

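    private void skipStart() throws IOException {
        // Advance to the first object. A null token means end of input,
        // so an empty document cannot make this loop forever.
        JsonToken token = parser.nextToken();
        while (token != null && token != JsonToken.START_OBJECT) {
            token = parser.nextToken();
        }
    }

    @Override
    public boolean hasNext() {
        try {
            // readValue() clears the current token, so advance to the next one.
            // nextToken() returns null only at end of input.
            if (parser.currentToken() == null && parser.nextToken() == null) {
                return false;
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }

        return parser.currentToken() == JsonToken.START_OBJECT;
    }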

    @Override
    public Fields next() {
        try {
            return mapper.readValue(parser, FieldsWrapper.class).fields;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    private static final class FieldsWrapper {
        public Fields fields;
    }
}

class Fields {

    private String cnit;

    @JsonProperty("puissance_maximale")
    private BigDecimal maximumPower;

    @JsonProperty("champ_v9")
    private String fieldV9;

    @JsonProperty("boite_de_vitesse")
    private String gearbox;

    // add other required properties

    // getters, setters, toString
}

The code above prints:

Fields{cnit='MMB76K3BQJ41', maximumPower=110.0, fieldV9='70/220*2006/96EURO4', gearbox='A 5'}
Fields{cnit='M10MCDVPF15Z219', maximumPower=95.0, fieldV9='"715/2007*566/2011EURO5', gearbox='A 7'}
Fields{cnit='M10MCDVP027V654', maximumPower=150.0, fieldV9='715/2007*692/2008EURO5', gearbox='A 7'}
Fields{cnit='M10MCDVPG137264', maximumPower=120.0, fieldV9='715/2007*692/2008EURO5', gearbox='M 6'}
Fields{cnit='MVV4912QN718', maximumPower=210.0, fieldV9='null', gearbox='A 6'}
Fields{cnit='MMB76K3B2K88', maximumPower=110.0, fieldV9='null', gearbox='A 5'}
Fields{cnit='M10MCDVP012N140', maximumPower=80.0, fieldV9='70/220*2006/96EURO4', gearbox='M 6'}
Fields{cnit='MJN5423PU123', maximumPower=88.0, fieldV9='null', gearbox='M 6'}
Fields{cnit='M10MCDVP376T303', maximumPower=120.0, fieldV9='"715/2007*692/2008EURO5', gearbox='M 6'}
Fields{cnit='MMB53H3B5Z93', maximumPower=80.0, fieldV9='70/220*2006/96EURO4', gearbox='M 6'}
Fields{cnit='MPE1403E4834', maximumPower=81.0, fieldV9='null', gearbox='M 5'}
Fields{cnit='M10MCDVP018J905', maximumPower=110.0, fieldV9='70/220*2006/96EURO4', gearbox='M 6'}
Fields{cnit='M10MCDVPG112904', maximumPower=100.0, fieldV9='"715/2007*692/2008EURO5', gearbox='M 6'}
Fields{cnit='M10MCDVP015R723', maximumPower=110.0, fieldV9='70/220*2006/96EURO4', gearbox='A 5'}
...

Gson

We can do the same thing with Gson. An example implementation could look like this:

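import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.annotations.SerializedName;
import com.google.gson.stream.JsonReader;

import java.io.IOException;
import java.io.Reader;
import java.math.BigDecimal;
import java.util.Iterator;

class FieldsJsonIterator implements Iterator<Fields> {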

    private final Gson mapper;
    private final JsonReader parser;

    public FieldsJsonIterator(Reader reader) throws IOException {
        mapper = new GsonBuilder().create();

        parser = mapper.newJsonReader(reader);
        skipStart();
    }

    private void skipStart() throws IOException {
        parser.beginArray();
    }

    @Override
    public boolean hasNext() {
        try {
            return parser.hasNext();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public Fields next() {
        return ((FieldsWrapper) mapper.fromJson(parser, FieldsWrapper.class)).fields;
    }

    private static final class FieldsWrapper {
        public Fields fields;
    }
}

class Fields {

    private String cnit;

    @SerializedName("puissance_maximale")
    private BigDecimal maximumPower;

    @SerializedName("champ_v9")
    private String fieldV9;

    @SerializedName("boite_de_vitesse")
    private String gearbox;

    // getters, setters, toString
}

Usage and output should be the same as with Jackson.
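Since both implementations expose a plain Iterator<Fields>, the calling code can also treat the records as a lazy java.util.stream.Stream, which is handy for trying things out on the first few records of a very large file. The following is only a sketch: IteratorStreams and lazyStream are hypothetical names that are not part of Jackson or Gson, and the demo uses an in-memory list of strings in place of a real FieldsJsonIterator.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper for illustration; not part of Jackson or Gson.
final class IteratorStreams {

    private IteratorStreams() {
    }

    // Wraps an iterator in a sequential stream that pulls elements on demand.
    static <T> Stream<T> lazyStream(Iterator<T> iterator) {
        Spliterator<T> spliterator =
                Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED);
        return StreamSupport.stream(spliterator, false); // false = not parallel
    }

    public static void main(String[] args) {
        // Stand-in for a FieldsJsonIterator over a large file (assumption for the demo).
        Iterator<String> records = Arrays.asList("a", "b", "c", "d", "e").iterator();

        // limit(3) short-circuits, so the rest of the source is never read.
        List<String> firstThree = lazyStream(records).limit(3).collect(Collectors.toList());
        System.out.println(firstThree); // prints [a, b, c]
    }
}
```

With the real iterator the call would look like IteratorStreams.lazyStream(fieldsJsonIterator).limit(10).forEach(System.out::println), still inside the try-with-resources block so the reader gets closed.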

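Thanks for your code. It works well and it is fast. I am using the Jackson library.

I noticed this class in your code and I am curious: where did you find this library for Android?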

    //Just to make it work. Probably you should not do that!
    SSLUtilities.trustAllHostnames();
    SSLUtilities.trustAllHttpsCertificates();

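Also, I would like to know whether it is normal that the JSON parser does not return the objects in the order in which they appear in the JSON file (here the object with "designation_commerciale"="LaFerrari" year="2014" comes out as the first element)?

Thanks for your help.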

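You can read the JSON file token by token. Of course, you need to know the JSON structure and define a matching object. To avoid running out of memory, store the data inside the while loop rather than collecting everything first. I tried this with a file of about 170 MB and I hope it helps. Can you share your JSON structure?

JSON structure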

[{
  "id":1,
  "content":"Jan"
},
{
  "id":2,
  "content":"Feb"
}]

Object based on the JSON

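import lombok.AllArgsConstructor;
import lombok.NoArgsConstructor;
import lombok.Setter;
import lombok.experimental.FieldNameConstants;

@NoArgsConstructor @AllArgsConstructor @FieldNameConstants @Setter
public class MyObject {
    private int id;
    private String content;
}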

Processing the file

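import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public List<MyObject> process(String configFile) throws IOException {
    File jsonFile = new File(getClass().getClassLoader().getResource(configFile).getFile());
    JsonFactory jsonFactory = new JsonFactory();
    List<MyObject> data = new ArrayList<>();

    // try-with-resources closes the parser and the underlying file.
    // An IOException now propagates to the caller instead of being swallowed.
    try (JsonParser jsonParser = jsonFactory.createParser(jsonFile)) {
        JsonToken jsonToken = jsonParser.nextToken();
        MyObject object = new MyObject();

        // Stop at the end of the array, or at end of input (null) if the file is truncated
        while (jsonToken != null && jsonToken != JsonToken.END_ARRAY) {
            String fieldName = jsonParser.getCurrentName();
            if (MyObject.Fields.id.equals(fieldName)) {
                jsonToken = jsonParser.nextToken(); // move from the field name to its value
                object.setId(jsonParser.getIntValue());
            } else if (MyObject.Fields.content.equals(fieldName)) {
                jsonToken = jsonParser.nextToken(); // move from the field name to its value
                object.setContent(jsonParser.getText());
            }

            if (jsonToken == JsonToken.END_OBJECT) {
                data.add(object);
                object = new MyObject();
                //TODO: Should store and clear data once the list gets big, to avoid running out of memory
            }
            jsonToken = jsonParser.nextToken();
        }
    }
    return data;
}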
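The TODO above (store and clear the data once the list gets big) could be handled by handing records to the storage layer in fixed-size batches instead of returning one huge list. Here is a minimal sketch, assuming the records arrive through an Iterator as in the Jackson and Gson answers. BatchWriter and drainInBatches are hypothetical names, and the sink only prints each batch where a real application would write it to the database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper for illustration; not part of Jackson or Gson.
final class BatchWriter {

    private BatchWriter() {
    }

    // Drains the iterator, passing at most batchSize items to the sink at a time.
    // Returns the total number of items processed.
    static <T> long drainInBatches(Iterator<T> source, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        long total = 0;
        while (source.hasNext()) {
            batch.add(source.next());
            total++;
            if (batch.size() == batchSize) {
                sink.accept(batch);
                // Start a fresh list so the sink may keep a reference to the old one
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch); // flush the final, partially filled batch
        }
        return total;
    }

    public static void main(String[] args) {
        // Stand-in data; in practice the source would be the streaming JSON iterator.
        Iterator<Integer> ids = Arrays.asList(1, 2, 3, 4, 5).iterator();
        long total = drainInBatches(ids, 2, batch -> System.out.println("saving " + batch));
        System.out.println("total = " + total);
    }
}
```

Only one batch is held in memory at a time, so memory use depends on the batch size rather than on the file size. The batch size of 2 is just for the demo; a few hundred records per batch, each written in a single database transaction, is a common starting point, but the right value has to be measured.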
