How to deserialise a big JSON file (~300 MB)
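I want to parse a JSON file (about 300 MB in size). I'm using the Jackson library and ObjectMapper. Is it normal that I run into memory problems?
At first I used a BufferedReader, and it crashed the application. Then I switched to this library. How long does parsing and saving to an SQLite database take? Is it very slow?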
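You can mix the Streaming API with a regular ObjectMapper. The streaming JsonParser reads the input one token at a time, and ObjectMapper binds only the object the parser is currently positioned on, so just one record needs to be in memory at a time. With these two pieces we can implement a nice iterator class. From a URL we can open a stream and pass it to our implementation. Sample code is shown below: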
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.math.BigDecimal;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class JsonPathApp {

    public static void main(String[] args) throws Exception {
        // Just to make it work. Probably you should not do that!
        // SSLUtilities is not part of Jackson or the JDK: it is a separate helper class (not shown)
        // that disables certificate and hostname verification.
        SSLUtilities.trustAllHostnames();
        SSLUtilities.trustAllHttpsCertificates();

        URL url = new URL("https://data.opendatasoft.com/explore/dataset/vehicules-commercialises@public/download/?format=json&timezone=Europe/Berlin");
        // Decode the response as UTF-8 rather than the platform default charset
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openConnection().getInputStream(), StandardCharsets.UTF_8))) {
            FieldsJsonIterator fieldsJsonIterator = new FieldsJsonIterator(reader);
            while (fieldsJsonIterator.hasNext()) {
                Fields fields = fieldsJsonIterator.next();
                System.out.println(fields);
                // Save object to DB
            }
        }
    }
}

// Streams the top-level JSON array and binds one element at a time
class FieldsJsonIterator implements Iterator<Fields> {

    private final ObjectMapper mapper;
    private final JsonParser parser;

    public FieldsJsonIterator(Reader reader) throws IOException {
        mapper = new ObjectMapper();
        // Ignore JSON properties that have no matching Java field
        mapper.disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES);
        parser = mapper.getFactory().createParser(reader);
        skipStart();
    }

    // Advance to the first object inside the top-level array; stop at end of input
    private void skipStart() throws IOException {
        while (parser.currentToken() != JsonToken.START_OBJECT) {
            if (parser.nextToken() == null) {
                return;
            }
        }
    }

    @Override
    public boolean hasNext() {
        try {
            // readValue() consumes the whole object and clears the current token,
            // so advance once: the next token is START_OBJECT, or END_ARRAY at the end
            if (parser.currentToken() == null) {
                parser.nextToken();
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return parser.currentToken() == JsonToken.START_OBJECT;
    }

    @Override
    public Fields next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        try {
            // Bind only the current array element, then return its "fields" object
            return mapper.readValue(parser, FieldsWrapper.class).fields;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    private static final class FieldsWrapper {
        public Fields fields;
    }
}

class Fields {
    private String cnit;

    @JsonProperty("puissance_maximale")
    private BigDecimal maximumPower;

    @JsonProperty("champ_v9")
    private String fieldV9;

    @JsonProperty("boite_de_vitesse")
    private String gearbox;

    // add other required properties
    // getters, setters, toString
}
The code above prints:
Fields{cnit='MMB76K3BQJ41', maximumPower=110.0, fieldV9='70/220*2006/96EURO4', gearbox='A 5'}
Fields{cnit='M10MCDVPF15Z219', maximumPower=95.0, fieldV9='"715/2007*566/2011EURO5', gearbox='A 7'}
Fields{cnit='M10MCDVP027V654', maximumPower=150.0, fieldV9='715/2007*692/2008EURO5', gearbox='A 7'}
Fields{cnit='M10MCDVPG137264', maximumPower=120.0, fieldV9='715/2007*692/2008EURO5', gearbox='M 6'}
Fields{cnit='MVV4912QN718', maximumPower=210.0, fieldV9='null', gearbox='A 6'}
Fields{cnit='MMB76K3B2K88', maximumPower=110.0, fieldV9='null', gearbox='A 5'}
Fields{cnit='M10MCDVP012N140', maximumPower=80.0, fieldV9='70/220*2006/96EURO4', gearbox='M 6'}
Fields{cnit='MJN5423PU123', maximumPower=88.0, fieldV9='null', gearbox='M 6'}
Fields{cnit='M10MCDVP376T303', maximumPower=120.0, fieldV9='"715/2007*692/2008EURO5', gearbox='M 6'}
Fields{cnit='MMB53H3B5Z93', maximumPower=80.0, fieldV9='70/220*2006/96EURO4', gearbox='M 6'}
Fields{cnit='MPE1403E4834', maximumPower=81.0, fieldV9='null', gearbox='M 5'}
Fields{cnit='M10MCDVP018J905', maximumPower=110.0, fieldV9='70/220*2006/96EURO4', gearbox='M 6'}
Fields{cnit='M10MCDVPG112904', maximumPower=100.0, fieldV9='"715/2007*692/2008EURO5', gearbox='M 6'}
Fields{cnit='M10MCDVP015R723', maximumPower=110.0, fieldV9='70/220*2006/96EURO4', gearbox='A 5'}
...
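We can do the same thing with Gson. A sample implementation might look like this:
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.annotations.SerializedName;
import com.google.gson.stream.JsonReader;
import java.io.IOException;
import java.io.Reader;
import java.math.BigDecimal;
import java.util.Iterator;

class FieldsJsonIterator implements Iterator<Fields> {

    private final Gson mapper;
    private final JsonReader parser;

    public FieldsJsonIterator(Reader reader) throws IOException {
        mapper = new GsonBuilder().create();
        parser = mapper.newJsonReader(reader);
        skipStart();
    }

    // Consume the opening '[' of the top-level array
    private void skipStart() throws IOException {
        parser.beginArray();
    }

    @Override
    public boolean hasNext() {
        try {
            // true while another element remains before the closing ']'
            return parser.hasNext();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public Fields next() {
        // Bind only the current array element; Gson ignores unknown properties by default
        FieldsWrapper wrapper = mapper.fromJson(parser, FieldsWrapper.class);
        return wrapper.fields;
    }

    private static final class FieldsWrapper {
        public Fields fields;
    }
}

class Fields {
    private String cnit;

    @SerializedName("puissance_maximale")
    private BigDecimal maximumPower;

    @SerializedName("champ_v9")
    private String fieldV9;

    @SerializedName("boite_de_vitesse")
    private String gearbox;

    // getters, setters, toString
}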
Usage and output should be the same as with Jackson.
Thanks for your code, it works well and fast. I'm using the Jackson library.
I noticed this class in your code and I'm curious: where did you find this Android library?
//Just to make it work. Probably you should not do that!
SSLUtilities.trustAllHostnames();
SSLUtilities.trustAllHttpsCertificates();
Also, is it normal that the parser does not parse the JSON objects in the order they appear in the file? (Here, the first element is "designation_commerciale"="LaFerrari", year="2014".)
Thanks for your help.
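You can read the JSON file token by token. Of course, you need to know the JSON structure and define a matching object. To avoid running out of memory, store the data (for example, in the database) inside the while loop instead of keeping everything in a list. I hope this helps. I tried this with a file of about 170 MB. Could you share your JSON structure?
JSON structure: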
[{
"id":1,
"content":"Jan"
},
{
"id":2,
"content":"Feb"
}]
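Object based on the JSON:
import lombok.AllArgsConstructor;
import lombok.NoArgsConstructor;
import lombok.Setter;
import lombok.experimental.FieldNameConstants;

// @FieldNameConstants generates the String constants MyObject.Fields.id and MyObject.Fields.content
@NoArgsConstructor @AllArgsConstructor @FieldNameConstants @Setter
public class MyObject {
    private int id;
    private String content;
}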
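Processing the file:
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public List<MyObject> process(String configFile) throws IOException {
    // Read the resource as a stream; getResource(...).getFile() fails for resources inside a JAR
    InputStream in = getClass().getClassLoader().getResourceAsStream(configFile);
    if (in == null) {
        throw new FileNotFoundException("Resource not found: " + configFile);
    }
    var jsonFactory = new JsonFactory();
    // try-with-resources closes the parser, which also closes the underlying stream
    try (JsonParser jsonParser = jsonFactory.createParser(in)) {
        List<MyObject> data = new ArrayList<>();
        MyObject object = new MyObject();
        JsonToken jsonToken = jsonParser.nextToken();
        // Stop at the closing ']' of the top-level array, or at end of input (null)
        while (jsonToken != null && jsonToken != JsonToken.END_ARRAY) {
            String fieldName = jsonParser.getCurrentName();
            if (MyObject.Fields.id.equals(fieldName)) {
                jsonToken = jsonParser.nextToken(); // move from the field name to its value
                object.setId(jsonParser.getIntValue());
            }
            if (MyObject.Fields.content.equals(fieldName)) {
                jsonToken = jsonParser.nextToken();
                object.setContent(jsonParser.getText());
            }
            if (jsonToken == JsonToken.END_OBJECT) {
                data.add(object);
                object = new MyObject();
                // TODO: store and clear data once the list gets large, to avoid running out of memory
            }
            jsonToken = jsonParser.nextToken();
        }
        return data;
    }
}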
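Both answers leave the actual saving as a placeholder (`// Save object to DB` in the first answer, the `TODO` in `process` above). One common way to keep memory flat, and to speed up SQLite inserts, is to hand records to the database in fixed-size batches. The following is a minimal, stdlib-only sketch under stated assumptions: `Batches.forEachBatch` and its `saveBatch` callback are hypothetical names, and the integers stand in for parsed `Fields`/`MyObject` records. In a real program the callback might insert one batch with JDBC, for example `PreparedStatement.addBatch()`/`executeBatch()` inside a transaction (`connection.setAutoCommit(false)`, then `connection.commit()`), since committing every row separately is typically much slower in SQLite.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: not part of Jackson, Gson or the JDK
final class Batches {
    private Batches() {}

    // Pulls items from the iterator and passes them to saveBatch in groups of at most
    // batchSize, so only one batch is held in memory at a time. Returns the number of batches.
    public static <T> int forEachBatch(Iterator<T> items, int batchSize, Consumer<List<T>> saveBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> buffer = new ArrayList<>(batchSize);
        int batches = 0;
        while (items.hasNext()) {
            buffer.add(items.next());
            if (buffer.size() == batchSize) {
                saveBatch.accept(buffer);
                // Start a fresh list instead of clear(), in case the callback kept a reference
                buffer = new ArrayList<>(batchSize);
                batches++;
            }
        }
        if (!buffer.isEmpty()) {
            saveBatch.accept(buffer); // flush the final, partially filled batch
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        // Stand-in for a streaming iterator such as FieldsJsonIterator
        Iterator<Integer> records = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).iterator();
        List<Integer> sizes = new ArrayList<>();
        int n = forEachBatch(records, 4, batch -> sizes.add(batch.size()));
        System.out.println(n + " batches, sizes " + sizes); // prints "3 batches, sizes [4, 4, 2]"
    }
}
```

Because `FieldsJsonIterator` in the first answer implements `Iterator<Fields>`, it could be passed directly as `items`; in the token-based `process` method, the same idea means flushing and replacing the list inside the `while` loop instead of returning it at the end.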