Json Array Flattening with Jackson / Parsing Performance
I have a JSON like this below:
[{
"id": "1",
"teams": [{
"name": "barca"
},
{
"name": "real"
}
]
},
{
"id": "2"
},
{
"id": "3",
"teams": [{
"name": "atletico"
},
{
"name": "cz"
}
]
}
]
My aimed POJO is:
class Team {
    int id;
    String name;
}
Meaning, for each "team" I want to create a new object, like:
new Team(1,barca)
new Team(1,real)
new Team(2,null)
new Team(3,atletico)
...
Which I believe I did with a custom deserializer like below:
JsonNode rootArray = jsonParser.readValueAsTree();
for (JsonNode root : rootArray) {
    // asText() yields the raw string value; toString() would keep the JSON quotes
    String id = root.get("id").asText();
    JsonNode teamsNodeArray = root.get("teams");
    if (teamsNodeArray != null) {
        for (JsonNode teamNode : teamsNodeArray) {
            String nameString = teamNode.get("name").asText();
            teamList.add(new Team(id, nameString));
        }
    } else {
        teamList.add(new Team(id, null));
    }
}
Considering I am getting 750k records, the two nested for loops are, I believe, making the code way slower than it should be. It takes ~7 min.
My question is: is there a better way to do this?
PS: I have checked many Stack Overflow threads for this and could not find anything that fits so far.
Thank you in advance.
Do not parse the data yourself; use automatic de/serialization whenever possible.
Using Jackson it could be as simple as:
MyData myData = new ObjectMapper().readValue(rawData, MyData.class);
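One practical caveat (my addition, not from the original answer): if the JSON carries fields your POJO does not declare, Jackson fails by default; `FAIL_ON_UNKNOWN_PROPERTIES` can be switched off so extra fields are skipped. A minimal sketch (the `Team` class here is just an illustration):

```java
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

public class MapperSetup {
    public static class Team {
        public int id;
        public String name;
    }

    public static void main(String[] args) throws Exception {
        // By default Jackson throws on properties the POJO does not declare;
        // disabling FAIL_ON_UNKNOWN_PROPERTIES silently skips them instead.
        ObjectMapper mapper = new ObjectMapper()
                .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);

        // "extra" is not a field of Team; without the configure() call this would throw.
        Team t = mapper.readValue("{\"id\": 7, \"name\": \"barca\", \"extra\": true}", Team.class);
        System.out.println(t.id + " " + t.name); // 7 barca
    }
}
```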
For your specific example, we generate a really big instance (10M rows):
$ head big.json
[{"id": 1593, "group": "6141", "teams": [{"id": 10502, "name": "10680"}, {"id": 16435, "name": "18351"}]}
,{"id": 28478, "group": "3142", "teams": [{"id": 30951, "name": "3839"}, {"id": 25310, "name": "19839"}]}
,{"id": 29810, "group": "8889", "teams": [{"id": 5586, "name": "8825"}, {"id": 27202, "name": "7335"}]}
...
$ wc -l big.json
10000000 big.json
Then, define classes matching your data model, e.g.:
public static class Team {
    public int id;
    public String name;
}

public static class Group {
    public int id;
    public String group;
    public List<Team> teams;
}
Now you can read the data directly with simply:
List<Group> xs = new ObjectMapper()
.readValue(
new File(".../big.json"),
new TypeReference<List<Group>>() {});
A complete program could be:
public static void main(String... args) throws IOException {
    long t0 = System.currentTimeMillis();
    List<Group> xs = new ObjectMapper().readValue(new File("/home/josejuan/tmp/1/big.json"), new TypeReference<List<Group>>() {});
    long t1 = System.currentTimeMillis();
    // test: add all group ids
    long groupIds = xs.stream().mapToLong(x -> x.id).sum();
    long t2 = System.currentTimeMillis();
    System.out.printf("Group id sum := %d, Read time := %d mS, Sum time = %d mS%n", groupIds, t1 - t0, t2 - t1);
}
With output:
Group id sum := 163827035542, Read time := 10710 mS, Sum time = 74 mS
Only 11 seconds to parse 10M rows.
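If holding the whole list in memory is also a concern, Jackson's streaming `JsonParser` can flatten the array token by token without building a tree or intermediate POJOs. A sketch against the question's exact JSON shape (the `Row` class and `parse` helper are my own names, not from the original post):

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class StreamingFlatten {
    // One flattened (group id, team name) row; name is null for team-less groups.
    public static class Row {
        public final String id;
        public final String name;
        public Row(String id, String name) { this.id = id; this.name = name; }
    }

    public static List<Row> parse(String json) throws IOException {
        List<Row> rows = new ArrayList<>();
        try (JsonParser p = new JsonFactory().createParser(json)) {
            if (p.nextToken() != JsonToken.START_ARRAY) throw new IOException("expected a JSON array");
            while (p.nextToken() == JsonToken.START_OBJECT) {      // one group object per iteration
                String id = null;
                List<String> names = new ArrayList<>();
                while (p.nextToken() != JsonToken.END_OBJECT) {    // walk the group's fields
                    String field = p.getCurrentName();
                    p.nextToken();                                 // advance to the field's value
                    if ("id".equals(field)) {
                        id = p.getText();
                    } else if ("teams".equals(field)) {            // value is an array of objects
                        while (p.nextToken() == JsonToken.START_OBJECT) {
                            String name = null;
                            while (p.nextToken() != JsonToken.END_OBJECT) {
                                String teamField = p.getCurrentName();
                                p.nextToken();
                                if ("name".equals(teamField)) name = p.getText();
                                else p.skipChildren();             // ignore other team fields
                            }
                            names.add(name);
                        }
                    } else {
                        p.skipChildren();                          // ignore unknown group fields
                    }
                }
                if (names.isEmpty()) rows.add(new Row(id, null));
                else for (String n : names) rows.add(new Row(id, n));
            }
        }
        return rows;
    }
}
```

Instead of collecting into `rows`, each `Row` could be handed to a consumer as it is produced, keeping memory use flat regardless of input size.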
To check the data and compare performance, we can scan the file directly from disk:
$ perl -n -e 'print "$1\n" if /"id": ([0-9]+), "group/' big.json | time awk '{s+=$1}END{print s}'
163827035542
4.96user
That takes about 5 seconds, so the Java code is only about twice as slow as a raw text scan.
The non-performance problem of processing the data can be solved in many ways, depending on how you want to use the information. For example, grouping all the teams can be done with:
// assumes: import static java.util.stream.Collectors.*;
List<Team> teams = xs.stream()
        .flatMap(x -> x.teams.stream())
        .collect(toList());

Map<Integer, Team> uniqTeams = xs.stream()
        .flatMap(x -> x.teams.stream())
        .collect(toMap(
                x -> x.id,
                x -> x,
                (a, b) -> a));
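And to produce exactly the (group id, team name) pairs the question asked for, the group's `id` can be carried into each team during the `flatMap`. A self-contained sketch (the `flatten` helper and the `"id:name"` string format are illustrative assumptions, not part of the original answer):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FlattenPairs {
    public static class Team {
        public int id;
        public String name;
        public Team(int id, String name) { this.id = id; this.name = name; }
    }

    public static class Group {
        public int id;
        public List<Team> teams;
        public Group(int id, List<Team> teams) { this.id = id; this.teams = teams; }
    }

    // Carry each group's id into its teams; a group without teams still
    // contributes one "id:null" row, matching the question's expected output.
    public static List<String> flatten(List<Group> xs) {
        return xs.stream()
                .flatMap(g -> (g.teams == null || g.teams.isEmpty())
                        ? Stream.of(g.id + ":null")
                        : g.teams.stream().map(t -> g.id + ":" + t.name))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Group> xs = List.of(
                new Group(1, List.of(new Team(0, "barca"), new Team(0, "real"))),
                new Group(2, null),
                new Group(3, List.of(new Team(0, "atletico"), new Team(0, "cz"))));
        System.out.println(flatten(xs)); // [1:barca, 1:real, 2:null, 3:atletico, 3:cz]
    }
}
```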