Json Array Flattening with Jackson / Parsing Performance

I have a JSON like the one below:

[{
        "id": "1",
        "teams": [{
                "name": "barca"
            },
            {
                "name": "real"
            }
        ]
    },
    {
        "id": "2"
    },
    {
        "id": "3",
        "teams": [{
                "name": "atletico"
            },
            {
                "name": "cz"
            }
        ]
    }
]

My aimed POJO is:

class Team {
    int id;
    String name;
}

Meaning, for each "team" I want to create a new object. Like:

new Team(1,barca)
new Team(1,real)
new Team(2,null)
new Team(3,atletico)
...

Which I believe I did with a custom deserializer like the one below:

JsonNode rootArray = jsonParser.readValueAsTree();
for (JsonNode root : rootArray) {
    // asText() rather than toString(), which would keep the JSON quotes in the value
    String id = root.get("id").asText();
    JsonNode teamsNodeArray = root.get("teams");
    if (teamsNodeArray != null) {
        for (JsonNode teamNode : teamsNodeArray) {
            String nameString = teamNode.get("name").asText();
            teamList.add(new Team(id, nameString));
        }
    } else {
        teamList.add(new Team(id, null));
    }
}

Considering I am getting 750k records… I believe having two for loops makes the code way slower than it should be. It takes ~7 min.

My question is: could you please enlighten me if there is a better way to do this?

PS: I have checked many Stack Overflow threads for this and could not find anything that fits so far.

Thank you in advance.

Do not parse the data yourself; use automatic de/serialization whenever possible.

Using Jackson, it could be as simple as:

MyData myData = new ObjectMapper().readValue(rawData, MyData.class);

For your specific example, we generate a really big instance (10M rows):

$ head big.json 
[{"id": 1593, "group": "6141", "teams": [{"id": 10502, "name": "10680"}, {"id": 16435, "name": "18351"}]}
,{"id": 28478, "group": "3142", "teams": [{"id": 30951, "name": "3839"}, {"id": 25310, "name": "19839"}]}
,{"id": 29810, "group": "8889", "teams": [{"id": 5586, "name": "8825"}, {"id": 27202, "name": "7335"}]}
...
$ wc -l big.json 
10000000 big.json

Then, define classes matching your data model, e.g.:

public static class Team {
    public int id;
    public String name;
}

public static class Group {
    public int id;
    public String group;
    public List<Team> teams;
}
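As an aside: if the real payload carries fields that are not declared on these classes, Jackson will fail on unknown properties by default. A minimal sketch of how to relax that (assuming you want extra fields ignored, not mapped; class and file names here are illustrative):

```java
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

public class LenientRead {
    public static class Team {
        public int id;
        public String name;
    }

    public static void main(String[] args) throws Exception {
        // "color" is not declared on Team; without the setting below,
        // readValue would throw UnrecognizedPropertyException
        String json = "{\"id\": 1, \"name\": \"barca\", \"color\": \"blue\"}";

        ObjectMapper mapper = new ObjectMapper()
            .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);

        Team t = mapper.readValue(json, Team.class);
        System.out.println(t.id + " " + t.name);  // 1 barca
    }
}
```

Alternatively, `@JsonIgnoreProperties(ignoreUnknown = true)` on the class achieves the same per-type.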

Now you can read the data directly with:

List<Group> xs = new ObjectMapper()
                   .readValue(
                       new File(".../big.json"),
                       new TypeReference<List<Group>>() {});

A complete example could be:

public static void main(String... args) throws IOException {

    long t0 = System.currentTimeMillis();

    List<Group> xs = new ObjectMapper().readValue(new File("/home/josejuan/tmp/1/big.json"), new TypeReference<List<Group>>() {});

    long t1 = System.currentTimeMillis();

    // test: add all group id
    long groupIds = xs.stream().mapToLong(x -> x.id).sum();

    long t2 = System.currentTimeMillis();

    System.out.printf("Group id sum := %d, Read time := %d mS, Sum time = %d mS%n", groupIds, t1 - t0, t2 - t1);
}

With output:

Group id sum := 163827035542, Read time := 10710 mS, Sum time = 74 mS

Only 11 seconds to parse 10M rows.
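If holding all 10M `Group` objects in memory at once is a concern, Jackson can also iterate the top-level array one element at a time with a `MappingIterator`, so each element can be processed and discarded. A sketch reusing the `Group`/`Team` classes above (the inline JSON string stands in for the file; in real use you would pass the `File` to `readValues`):

```java
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.List;

public class StreamingRead {
    public static class Team {
        public int id;
        public String name;
    }

    public static class Group {
        public int id;
        public String group;
        public List<Team> teams;
    }

    public static void main(String[] args) throws Exception {
        // Inline sample; readValues also accepts a File, InputStream or Reader.
        String json = "[{\"id\": 1, \"group\": \"a\", \"teams\": [{\"id\": 10, \"name\": \"x\"}]}"
                    + ",{\"id\": 2, \"group\": \"b\", \"teams\": [{\"id\": 20, \"name\": \"y\"}]}]";

        long sum = 0;
        try (MappingIterator<Group> it =
                 new ObjectMapper().readerFor(Group.class).readValues(json)) {
            while (it.hasNext()) {
                sum += it.next().id;  // each Group is eligible for GC after this
            }
        }
        System.out.println(sum);  // 3
    }
}
```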

To check the data and compare performance, we can read directly from disk:

$ perl -n -e 'print "$1\n" if /"id": ([0-9]+), "group/' big.json | time awk '{s+=$1}END{print s}'
163827035542
4.96user

That takes 5 seconds, so the Java code is only about twice as slow.

The non-performance problem of processing the data can be solved in many ways, depending on how you want to use the information. For example, grouping all the teams can be done with:

// (static imports assumed: java.util.stream.Collectors.toList / toMap;
//  x.teams is non-null in the generated data set)
List<Team> teams = xs.stream()
                     .flatMap(x -> x.teams.stream())
                     .collect(toList());

Map<Integer, Team> uniqTeams = xs.stream()
                                 .flatMap(x -> x.teams.stream())
                                 .collect(toMap(
                                      x -> x.id,     // key: team id
                                      x -> x,        // value: the team itself
                                      (a, b) -> a)); // keep first on duplicate id
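And to come back to the original goal, the flattened `(group id, team name)` pairs, including a `(id, null)` entry for groups that have no teams, can be produced from the parsed list in one pass. A self-contained sketch (plain Java, minimal local classes standing in for the parsed model):

```java
import java.util.ArrayList;
import java.util.List;

public class Flatten {
    static class Team {
        int id;
        String name;
        Team(int id, String name) { this.id = id; this.name = name; }
    }

    static class Group {
        int id;
        List<Team> teams;
        Group(int id, List<Team> teams) { this.id = id; this.teams = teams; }
    }

    // The asker's target shape: one (group id, team name) pair per team,
    // plus (id, null) for groups whose "teams" array is absent.
    static List<Team> flatten(List<Group> xs) {
        List<Team> out = new ArrayList<>();
        for (Group g : xs) {
            if (g.teams == null || g.teams.isEmpty()) {
                out.add(new Team(g.id, null));
            } else {
                for (Team t : g.teams) {
                    out.add(new Team(g.id, t.name));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Group> xs = List.of(
            new Group(1, List.of(new Team(10, "barca"), new Team(11, "real"))),
            new Group(2, null),
            new Group(3, List.of(new Team(12, "atletico"), new Team(13, "cz"))));

        List<Team> flat = flatten(xs);
        System.out.println(flat.size());                            // 5
        System.out.println(flat.get(2).id + " " + flat.get(2).name); // 2 null
    }
}
```

With 750k records this step is linear in the total number of teams and should be negligible next to the parse itself.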
