Parse json-ld generated by Apache Any23 into Java Pojo using Jackson

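I want to map structured data (microdata, JSON-LD) extracted from an HTML document into a Java POJO. For the extraction I use the Apache Any23 library, with a JSONLDWriter configured to convert the structured data found in the HTML document into JSON-LD.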

This works as expected and gives me the following output:

[ {
  "@graph" : [ {
    "@id" : "_:node1gn1v4pudx1",
    "@type" : [ "http://schema.org/JobPosting" ],
    "http://schema.org/datePosted" : [ {
      "@language" : "en-us",
      "@value" : "Wed Jan 11 02:00:00 UTC 2023"
    } ],
    "http://schema.org/description" : [ {
      "@language" : "en-us",
      "@value" : "Comprehensive Job Description"
    } ],
    "http://schema.org/hiringOrganization" : [ {
      "@language" : "en-us",
      "@value" : "Org AG"
    } ],
    "http://schema.org/jobLocation" : [ {
      "@id" : "_:node1gn1v4pudx2"
    } ],
    "http://schema.org/title" : [ {
      "@language" : "en-us",
      "@value" : "Recruiter (m/f/d)\n    "
    } ]
  }, {
    "@id" : "_:node1gn1v4pudx2",
    "@type" : [ "http://schema.org/Place" ],
    "http://schema.org/address" : [ {
      "@id" : "_:node1gn1v4pudx3"
    } ]
  }, {
    "@id" : "_:node1gn1v4pudx3",
    "@type" : [ "http://schema.org/PostalAddress" ],
    "http://schema.org/addressCountry" : [ {
      "@language" : "en-us",
      "@value" : "Company Country"
    } ],
    "http://schema.org/addressLocality" : [ {
      "@language" : "en-us",
      "@value" : "Company City"
    } ],
    "http://schema.org/addressRegion" : [ {
      "@language" : "en-us",
      "@value" : "Company Region"
    } ]
  }, {
    "@id" : "https://career.company.com/job/Recruiter/",
    "http://www.w3.org/1999/xhtml/microdata#item" : [ {
      "@id" : "_:node1gn1v4pudx1"
    } ]
  } ],
  "@id" : "https://career.company.com/job/Recruiter/"
} ]

Next, I want to deserialize the json-ld object into a Java bean using Jackson. The POJO class should look something like this:

public class JobPosting {
    private String datePosting;
    private String hiringOrganization;
    private String title;
    private String description;

    // Following members could be enclosed in a class too if easier
    // Like class Place{private PostalAddress postalAddress;}
    // private Place place;
    private String addressCountry;
    private String addressLocality;
    private String addressRegion;
}

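I would like to do this with the annotations provided by the Jackson library, but I ran into a few problems:

  • The @type is wrapped in an array node
  • The actual data sits behind an extra @value
  • Some objects only hold a reference to other objects in the graph, via their @id field

How can I map these fields correctly onto my Java POJO?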
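
To make the three obstacles concrete, here is a minimal sketch of how each shape looks once the document has been read into plain Java collections (for example with Jackson's `mapper.readValue(json, Object.class)`, which yields `List` and `Map` instances). The class `JsonLdShapes` and its method names are hypothetical, chosen only for illustration; they are not part of Any23 or Jackson.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper: names the three shapes that make the expanded
// JSON-LD awkward to bind directly onto a POJO.
class JsonLdShapes {

  // A value object wraps the literal: { "@language": "en-us", "@value": "Org AG" }
  static boolean isValueObject(Object node) {
    return node instanceof Map<?, ?> map && map.containsKey("@value");
  }

  // A node reference carries nothing but an "@id": { "@id": "_:node1gn1v4pudx2" }
  static boolean isReference(Object node) {
    return node instanceof Map<?, ?> map
        && map.size() == 1
        && map.get("@id") instanceof String;
  }

  // "@type" arrives as an array even when there is only one type.
  // Returns the first entry, or null when the node has no usable "@type".
  static String firstType(Object node) {
    if (node instanceof Map<?, ?> map
        && map.get("@type") instanceof List<?> types
        && !types.isEmpty()
        && types.get(0) instanceof String type) {
      return type;
    }
    return null;
  }

  public static void main(String[] args) {
    Map<String, Object> value = Map.of("@language", "en-us", "@value", "Org AG");
    Map<String, Object> reference = Map.of("@id", "_:node1gn1v4pudx2");
    Map<String, Object> place =
        Map.of("@id", "_:node1gn1v4pudx2", "@type", List.of("http://schema.org/Place"));

    System.out.println(isValueObject(value));   // true
    System.out.println(isReference(reference)); // true
    System.out.println(isReference(place));     // false
    System.out.println(firstType(place));       // http://schema.org/Place
  }
}
```

Any mapping strategy has to deal with all three cases, either before Jackson sees the data or inside custom deserializers.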

The trick is to run the json-ld through a JSON-LD processor to get a more developer-friendly JSON. The titanium-json-ld library provides such processors.

JsonDocument input = JsonDocument.of(jsonLdAsInputStream);
JsonObject frame = JsonLd.frame(input, URI.create("http://schema.org")).get();

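The snippet above resolves the references made via @id and resolves the JSON keys against the given IRI.
This results in the following output, which is easy to parse with the Jackson library: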

[{
  "id": "_:b0",
  "type": "JobPosting",
  "datePosted": {
    "@language": "en-us",
    "@value": "Wed Jan 11 02:00:00 UTC 2023"
  },
  "description": {
    "@language": "en-us",
    "@value": "Comprehensive Job Description"
  },
  "hiringOrganization": {
    "@language": "en-us",
    "@value": "Org AG"
  },
  "jobLocation": {
    "id": "_:b1",
    "type": "Place",
    "address": {
      "id": "_:b2",
      "type": "PostalAddress",
      "addressCountry": {
        "@language": "en-us",
        "@value": "Company Country"
      },
      "addressLocality": {
        "@language": "en-us",
        "@value": "Company City"
      },
      "addressRegion": {
        "@language": "en-us",
        "@value": "Company Region"
      }
    }
  },
  "title": {
    "@language": "en-us",
    "@value": "Recruiter (m/f/d)\n    "
  }
}]
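
The framed output still wraps every literal in a `{"@language", "@value"}` object and nests the address two levels deep. A minimal sketch of reading it, assuming the framed JSON has already been parsed into nested `Map` instances and that the first element of the top-level array is passed in; `FramedJsonReader` and `text` are hypothetical names, not part of titanium-json-ld or Jackson:

```java
import java.util.Map;

// Hypothetical helper: walks the framed tree along a path of keys and returns
// the trimmed text found there, unwrapping a {"@language", "@value"} object.
class FramedJsonReader {

  static String text(Object node, String... path) {
    Object current = node;
    for (String key : path) {
      if (!(current instanceof Map<?, ?> map)) {
        return null; // the path leaves the tree
      }
      current = map.get(key);
    }
    if (current instanceof Map<?, ?> valueObject) {
      // Value object: take "@value"; a nested node without "@value" yields null.
      current = valueObject.get("@value");
    }
    return current == null ? null : current.toString().strip();
  }

  public static void main(String[] args) {
    Map<String, Object> address = Map.of(
        "type", "PostalAddress",
        "addressCountry", Map.of("@language", "en-us", "@value", "Company Country"));
    Map<String, Object> place = Map.of("type", "Place", "address", address);
    Map<String, Object> posting = Map.of(
        "type", "JobPosting",
        "title", Map.of("@language", "en-us", "@value", "Recruiter (m/f/d)\n    "),
        "jobLocation", place);

    System.out.println(text(posting, "title"));
    // Recruiter (m/f/d)
    System.out.println(text(posting, "jobLocation", "address", "addressCountry"));
    // Company Country
  }
}
```

The `strip()` call also removes the trailing newline and spaces that the extracted title carries.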

I am using the Google GSON library to deserialize the json-ld object. Sample code:

String jsonld = any23.extract(htmlDocument);
Gson gson = new Gson();
MyPojo pojo = gson.fromJson(jsonld, MyPojo.class);

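Is that an option for you?

Otherwise, you will need to create a Java object that matches the structure of the JSON-LD document. You can then use the ObjectMapper provided by Jackson to parse the JSON-LD into that Java object.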

ObjectMapper mapper = new ObjectMapper();

// jsonLd holds the JSON-LD document as a String ("json-ld" is not a valid Java identifier)
MyObject myObject = mapper.readValue(jsonLd, MyObject.class);

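The MyObject class should have fields that match the structure of the JSON-LD document. For example, if the JSON-LD document has a field named "name", then the MyObject class should have a field of type String named "name".

Once you have created the corresponding Java object, you can use the ObjectMapper to parse the JSON-LD into it and then read the object's fields to get the data extracted from the JSON-LD document.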

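Looking at the elements of the JSON you are interested in (for example the "datePosted" and "hiringOrganization" values), they are always labelled with "@value" and enclosed in an array under their property name (in this case "http://schema.org/datePosted" and "http://schema.org/hiringOrganization"). They all sit in one part of your JSON file, which can be read as a JsonNode obtained like this: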

JsonNode root = mapper.readTree(json)
                      .get(0)
                      .get("@graph")
                      .get(0);

So, if you have a pojo like the one below:

@Data
public class JobPosting {

    private String datePosted;
    private String hiringOrganization;
}

and you want to retrieve the datePosted and hiringOrganization values, note that each value sits at the same relative position in the JSON, so they can be read in a for loop:

JsonNode root = mapper.readTree(json)
                               .get(0)
                               .get("@graph")
                               .get(0);

String strSchema = "http://schema.org/";
String[] fieldNames = {"datePosted", "hiringOrganization"};
// creating a Map<String, String> that will be converted to the JobPosting obj
Map<String, String> map = new HashMap<>();
for (String fieldName : fieldNames) {
    map.put(fieldName,
            root.get(strSchema + fieldName)
                .get(0)
                .get("@value")
                .asText()
    );
}

JobPosting jobPosting = mapper.convertValue(map, JobPosting.class);
//it prints JobPosting(datePosted=Wed Jan 11 02:00:00 UTC 2023, hiringOrganization=Org AG)
System.out.println(jobPosting);
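
The loop above relies on the JobPosting node being the first entry of "@graph". If that order is not guaranteed, the node can be located by its "@type" instead. A minimal sketch, assuming the document was read with `mapper.readValue(json, Object.class)` so that it consists of plain `List` and `Map` instances; `GraphNodeFinder`, `findByType`, and `firstValue` are hypothetical names used for illustration:

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

class GraphNodeFinder {

  // Returns the first entry of any "@graph" whose "@type" array contains typeIri.
  static Optional<Map<?, ?>> findByType(Object document, String typeIri) {
    if (!(document instanceof List<?> roots)) {
      return Optional.empty();
    }
    for (Object root : roots) {
      if (root instanceof Map<?, ?> rootMap && rootMap.get("@graph") instanceof List<?> graph) {
        for (Object node : graph) {
          if (node instanceof Map<?, ?> nodeMap
              && nodeMap.get("@type") instanceof List<?> types
              && types.contains(typeIri)) {
            return Optional.of(nodeMap);
          }
        }
      }
    }
    return Optional.empty();
  }

  // Reads property[0]["@value"] as text, or null when any step is missing.
  static String firstValue(Map<?, ?> node, String propertyIri) {
    if (node.get(propertyIri) instanceof List<?> values
        && !values.isEmpty()
        && values.get(0) instanceof Map<?, ?> first
        && first.get("@value") != null) {
      return first.get("@value").toString();
    }
    return null;
  }

  public static void main(String[] args) {
    Map<String, Object> place =
        Map.of("@id", "_:b1", "@type", List.of("http://schema.org/Place"));
    Map<String, Object> posting = Map.of(
        "@id", "_:b0",
        "@type", List.of("http://schema.org/JobPosting"),
        "http://schema.org/hiringOrganization",
        List.of(Map.of("@language", "en-us", "@value", "Org AG")));
    // The JobPosting is deliberately placed second to show the index no longer matters.
    Object document = List.of(Map.of("@graph", List.of(place, posting)));

    String organization = findByType(document, "http://schema.org/JobPosting")
        .map(node -> firstValue(node, "http://schema.org/hiringOrganization"))
        .orElse(null);
    System.out.println(organization); // Org AG
  }
}
```

With a `JsonNode` tree the same idea applies: iterate over the "@graph" array and compare each element's "@type" entries rather than calling `get(0)`.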

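This requires some preprocessing first, to turn the graph with its @id pointers into a simplified tree before mapping it with Jackson:

  1. Turn the graph into a tree by replacing each @id reference with the actual object it points to.
  2. Flatten those pesky object/array wrappers around @value.

Full code below, using Java 17 and some recursion: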

package org.example;

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.annotation.JsonSubTypes;
import com.fasterxml.jackson.annotation.JsonTypeInfo;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

import static java.util.stream.Collectors.toMap;

class Main {

  public static void main(String[] args) throws Exception {
    var mapper = new ObjectMapper();
    var node = mapper.readValue(new File("test.json"), Object.class);

    // Build a lookup map of "@id" to the actual object.
    var lookup = buildLookup(node, new HashMap<>());

    // Replace "@id" references with the actual objects themselves instead
    var referenced = lookupReferences(node, lookup);

    // Flatten single-element arrays wrapping an "@value" object down to the "@value" itself
    var flattened = flatten(referenced);

    // Jackson should be able to understand our objects at this point, so convert them
    var jobPostings =
        mapper.convertValue(flattened, new TypeReference<List<RootObject>>() {}).stream()
            .flatMap(it -> it.graph().stream())
            .filter(it -> it instanceof JobPosting)
            .map(it -> (JobPosting) it)
            .toList();

    System.out.println(jobPostings);
  }

  private static Map<String, Object> buildLookup(Object node, Map<String, Object> lookup) {
    if (node instanceof List<?> list) {
      for (var value : list) {
        buildLookup(value, lookup);
      }
    } else if (node instanceof Map<?, ?> map) {
      for (var value : map.values()) {
        buildLookup(value, lookup);
      }
      if (map.size() > 1 && map.get("@id") instanceof String id) {
        lookup.put(id, node);
      }
    }
    return lookup;
  }

  private static Object lookupReferences(Object node, Map<String, Object> lookup) {
    if (node instanceof List<?> list
        && list.size() == 1
        && list.get(0) instanceof Map<?, ?> map
        && map.size() == 1
        && map.get("@id") instanceof String id) {
      return lookupReferences(lookup.get(id), lookup);
    }

    if (node instanceof List<?> list) {
      return list.stream().map(value -> lookupReferences(value, lookup)).toList();
    }

    if (node instanceof Map<?, ?> map) {
      return map.entrySet().stream()
          .map(entry -> Map.entry(entry.getKey(), lookupReferences(entry.getValue(), lookup)))
          .collect(toMap(Entry::getKey, Entry::getValue));
    }

    return node;
  }

  private static Object flatten(Object node) {
    if (node instanceof List<?> list && list.size() == 1) {
      if (list.get(0) instanceof String s) {
        return s;
      }
      if (list.get(0) instanceof Map<?, ?> map) {
        var value = map.get("@value");
        if (value != null) {
          return value;
        }
      }
    }

    if (node instanceof List<?> list) {
      return list.stream().map(Main::flatten).toList();
    }

    if (node instanceof Map<?, ?> map) {
      return map.entrySet().stream()
          .map(entry -> Map.entry(entry.getKey(), flatten(entry.getValue())))
          .collect(toMap(Entry::getKey, Entry::getValue));
    }

    return node;
  }
}

@JsonIgnoreProperties(ignoreUnknown = true)
record RootObject(@JsonProperty("@graph") List<GraphObject> graph) {}

@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "@type", defaultImpl = Ignored.class)
@JsonSubTypes({
  @JsonSubTypes.Type(value = JobPosting.class, name = "http://schema.org/JobPosting"),
  @JsonSubTypes.Type(value = Place.class, name = "http://schema.org/Place"),
  @JsonSubTypes.Type(value = PostalAddress.class, name = "http://schema.org/PostalAddress"),
})
interface GraphObject {}

@JsonIgnoreProperties(ignoreUnknown = true)
record Ignored() implements GraphObject {}

@JsonIgnoreProperties(ignoreUnknown = true)
record JobPosting(
    @JsonProperty("http://schema.org/title") String title,
    @JsonProperty("http://schema.org/description") String description,
    @JsonProperty("http://schema.org/hiringOrganization") String hiringOrganization,
    @JsonProperty("http://schema.org/datePosted") String datePosted,
    @JsonProperty("http://schema.org/jobLocation") Place jobLocation)
    implements GraphObject {}

@JsonIgnoreProperties(ignoreUnknown = true)
record Place(@JsonProperty("http://schema.org/address") PostalAddress address)
    implements GraphObject {}

@JsonIgnoreProperties(ignoreUnknown = true)
record PostalAddress(
    @JsonProperty("http://schema.org/addressLocality") String locality,
    @JsonProperty("http://schema.org/addressRegion") String region,
    @JsonProperty("http://schema.org/addressCountry") String country)
    implements GraphObject {}
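
One caveat with the reference replacement: if two nodes point at each other, naive recursion never terminates, and an "@id" that is missing from the lookup map resolves to null, which `Collectors.toMap` rejects. A minimal sketch of a more defensive variant, assuming the same plain `List`/`Map` tree and lookup map as above; `SafeReferenceResolver` is a hypothetical name, and unlike `lookupReferences` it only substitutes the reference and leaves the enclosing single-element array in place:

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SafeReferenceResolver {

  static Object resolve(Object node, Map<String, Object> lookup) {
    return resolve(node, lookup, new HashSet<>());
  }

  // inProgress holds the ids currently being expanded on the path from the root.
  private static Object resolve(Object node, Map<String, Object> lookup, Set<String> inProgress) {
    if (node instanceof List<?> list) {
      return list.stream().map(value -> resolve(value, lookup, inProgress)).toList();
    }
    if (node instanceof Map<?, ?> map) {
      if (map.size() == 1 && map.get("@id") instanceof String id) {
        Object target = lookup.get(id);
        // Unknown id, or an id already being expanded further up: keep the bare reference.
        if (target == null || !inProgress.add(id)) {
          return node;
        }
        try {
          return resolve(target, lookup, inProgress);
        } finally {
          inProgress.remove(id);
        }
      }
      // Copy with a plain loop so that null values do not cause an exception.
      Map<Object, Object> copy = new LinkedHashMap<>();
      for (Map.Entry<?, ?> entry : map.entrySet()) {
        copy.put(entry.getKey(), resolve(entry.getValue(), lookup, inProgress));
      }
      return copy;
    }
    return node;
  }
}
```

For the sample document in the question neither situation occurs, so the simpler version behaves the same there; the guard only matters for graphs with cycles or dangling references.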
