Parse json-ld generated by Apache Any23 into Java Pojo using Jackson

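I want to map structured data (microdata, JSON-LD) extracted from HTML text into a Java POJO. For the extraction I use the Apache Any23 library and have configured a JSONLDWriter to convert the structured data found in the HTML document into JSON-LD format.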

This works as expected and gives me the following output:

[ {
  "@graph" : [ {
    "@id" : "_:node1gn1v4pudx1",
    "@type" : [ "http://schema.org/JobPosting" ],
    "http://schema.org/datePosted" : [ {
      "@language" : "en-us",
      "@value" : "Wed Jan 11 02:00:00 UTC 2023"
    } ],
    "http://schema.org/description" : [ {
      "@language" : "en-us",
      "@value" : "Comprehensive Job Description"
    } ],
    "http://schema.org/hiringOrganization" : [ {
      "@language" : "en-us",
      "@value" : "Org AG"
    } ],
    "http://schema.org/jobLocation" : [ {
      "@id" : "_:node1gn1v4pudx2"
    } ],
    "http://schema.org/title" : [ {
      "@language" : "en-us",
      "@value" : "Recruiter (m/f/d)\n    "
    } ]
  }, {
    "@id" : "_:node1gn1v4pudx2",
    "@type" : [ "http://schema.org/Place" ],
    "http://schema.org/address" : [ {
      "@id" : "_:node1gn1v4pudx3"
    } ]
  }, {
    "@id" : "_:node1gn1v4pudx3",
    "@type" : [ "http://schema.org/PostalAddress" ],
    "http://schema.org/addressCountry" : [ {
      "@language" : "en-us",
      "@value" : "Company Country"
    } ],
    "http://schema.org/addressLocality" : [ {
      "@language" : "en-us",
      "@value" : "Company City"
    } ],
    "http://schema.org/addressRegion" : [ {
      "@language" : "en-us",
      "@value" : "Company Region"
    } ]
  }, {
    "@id" : "https://career.company.com/job/Recruiter/",
    "http://www.w3.org/1999/xhtml/microdata#item" : [ {
      "@id" : "_:node1gn1v4pudx1"
    } ]
  } ],
  "@id" : "https://career.company.com/job/Recruiter/"
} ]

Next, I want to deserialize the JSON-LD object into a Java bean using Jackson. The POJO class should look something like this:

public class JobPosting {
    private String datePosting;
    private String hiringOrganization;
    private String title;
    private String description;

    // Following members could be enclosed in a class too if easier
    // Like class Place{private PostalAddress postalAddress;}
    // private Place place;
    private String addressCountry;
    private String addressLocality;
    private String addressRegion;
}

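I would like to do this with the annotations provided by the Jackson library, but I am struggling with a few things:

  • The @type value is wrapped in an array node
  • The actual data sits one level deeper, inside an extra @value
  • Some objects only hold a reference to other objects in the graph via the @id field

How can I map these fields properly to my Java POJO?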

The trick is to run the JSON-LD through a JSON-LD processor to get more developer-friendly JSON. The titanium-json-ld library provides such processors.

JsonDocument input = JsonDocument.of(jsonLdAsInputStream);
JsonObject frame = JsonLd.frame(input, URI.create("http://schema.org")).get();

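The code snippet above resolves the references via @id and resolves the JSON keys against the given IRI.
This results in the following output, which is easy to parse with the Jackson library: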

[{
  "id": "_:b0",
  "type": "JobPosting",
  "datePosted": {
    "@language": "en-us",
    "@value": "Wed Jan 11 02:00:00 UTC 2023"
  },
  "description": {
    "@language": "en-us",
    "@value": "Comprehensive Job Description"
  },
  "hiringOrganization": {
    "@language": "en-us",
    "@value": "Org AG"
  },
  "jobLocation": {
    "id": "_:b1",
    "type": "Place",
    "address": {
      "id": "_:b2",
      "type": "PostalAddress",
      "addressCountry": {
        "@language": "en-us",
        "@value": "Company Country"
      },
      "addressLocality": {
        "@language": "en-us",
        "@value": "Company City"
      },
      "addressRegion": {
        "@language": "en-us",
        "@value": "Company Region"
      }
    }
  },
  "title": {
    "@language": "en-us",
    "@value": "Recruiter (m/f/d)\n    "
  }
}]
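The framed output above still wraps every literal in an object with "@language" and "@value". The following is a minimal sketch, not part of the original answer, of collapsing those wrappers before binding. It assumes the framed JSON has already been read into plain Java collections (for example with Jackson's mapper.readValue(json, Object.class)); ValueUnwrapper and unwrap are hypothetical names, and stripping surrounding whitespace is an extra choice made here because the sample title ends in "\n    ".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ValueUnwrapper {

  // Hypothetical helper (not part of Jackson or titanium-json-ld):
  // replaces every object that has an "@value" key with that value.
  static Object unwrap(Object node) {
    if (node instanceof Map<?, ?> map) {
      if (map.containsKey("@value")) {
        Object value = map.get("@value");
        // Assumption: surrounding whitespace in string literals is noise.
        return value instanceof String s ? s.strip() : value;
      }
      Map<Object, Object> out = new LinkedHashMap<>();
      for (Map.Entry<?, ?> entry : map.entrySet()) {
        out.put(entry.getKey(), unwrap(entry.getValue()));
      }
      return out;
    }
    if (node instanceof List<?> list) {
      List<Object> out = new ArrayList<>();
      for (Object item : list) {
        out.add(unwrap(item));
      }
      return out;
    }
    // Strings, numbers, booleans and null pass through unchanged.
    return node;
  }

  public static void main(String[] args) {
    Map<String, Object> framed = Map.of(
        "type", "JobPosting",
        "title", Map.of("@language", "en-us", "@value", "Recruiter (m/f/d)\n    "));
    System.out.println(unwrap(framed));
  }
}
```

The unwrapped structure contains only plain keys and string values, so it could then be handed to mapper.convertValue with a POJO whose field names match the keys.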

I am using the Google GSON library to deserialize the JSON-LD object. Sample code follows:

String jsonld = any23.extract(htmlDocument);
Gson gson = new Gson();
MyPojo pojo = gson.fromJson(jsonld, MyPojo.class);

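Is that an option for you?

Otherwise, you will need to create a Java object that matches the structure of the JSON-LD document. You can then use the ObjectMapper provided by Jackson to parse the JSON-LD into the appropriate Java object.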

ObjectMapper mapper = new ObjectMapper();

// jsonLd holds the JSON-LD document as a String
MyObject myObject = mapper.readValue(jsonLd, MyObject.class);

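The MyObject class should have fields that match the structure of the JSON-LD document. For example, if the JSON-LD document has a field called "name", the MyObject class should have a field called "name" of type String.

Once you have created the corresponding Java object, you can use the ObjectMapper to parse the JSON-LD into it. You can then access the object's fields to get the data extracted from the JSON-LD document.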
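Values read this way are still plain strings. The datePosted value in the sample output, "Wed Jan 11 02:00:00 UTC 2023", has the shape produced by java.util.Date.toString(). If a typed date is preferred, a sketch along these lines may help; the pattern is an assumption inferred from that single sample value, and DatePostedParser is a hypothetical name.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class DatePostedParser {

  // Assumed pattern, inferred from the sample "Wed Jan 11 02:00:00 UTC 2023".
  // Locale.ENGLISH is needed so that "Wed" and "Jan" are recognised.
  private static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("EEE MMM dd HH:mm:ss zzz yyyy", Locale.ENGLISH);

  // Throws DateTimeParseException if the value does not follow the pattern.
  static ZonedDateTime parse(String raw) {
    return ZonedDateTime.parse(raw.strip(), FORMAT);
  }

  public static void main(String[] args) {
    System.out.println(parse("Wed Jan 11 02:00:00 UTC 2023").toLocalDate());
  }
}
```

Other pages may publish datePosted in a different format (ISO 8601 is common in schema.org markup), so a fallback for unparseable values is worth considering.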

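Looking at the elements of the JSON you are interested in (for example the "datePosted" and "hiringOrganization" values), they are always labelled by "@value" and wrapped in an array named after them (in this case "http://schema.org/datePosted" and "http://schema.org/hiringOrganization"). They are all contained in one part of your JSON file, which can be converted to a JsonNode obtained in the following way: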

JsonNode root = mapper.readTree(json)
                      .get(0)
                      .get("@graph")
                      .get(0);

So, if you have a POJO like the one below:

@Data
public class JobPosting {

    private String datePosted;
    private String hiringOrganization;
}

and you want to retrieve the datePosted and hiringOrganization values, note that their relative position in the JSON file is always the same, so they can be read in a for loop:

JsonNode root = mapper.readTree(json)
                      .get(0)
                      .get("@graph")
                      .get(0);

String strSchema = "http://schema.org/";
String[] fieldNames = {"datePosted", "hiringOrganization"};
// Create a Map<String, String> that will be converted to the JobPosting object
Map<String, String> map = new HashMap<>();
for (String fieldName : fieldNames) {
    map.put(fieldName,
            root.get(strSchema + fieldName)
                .get(0)
                .get("@value")
                .asText()
    );
}

JobPosting jobPosting = mapper.convertValue(map, JobPosting.class);
// Prints JobPosting(datePosted=Wed Jan 11 02:00:00 UTC 2023, hiringOrganization=Org AG)
System.out.println(jobPosting);
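The snippet above picks the first element of "@graph", which works for the sample document but relies on the JobPosting node coming first. A possible alternative is to select the node by its "@type" instead. The sketch below illustrates this on plain Java collections rather than JsonNode, assuming the document was read with mapper.readValue(json, Object.class); GraphNodeFinder and findByType are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class GraphNodeFinder {

  // Hypothetical helper: returns the first "@graph" entry whose "@type"
  // array contains the given type IRI, or an empty Optional if none does.
  static Optional<Map<?, ?>> findByType(List<?> graph, String typeIri) {
    for (Object entry : graph) {
      if (entry instanceof Map<?, ?> node
          && node.get("@type") instanceof List<?> types
          && types.contains(typeIri)) {
        return Optional.of(node);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    List<Object> graph = List.of(
        Map.of("@id", "https://career.company.com/job/Recruiter/"),
        Map.of("@id", "_:node1", "@type", List.of("http://schema.org/JobPosting")));
    System.out.println(findByType(graph, "http://schema.org/JobPosting").isPresent());
  }
}
```

Entries without an "@type" key, such as the microdata item node at the end of the sample graph, are skipped.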

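This requires some preprocessing first, to convert the graph with id pointers into a simplified tree before mapping it with Jackson:

  1. Turn it into a tree by replacing the @id references with the actual objects themselves.
  2. Flatten those troublesome object/array wrappers around @value.

Full code below, using Java 17 and a bit of recursion: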

package org.example;

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.annotation.JsonSubTypes;
import com.fasterxml.jackson.annotation.JsonTypeInfo;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

import static java.util.stream.Collectors.toMap;

class Main {

  public static void main(String[] args) throws Exception {
    var mapper = new ObjectMapper();
    var node = mapper.readValue(new File("test.json"), Object.class);

    // Build a lookup map of "@id" to the actual object.
    var lookup = buildLookup(node, new HashMap<>());

    // Replace "@id" references with the actual objects themselves instead
    var referenced = lookupReferences(node, lookup);

    // Flattens single object array containing "@value" to be just the "@value" themselves
    var flattened = flatten(referenced);

    // Jackson should be able to understand our objects at this point, so convert them
    var jobPostings =
        mapper.convertValue(flattened, new TypeReference<List<RootObject>>() {}).stream()
            .flatMap(it -> it.graph().stream())
            .filter(it -> it instanceof JobPosting)
            .map(it -> (JobPosting) it)
            .toList();

    System.out.println(jobPostings);
  }

  private static Map<String, Object> buildLookup(Object node, Map<String, Object> lookup) {
    if (node instanceof List<?> list) {
      for (var value : list) {
        buildLookup(value, lookup);
      }
    } else if (node instanceof Map<?, ?> map) {
      for (var value : map.values()) {
        buildLookup(value, lookup);
      }
      if (map.size() > 1 && map.get("@id") instanceof String id) {
        lookup.put(id, node);
      }
    }
    return lookup;
  }

  private static Object lookupReferences(Object node, Map<String, Object> lookup) {
    if (node instanceof List<?> list
        && list.size() == 1
        && list.get(0) instanceof Map<?, ?> map
        && map.size() == 1
        && map.get("@id") instanceof String id) {
      return lookupReferences(lookup.get(id), lookup);
    }

    if (node instanceof List<?> list) {
      return list.stream().map(value -> lookupReferences(value, lookup)).toList();
    }

    if (node instanceof Map<?, ?> map) {
      return map.entrySet().stream()
          .map(entry -> Map.entry(entry.getKey(), lookupReferences(entry.getValue(), lookup)))
          .collect(toMap(Entry::getKey, Entry::getValue));
    }

    return node;
  }

  private static Object flatten(Object node) {
    if (node instanceof List<?> list && list.size() == 1) {
      if (list.get(0) instanceof String s) {
        return s;
      }
      if (list.get(0) instanceof Map<?, ?> map) {
        var value = map.get("@value");
        if (value != null) {
          return value;
        }
      }
    }

    if (node instanceof List<?> list) {
      return list.stream().map(Main::flatten).toList();
    }

    if (node instanceof Map<?, ?> map) {
      return map.entrySet().stream()
          .map(entry -> Map.entry(entry.getKey(), flatten(entry.getValue())))
          .collect(toMap(Entry::getKey, Entry::getValue));
    }

    return node;
  }
}

@JsonIgnoreProperties(ignoreUnknown = true)
record RootObject(@JsonProperty("@graph") List<GraphObject> graph) {}

@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "@type", defaultImpl = Ignored.class)
@JsonSubTypes({
  @JsonSubTypes.Type(value = JobPosting.class, name = "http://schema.org/JobPosting"),
  @JsonSubTypes.Type(value = Place.class, name = "http://schema.org/Place"),
  @JsonSubTypes.Type(value = PostalAddress.class, name = "http://schema.org/PostalAddress"),
})
interface GraphObject {}

@JsonIgnoreProperties(ignoreUnknown = true)
record Ignored() implements GraphObject {}

@JsonIgnoreProperties(ignoreUnknown = true)
record JobPosting(
    @JsonProperty("http://schema.org/title") String title,
    @JsonProperty("http://schema.org/description") String description,
    @JsonProperty("http://schema.org/hiringOrganization") String hiringOrganization,
    @JsonProperty("http://schema.org/datePosted") String datePosted,
    @JsonProperty("http://schema.org/jobLocation") Place jobLocation)
    implements GraphObject {}

@JsonIgnoreProperties(ignoreUnknown = true)
record Place(@JsonProperty("http://schema.org/address") PostalAddress address)
    implements GraphObject {}

@JsonIgnoreProperties(ignoreUnknown = true)
record PostalAddress(
    @JsonProperty("http://schema.org/addressLocality") String locality,
    @JsonProperty("http://schema.org/addressRegion") String region,
    @JsonProperty("http://schema.org/addressCountry") String country)
    implements GraphObject {}
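One caveat about lookupReferences above: a reference whose target is not in the lookup (for example an object that only carries an "@id" pointing to an external IRI) resolves to null, and Collectors.toMap rejects null values with a NullPointerException. Two nodes that reference each other would also recurse without end. The sketch below is a guarded variant under those assumptions; SafeReferenceResolver is a hypothetical name, and keeping the bare identifier string for unresolved references is only one possible policy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SafeReferenceResolver {

  // Hypothetical variant of lookupReferences that tolerates dangling and
  // cyclic "@id" references by keeping the plain identifier in those cases.
  static Object resolve(Object node, Map<String, Object> lookup, Set<String> inProgress) {
    if (node instanceof List<?> list
        && list.size() == 1
        && list.get(0) instanceof Map<?, ?> ref
        && ref.size() == 1
        && ref.get("@id") instanceof String id) {
      Object target = lookup.get(id);
      if (target == null || inProgress.contains(id)) {
        return id; // unknown target or cycle: keep the identifier itself
      }
      inProgress.add(id);
      try {
        return resolve(target, lookup, inProgress);
      } finally {
        inProgress.remove(id);
      }
    }
    if (node instanceof List<?> list) {
      List<Object> out = new ArrayList<>();
      for (Object value : list) {
        out.add(resolve(value, lookup, inProgress));
      }
      return out;
    }
    if (node instanceof Map<?, ?> map) {
      // A LinkedHashMap accepts null values, unlike Collectors.toMap.
      Map<Object, Object> out = new LinkedHashMap<>();
      for (Map.Entry<?, ?> entry : map.entrySet()) {
        out.put(entry.getKey(), resolve(entry.getValue(), lookup, inProgress));
      }
      return out;
    }
    return node;
  }

  public static void main(String[] args) {
    Map<String, Object> lookup = new HashMap<>();
    lookup.put("_:place", Map.of("@id", "_:place", "name", "Company City"));
    Map<String, Object> job = Map.of(
        "location", List.of(Map.of("@id", "_:place")),
        "url", List.of(Map.of("@id", "https://career.company.com/job/Recruiter/")));
    System.out.println(resolve(job, lookup, new HashSet<>()));
  }
}
```

With this policy a field such as the job URL ends up as a plain string, which Jackson can bind to a String property.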
