Parse JSON-LD generated by Apache Any23 into a Java POJO using Jackson
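I want to map structured data (microdata, JSON-LD) extracted from an HTML text into a Java POJO. For the extraction I use the Apache Any23 library and have configured a JSONLDWriter to convert the structured data found in the HTML document into JSON-LD format.
This works as expected and gives me the following output: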
[ {
"@graph" : [ {
"@id" : "_:node1gn1v4pudx1",
"@type" : [ "http://schema.org/JobPosting" ],
"http://schema.org/datePosted" : [ {
"@language" : "en-us",
"@value" : "Wed Jan 11 02:00:00 UTC 2023"
} ],
"http://schema.org/description" : [ {
"@language" : "en-us",
"@value" : "Comprehensive Job Description"
} ],
"http://schema.org/hiringOrganization" : [ {
"@language" : "en-us",
"@value" : "Org AG"
} ],
"http://schema.org/jobLocation" : [ {
"@id" : "_:node1gn1v4pudx2"
} ],
"http://schema.org/title" : [ {
"@language" : "en-us",
"@value" : "Recruiter (m/f/d)\n "
} ]
}, {
"@id" : "_:node1gn1v4pudx2",
"@type" : [ "http://schema.org/Place" ],
"http://schema.org/address" : [ {
"@id" : "_:node1gn1v4pudx3"
} ]
}, {
"@id" : "_:node1gn1v4pudx3",
"@type" : [ "http://schema.org/PostalAddress" ],
"http://schema.org/addressCountry" : [ {
"@language" : "en-us",
"@value" : "Company Country"
} ],
"http://schema.org/addressLocality" : [ {
"@language" : "en-us",
"@value" : "Company City"
} ],
"http://schema.org/addressRegion" : [ {
"@language" : "en-us",
"@value" : "Company Region"
} ]
}, {
"@id" : "https://career.company.com/job/Recruiter/",
"http://www.w3.org/1999/xhtml/microdata#item" : [ {
"@id" : "_:node1gn1v4pudx1"
} ]
} ],
"@id" : "https://career.company.com/job/Recruiter/"
} ]
Next, I want to deserialize the JSON-LD object into a Java bean using Jackson. The POJO class should look something like this:
public class JobPosting {
private String datePosting;
private String hiringOrganization;
private String title;
private String description;
// Following members could be enclosed in a class too if easier
// Like class Place{private PostalAddress postalAddress;}
// private Place place;
private String addressCountry;
private String addressLocality;
private String addressRegion;
}
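I want to do this with the annotations provided by the Jackson library, but I am struggling with a few things:
- the @type value
- the extra @value layer around the actual data
- the @id fields, which hold references to other objects in the graph
How can I map these fields correctly onto my Java POJO?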
The trick is to run the JSON-LD through a JSON-LD processor to obtain a more developer-friendly JSON. The titanium-json-ld library provides such a processor.
JsonDocument input = JsonDocument.of(jsonLdAsInputStream);
JsonObject frame = JsonLd.frame(input, URI.create("http://schema.org")).get();
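The snippet above resolves the references made via @id and resolves the JSON keys against the given IRI.
This produces the following output, which is easy to parse with the Jackson library: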
[{
"id": "_:b0",
"type": "JobPosting",
"datePosted": {
"@language": "en-us",
"@value": "Wed Jan 11 02:00:00 UTC 2023"
},
"description": {
"@language": "en-us",
"@value": "Comprehensive Job Description"
},
"hiringOrganization": {
"@language": "en-us",
"@value": "Org AG"
},
"jobLocation": {
"id": "_:b1",
"type": "Place",
"address": {
"id": "_:b2",
"type": "PostalAddress",
"addressCountry": {
"@language": "en-us",
"@value": "Company Country"
},
"addressLocality": {
"@language": "en-us",
"@value": "Company City"
},
"addressRegion": {
"@language": "en-us",
"@value": "Company Region"
}
}
},
"title": {
"@language": "en-us",
"@value": "Recruiter (m/f/d)\n "
}
}]
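The framed output above still wraps every literal in an object with @language and @value, so a String field such as title would not bind directly. Below is a minimal sketch of one way to strip those wrappers first. It assumes the framed JSON has already been read into plain java.util maps and lists (for example with Jackson's mapper.readValue(json, Object.class)). The class and method names are hypothetical, and the language tag is simply discarded.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Titanium or Jackson.
final class ValueObjectUnwrapper {

    /** Recursively replaces every map that carries an "@value" key with that value. */
    static Object unwrap(Object node) {
        if (node instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) node;
            if (map.containsKey("@value")) {
                // A JSON-LD value object: keep the literal, drop "@language".
                return map.get("@value");
            }
            Map<String, Object> copy = new LinkedHashMap<>();
            for (Map.Entry<?, ?> entry : map.entrySet()) {
                copy.put(String.valueOf(entry.getKey()), unwrap(entry.getValue()));
            }
            return copy;
        }
        if (node instanceof List) {
            List<Object> copy = new ArrayList<>();
            for (Object item : (List<?>) node) {
                copy.add(unwrap(item));
            }
            return copy;
        }
        // Strings, numbers, booleans and null pass through unchanged.
        return node;
    }
}
```

The unwrapped tree could then be handed to mapper.convertValue(...) together with a POJO whose field names match the short keys.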
I am using the Google GSON library to deserialize the JSON-LD object. Sample code:
String jsonld = any23.extract(htmlDocument);
Gson gson = new Gson();
MyPojo pojo = gson.fromJson(jsonld, MyPojo.class);
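Is that an option for you?
Otherwise, you will need to create a Java object that matches the structure of the JSON-LD document. You can then use the ObjectMapper provided by Jackson to parse the JSON-LD into that Java object.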
ObjectMapper mapper = new ObjectMapper();
// jsonLd holds the JSON-LD text; "json-ld" is not a valid Java identifier
MyObject myObject = mapper.readValue(jsonLd, MyObject.class);
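The MyObject class should have fields that match the structure of the JSON-LD document. For example, if the JSON-LD document has a field named "name", then MyObject should have a String field named "name".
Once you have created the corresponding Java class, ObjectMapper can parse the JSON-LD into an instance of it, and you can read the extracted data from that object's fields.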
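Looking at the elements of your JSON that you are interested in (for example the "datePosted" and "hiringOrganization" values), they are always labelled with "@value" and sit inside an array keyed by their name (in this case "http://schema.org/datePosted" and "http://schema.org/hiringOrganization"). They are all contained in one part of your JSON file, which can be read as a JsonNode obtained as follows: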
JsonNode root = mapper.readTree(json)
.get(0)
.get("@graph")
.get(0);
So, if you have a POJO like the one below:
@Data
public class JobPosting {
private String datePosted;
private String hiringOrganization;
}
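and you want to retrieve the datePosted and hiringOrganization values, you can check that their relative position in the JSON file is always the same, in which case they can be read in a for loop: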
JsonNode root = mapper.readTree(json)
.get(0)
.get("@graph")
.get(0);
String strSchema = "http://schema.org/";
String[] fieldNames = {"datePosted", "hiringOrganization"};
//creating a Map<String, String> that will be converted to the JobPosting obj
Map<String, String> map = new HashMap<>();
for (String fieldName: fieldNames) {
map.put(fieldName,
root.get(strSchema + fieldName)
.get(0)
.get("@value")
.asText()
);
}
JobPosting jobPosting = mapper.convertValue(map, JobPosting.class);
//it prints JobPosting(datePosted=Wed Jan 11 02:00:00 UTC 2023, hiringOrganization=Org AG)
System.out.println(jobPosting);
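The loop above relies on the JobPosting node being the first element of @graph. Where that order is not guaranteed, the node can be selected by its @type instead. Below is a minimal sketch, written against plain java.util lists and maps (the shape Jackson produces for mapper.readValue(json, Object.class)) rather than JsonNode. GraphNodeFinder and its methods are hypothetical names.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only.
final class GraphNodeFinder {

    /** Returns the first "@graph" entry whose "@type" array contains typeIri, or null. */
    static Map<?, ?> findByType(List<?> graph, String typeIri) {
        for (Object item : graph) {
            if (!(item instanceof Map)) {
                continue;
            }
            Map<?, ?> node = (Map<?, ?>) item;
            Object types = node.get("@type");
            if (types instanceof List && ((List<?>) types).contains(typeIri)) {
                return node;
            }
        }
        return null;
    }

    /** Reads "@value" from the first element of the array stored under key, or null. */
    static String firstValue(Map<?, ?> node, String key) {
        Object values = node.get(key);
        if (!(values instanceof List) || ((List<?>) values).isEmpty()) {
            return null;
        }
        Object first = ((List<?>) values).get(0);
        if (!(first instanceof Map)) {
            return null;
        }
        Object value = ((Map<?, ?>) first).get("@value");
        return value == null ? null : value.toString();
    }
}
```

With the document from the question, findByType(graph, "http://schema.org/JobPosting") would pick the posting wherever it appears in the array, and firstValue(node, "http://schema.org/hiringOrganization") would read its literal.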
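This needs some preprocessing first, to turn the graph with its id pointers into a simplified tree before mapping it with Jackson:
- Replace the @id references with the actual objects themselves, turning the graph into a tree.
- Flatten those troublesome object/array wrappers around @value.
Full code below, using Java 17 and a bit of recursion: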
package org.example;
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.annotation.JsonSubTypes;
import com.fasterxml.jackson.annotation.JsonTypeInfo;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.File;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import static java.util.stream.Collectors.toMap;
class Main {
public static void main(String[] args) throws Exception {
var mapper = new ObjectMapper();
var node = mapper.readValue(new File("test.json"), Object.class);
// Build a lookup map of "@id" to the actual object.
var lookup = buildLookup(node, new HashMap<>());
// Replace "@id" references with the actual objects themselves instead
var referenced = lookupReferences(node, lookup);
// Flattens single object array containing "@value" to be just the "@value" themselves
var flattened = flatten(referenced);
// Jackson should be able to understand our objects at this point, so convert them
var jobPostings =
mapper.convertValue(flattened, new TypeReference<List<RootObject>>() {}).stream()
.flatMap(it -> it.graph().stream())
.filter(it -> it instanceof JobPosting)
.map(it -> (JobPosting) it)
.toList();
System.out.println(jobPostings);
}
private static Map<String, Object> buildLookup(Object node, Map<String, Object> lookup) {
if (node instanceof List<?> list) {
for (var value : list) {
buildLookup(value, lookup);
}
} else if (node instanceof Map<?, ?> map) {
for (var value : map.values()) {
buildLookup(value, lookup);
}
if (map.size() > 1 && map.get("@id") instanceof String id) {
lookup.put(id, node);
}
}
return lookup;
}
private static Object lookupReferences(Object node, Map<String, Object> lookup) {
if (node instanceof List<?> list
&& list.size() == 1
&& list.get(0) instanceof Map<?, ?> map
&& map.size() == 1
&& map.get("@id") instanceof String id) {
return lookupReferences(lookup.get(id), lookup);
}
if (node instanceof List<?> list) {
return list.stream().map(value -> lookupReferences(value, lookup)).toList();
}
if (node instanceof Map<?, ?> map) {
return map.entrySet().stream()
.map(entry -> Map.entry(entry.getKey(), lookupReferences(entry.getValue(), lookup)))
.collect(toMap(Entry::getKey, Entry::getValue));
}
return node;
}
private static Object flatten(Object node) {
if (node instanceof List<?> list && list.size() == 1) {
if (list.get(0) instanceof String s) {
return s;
}
if (list.get(0) instanceof Map<?, ?> map) {
var value = map.get("@value");
if (value != null) {
return value;
}
}
}
if (node instanceof List<?> list) {
return list.stream().map(Main::flatten).toList();
}
if (node instanceof Map<?, ?> map) {
return map.entrySet().stream()
.map(entry -> Map.entry(entry.getKey(), flatten(entry.getValue())))
.collect(toMap(Entry::getKey, Entry::getValue));
}
return node;
}
}
@JsonIgnoreProperties(ignoreUnknown = true)
record RootObject(@JsonProperty("@graph") List<GraphObject> graph) {}
@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "@type", defaultImpl = Ignored.class)
@JsonSubTypes({
@JsonSubTypes.Type(value = JobPosting.class, name = "http://schema.org/JobPosting"),
@JsonSubTypes.Type(value = Place.class, name = "http://schema.org/Place"),
@JsonSubTypes.Type(value = PostalAddress.class, name = "http://schema.org/PostalAddress"),
})
interface GraphObject {}
@JsonIgnoreProperties(ignoreUnknown = true)
record Ignored() implements GraphObject {}
@JsonIgnoreProperties(ignoreUnknown = true)
record JobPosting(
@JsonProperty("http://schema.org/title") String title,
@JsonProperty("http://schema.org/description") String description,
@JsonProperty("http://schema.org/hiringOrganization") String hiringOrganization,
@JsonProperty("http://schema.org/datePosted") String datePosted,
@JsonProperty("http://schema.org/jobLocation") Place jobLocation)
implements GraphObject {}
@JsonIgnoreProperties(ignoreUnknown = true)
record Place(@JsonProperty("http://schema.org/address") PostalAddress address)
implements GraphObject {}
@JsonIgnoreProperties(ignoreUnknown = true)
record PostalAddress(
@JsonProperty("http://schema.org/addressLocality") String locality,
@JsonProperty("http://schema.org/addressRegion") String region,
@JsonProperty("http://schema.org/addressCountry") String country)
implements GraphObject {}
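One caveat with the code above: Map.entry and Collectors.toMap both reject null values, so a JSON null or a dangling @id reference (one with no matching object in the lookup) would raise a NullPointerException. Below is a minimal sketch of a null-tolerant alternative for the map-copying step. NullSafeCopy and mapValues are illustrative names, not part of the original answer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical helper for illustration only.
final class NullSafeCopy {

    /** Copies source into a new map, applying fn to every value; null results are kept. */
    static Map<Object, Object> mapValues(Map<?, ?> source, UnaryOperator<Object> fn) {
        // LinkedHashMap permits null values and preserves the original key order.
        Map<Object, Object> result = new LinkedHashMap<>();
        for (Map.Entry<?, ?> entry : source.entrySet()) {
            result.put(entry.getKey(), fn.apply(entry.getValue()));
        }
        return result;
    }
}
```

In lookupReferences, for instance, the stream pipeline could be swapped for mapValues(map, value -> lookupReferences(value, lookup)), and likewise in flatten.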