Parse JSON-LD generated by Apache Any23 into a Java POJO using Jackson
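I want to map structured data (microdata, JSON-LD) extracted from HTML text into a Java POJO. For the extraction I use the Apache Any23 library, configured with a JSONLDWriter that converts the structured data found in the HTML document into JSON-LD format.
This works as expected and gives me the following output: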
[ {
"@graph" : [ {
"@id" : "_:node1gn1v4pudx1",
"@type" : [ "http://schema.org/JobPosting" ],
"http://schema.org/datePosted" : [ {
"@language" : "en-us",
"@value" : "Wed Jan 11 02:00:00 UTC 2023"
} ],
"http://schema.org/description" : [ {
"@language" : "en-us",
"@value" : "Comprehensive Job Description"
} ],
"http://schema.org/hiringOrganization" : [ {
"@language" : "en-us",
"@value" : "Org AG"
} ],
"http://schema.org/jobLocation" : [ {
"@id" : "_:node1gn1v4pudx2"
} ],
"http://schema.org/title" : [ {
"@language" : "en-us",
"@value" : "Recruiter (m/f/d)\n "
} ]
}, {
"@id" : "_:node1gn1v4pudx2",
"@type" : [ "http://schema.org/Place" ],
"http://schema.org/address" : [ {
"@id" : "_:node1gn1v4pudx3"
} ]
}, {
"@id" : "_:node1gn1v4pudx3",
"@type" : [ "http://schema.org/PostalAddress" ],
"http://schema.org/addressCountry" : [ {
"@language" : "en-us",
"@value" : "Company Country"
} ],
"http://schema.org/addressLocality" : [ {
"@language" : "en-us",
"@value" : "Company City"
} ],
"http://schema.org/addressRegion" : [ {
"@language" : "en-us",
"@value" : "Company Region"
} ]
}, {
"@id" : "https://career.company.com/job/Recruiter/",
"http://www.w3.org/1999/xhtml/microdata#item" : [ {
"@id" : "_:node1gn1v4pudx1"
} ]
} ],
"@id" : "https://career.company.com/job/Recruiter/"
} ]
Next, I want to deserialize the JSON-LD object into a Java bean using Jackson. The POJO class should look something like this:
public class JobPosting {
private String datePosting;
private String hiringOrganization;
private String title;
private String description;
// Following members could be enclosed in a class too if easier
// Like class Place{private PostalAddress postalAddress;}
// private Place place;
private String addressCountry;
private String addressLocality;
private String addressRegion;
}
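I would like to do this with the annotations provided by the Jackson library, but I am struggling with a few things: the @type value is wrapped in an array, the actual data sits behind an extra @value layer, and some objects only hold a reference to other objects in the graph through their @id fields.
How can I map these fields correctly to my Java POJO?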
The trick is to run the JSON-LD through a JSON-LD processor to get a more developer-friendly JSON. The titanium-json-ld library provides such a processor.
JsonDocument input = JsonDocument.of(jsonLdAsInputStream);
JsonObject frame = JsonLd.frame(input, URI.create("http://schema.org")).get();
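The snippet above resolves the references made via @id and resolves the JSON keys against the given IRI.
This results in the following output, which is easy to parse with the Jackson library: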
[{
"id": "_:b0",
"type": "JobPosting",
"datePosted": {
"@language": "en-us",
"@value": "Wed Jan 11 02:00:00 UTC 2023"
},
"description": {
"@language": "en-us",
"@value": "Comprehensive Job Description"
},
"hiringOrganization": {
"@language": "en-us",
"@value": "Org AG"
},
"jobLocation": {
"id": "_:b1",
"type": "Place",
"address": {
"id": "_:b2",
"type": "PostalAddress",
"addressCountry": {
"@language": "en-us",
"@value": "Company Country"
},
"addressLocality": {
"@language": "en-us",
"@value": "Company City"
},
"addressRegion": {
"@language": "en-us",
"@value": "Company Region"
}
}
},
"title": {
"@language": "en-us",
"@value": "Recruiter (m/f/d)\n "
}
}]
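The framed output above still wraps every literal in an object holding "@language" and "@value". As a minimal sketch, assuming the document has been read into a generic Map/List tree (for example with Jackson's readValue(..., Object.class)), those wrappers could be collapsed before binding to a POJO with plain String fields. The class name ValueUnwrapper is hypothetical and only the JDK is used:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ValueUnwrapper {

    // Recursively replaces {"@language": ..., "@value": x} wrappers with x.
    // Assumption: any map that has an "@value" key is a literal wrapper.
    static Object unwrap(Object node) {
        if (node instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) node;
            if (map.containsKey("@value")) {
                return map.get("@value");
            }
            Map<String, Object> result = new LinkedHashMap<>();
            for (Map.Entry<?, ?> entry : map.entrySet()) {
                result.put(String.valueOf(entry.getKey()), unwrap(entry.getValue()));
            }
            return result;
        }
        if (node instanceof List) {
            List<Object> result = new ArrayList<>();
            for (Object item : (List<?>) node) {
                result.add(unwrap(item));
            }
            return result;
        }
        return node; // strings, numbers, booleans and null pass through unchanged
    }

    public static void main(String[] args) {
        Map<String, Object> title = new LinkedHashMap<>();
        title.put("@language", "en-us");
        title.put("@value", "Recruiter (m/f/d)");
        Map<String, Object> job = new LinkedHashMap<>();
        job.put("type", "JobPosting");
        job.put("title", title);
        // prints {type=JobPosting, title=Recruiter (m/f/d)}
        System.out.println(unwrap(job));
    }
}
```

The unwrapped map could then be passed to ObjectMapper.convertValue, provided the target class nests jobLocation and address the same way the framed output does.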
I am using the Google GSON library to deserialize the JSON-LD object. Sample code:
String jsonld = any23.extract(htmlDocument);
Gson gson = new Gson();
MyPojo pojo = gson.fromJson(jsonld, MyPojo.class);
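Is that an option for you?
Otherwise, you will need to create a Java object that matches the structure of the JSON-LD document. You can then use the ObjectMapper provided by Jackson to parse the JSON-LD into that Java object.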
ObjectMapper mapper = new ObjectMapper();
MyObject myObject = mapper.readValue(jsonld, MyObject.class);
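The MyObject class should have fields that match the structure of the JSON-LD document. For example, if the JSON-LD document has a field called "name", the MyObject class should have a field of type String called "name".
Once you have created the corresponding Java object, you can use the ObjectMapper to parse the JSON-LD into it and then read the object's fields to get the data extracted from the JSON-LD document.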
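Looking at the elements of your JSON that you are interested in (for example the "datePosted" and "hiringOrganization" values), they are always labelled with "@value" and contained in an array named after the property (in this case "http://schema.org/datePosted" and "http://schema.org/hiringOrganization"). They all live in one part of your JSON file, which can be converted to a JsonNode obtained as follows: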
JsonNode root = mapper.readTree(json)
.get(0)
.get("@graph")
.get(0);
So if you have a POJO like the one below:
@Data
public class JobPosting {
private String datePosted;
private String hiringOrganization;
}
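and you want to retrieve the datePosted and hiringOrganization values, you can check that their relative position in the JSON file stays the same, in which case they can be computed in a for loop: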
JsonNode root = mapper.readTree(json)
.get(0)
.get("@graph")
.get(0);
String strSchema = "http://schema.org/";
String[] fieldNames = {"datePosted", "hiringOrganization"};
//creating a Map<String, String> that will be converted to the JobPosting obj
Map<String, String> map = new HashMap<>();
for (String fieldName: fieldNames) {
map.put(fieldName,
root.get(strSchema + fieldName)
.get(0)
.get("@value")
.asText()
);
}
JobPosting jobPosting = mapper.convertValue(map, JobPosting.class);
//it prints JobPosting(datePosted=Wed Jan 11 02:00:00 UTC 2023, hiringOrganization=Org AG)
System.out.println(jobPosting);
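The loop above assumes every property is present. A chained get(0).get("@value") call throws a NullPointerException as soon as one is missing. As a hedged sketch of a more defensive lookup, assuming the graph node has been read into a generic Map/List tree rather than a JsonNode, a helper could return an Optional instead. LiteralLookup and firstValue are hypothetical names, and trimming the value is a choice made here because the title in the sample ends with a line break:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class LiteralLookup {

    static final String SCHEMA = "http://schema.org/";

    // Returns the first "@value" of the given schema.org property, or an
    // empty Optional when the property is absent or has an unexpected shape.
    static Optional<String> firstValue(Map<String, Object> node, String fieldName) {
        Object values = node.get(SCHEMA + fieldName);
        if (!(values instanceof List) || ((List<?>) values).isEmpty()) {
            return Optional.empty();
        }
        Object first = ((List<?>) values).get(0);
        if (!(first instanceof Map)) {
            return Optional.empty();
        }
        Object value = ((Map<?, ?>) first).get("@value");
        return Optional.ofNullable(value).map(v -> v.toString().trim());
    }

    public static void main(String[] args) {
        Map<String, Object> literal = new LinkedHashMap<>();
        literal.put("@language", "en-us");
        literal.put("@value", "Org AG");
        Map<String, Object> node = new LinkedHashMap<>();
        node.put(SCHEMA + "hiringOrganization", Collections.singletonList(literal));

        System.out.println(firstValue(node, "hiringOrganization").orElse("n/a")); // prints Org AG
        System.out.println(firstValue(node, "datePosted").orElse("n/a"));         // prints n/a
    }
}
```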
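This will require some preprocessing first, to turn your graph with id pointers into a simplified tree before mapping it with Jackson: turn it into a tree by replacing the @id references with the actual objects themselves, then flatten those troublesome object/array wrappers around @value.
Full code below, using Java 17 and a bit of recursion: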
package org.example;
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.annotation.JsonSubTypes;
import com.fasterxml.jackson.annotation.JsonTypeInfo;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.File;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import static java.util.stream.Collectors.toMap;
class Main {
public static void main(String[] args) throws Exception {
var mapper = new ObjectMapper();
var node = mapper.readValue(new File("test.json"), Object.class);
// Build a lookup map of "@id" to the actual object.
var lookup = buildLookup(node, new HashMap<>());
// Replace "@id" references with the actual objects themselves instead
var referenced = lookupReferences(node, lookup);
// Flattens single object array containing "@value" to be just the "@value" themselves
var flattened = flatten(referenced);
// Jackson should be able to understand our objects at this point, so convert it
var jobPostings =
mapper.convertValue(flattened, new TypeReference<List<RootObject>>() {}).stream()
.flatMap(it -> it.graph().stream())
.filter(it -> it instanceof JobPosting)
.map(it -> (JobPosting) it)
.toList();
System.out.println(jobPostings);
}
private static Map<String, Object> buildLookup(Object node, Map<String, Object> lookup) {
if (node instanceof List<?> list) {
for (var value : list) {
buildLookup(value, lookup);
}
} else if (node instanceof Map<?, ?> map) {
for (var value : map.values()) {
buildLookup(value, lookup);
}
if (map.size() > 1 && map.get("@id") instanceof String id) {
lookup.put(id, node);
}
}
return lookup;
}
private static Object lookupReferences(Object node, Map<String, Object> lookup) {
if (node instanceof List<?> list
&& list.size() == 1
&& list.get(0) instanceof Map<?, ?> map
&& map.size() == 1
&& map.get("@id") instanceof String id) {
return lookupReferences(lookup.get(id), lookup);
}
if (node instanceof List<?> list) {
return list.stream().map(value -> lookupReferences(value, lookup)).toList();
}
if (node instanceof Map<?, ?> map) {
return map.entrySet().stream()
.map(entry -> Map.entry(entry.getKey(), lookupReferences(entry.getValue(), lookup)))
.collect(toMap(Entry::getKey, Entry::getValue));
}
return node;
}
private static Object flatten(Object node) {
if (node instanceof List<?> list && list.size() == 1) {
if (list.get(0) instanceof String s) {
return s;
}
if (list.get(0) instanceof Map<?, ?> map) {
var value = map.get("@value");
if (value != null) {
return value;
}
}
}
if (node instanceof List<?> list) {
return list.stream().map(Main::flatten).toList();
}
if (node instanceof Map<?, ?> map) {
return map.entrySet().stream()
.map(entry -> Map.entry(entry.getKey(), flatten(entry.getValue())))
.collect(toMap(Entry::getKey, Entry::getValue));
}
return node;
}
}
@JsonIgnoreProperties(ignoreUnknown = true)
record RootObject(@JsonProperty("@graph") List<GraphObject> graph) {}
@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "@type", defaultImpl = Ignored.class)
@JsonSubTypes({
@JsonSubTypes.Type(value = JobPosting.class, name = "http://schema.org/JobPosting"),
@JsonSubTypes.Type(value = Place.class, name = "http://schema.org/Place"),
@JsonSubTypes.Type(value = PostalAddress.class, name = "http://schema.org/PostalAddress"),
})
interface GraphObject {}
@JsonIgnoreProperties(ignoreUnknown = true)
record Ignored() implements GraphObject {}
@JsonIgnoreProperties(ignoreUnknown = true)
record JobPosting(
@JsonProperty("http://schema.org/title") String title,
@JsonProperty("http://schema.org/description") String description,
@JsonProperty("http://schema.org/hiringOrganization") String hiringOrganization,
@JsonProperty("http://schema.org/datePosted") String datePosted,
@JsonProperty("http://schema.org/jobLocation") Place jobLocation)
implements GraphObject {}
@JsonIgnoreProperties(ignoreUnknown = true)
record Place(@JsonProperty("http://schema.org/address") PostalAddress address)
implements GraphObject {}
@JsonIgnoreProperties(ignoreUnknown = true)
record PostalAddress(
@JsonProperty("http://schema.org/addressLocality") String locality,
@JsonProperty("http://schema.org/addressRegion") String region,
@JsonProperty("http://schema.org/addressCountry") String country)
implements GraphObject {}
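One caveat about the recursion above, offered as an observation rather than something the answer states: lookupReferences follows every @id reference, so a graph in which two nodes point at each other would recurse without end, and a reference to an unknown id yields null, which Collectors.toMap rejects. The following is a minimal sketch of a guarded variant, assuming the same generic Map/List tree and the same kind of lookup map. SafeReferenceResolver, the inProgress set and the "partOf" property are hypothetical. Unlike the code above, it resolves at the level of the reference object and leaves the surrounding array in place:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SafeReferenceResolver {

    // True for a map that consists only of an "@id" entry, i.e. a pure reference.
    static boolean isReference(Object node) {
        return node instanceof Map
                && ((Map<?, ?>) node).size() == 1
                && ((Map<?, ?>) node).get("@id") instanceof String;
    }

    // Replaces references with the objects they point to. Ids currently being
    // expanded are tracked in inProgress, so a cyclic or unknown reference is
    // kept as a plain reference instead of recursing forever or becoming null.
    static Object resolve(Object node, Map<String, Object> lookup, Set<String> inProgress) {
        if (isReference(node)) {
            String id = (String) ((Map<?, ?>) node).get("@id");
            Object target = lookup.get(id);
            if (target == null || !inProgress.add(id)) {
                return node;
            }
            try {
                return resolve(target, lookup, inProgress);
            } finally {
                inProgress.remove(id);
            }
        }
        if (node instanceof Map) {
            Map<String, Object> copy = new LinkedHashMap<>();
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) node).entrySet()) {
                copy.put(String.valueOf(entry.getKey()), resolve(entry.getValue(), lookup, inProgress));
            }
            return copy;
        }
        if (node instanceof List) {
            List<Object> copy = new ArrayList<>();
            for (Object item : (List<?>) node) {
                copy.add(resolve(item, lookup, inProgress));
            }
            return copy;
        }
        return node;
    }

    // Small helper that builds a pure reference object {"@id": id}.
    static Map<String, Object> ref(String id) {
        Map<String, Object> reference = new LinkedHashMap<>();
        reference.put("@id", id);
        return reference;
    }

    public static void main(String[] args) {
        Map<String, Object> place = new LinkedHashMap<>();
        place.put("@id", "_:place");
        place.put("partOf", Collections.singletonList(ref("_:job"))); // points back: a cycle
        Map<String, Object> job = new LinkedHashMap<>();
        job.put("@id", "_:job");
        job.put("jobLocation", Collections.singletonList(ref("_:place")));

        Map<String, Object> lookup = new HashMap<>();
        lookup.put("_:job", job);
        lookup.put("_:place", place);

        // Terminates despite the cycle between the two nodes.
        System.out.println(resolve(job, lookup, new HashSet<>()));
    }
}
```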