Which RDFa parser for Java supports the RDFa attributes currently in use?

I am building an app in Java using Jena for semantic information scraping. I am looking for an RDFa parser that would let me correctly extract all the RDFa statements. Specifically, I need one that extracts information about the namespaces used and, assuming the RDFa markup in the page is correct, produces correct triples that distinguish between object and data properties.

I went through all the Java RDFa parsers listed at http://rdfa.info/wiki/Consume. They all struggle to extract any RDFa statements. The Jena RDFa parser shows plenty of errors and then dies a terrible death, and when the parsers do not crash, the data is of little use because it is incorrectly processed and generally mixed up. I am a newbie in this area, so please be gentle :)

I was also thinking of using a library written in a different language, but I don't really know how to plug it into Java code. Any suggestions?
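On the last point, one common way to use a tool written in another language from Java is to run it as a subprocess and read its output. The sketch below uses only the standard `ProcessBuilder` API; the class name `ExternalToolRunner`, the method `runAndCollectLines`, and the idea that the external tool prints one triple per line on standard output are assumptions made for illustration, not features of any particular RDFa library.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: runs an external command and collects its stdout lines.
final class ExternalToolRunner {

    static List<String> runAndCollectLines(List<String> command) {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Forward the tool's error output to this JVM's stderr so it is not lost
        // and cannot fill up an unread pipe.
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        List<String> lines = new ArrayList<>();
        try {
            Process process = builder.start();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            }
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IllegalStateException("External tool exited with code " + exitCode);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            // Restore the interrupt flag before reporting the failure.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for external tool", e);
        }
        return lines;
    }
}
```

A call would pass the tool's executable and arguments as a list, for example `runAndCollectLines(List.of("some-rdfa-tool", pageUrl))`, where `some-rdfa-tool` is a placeholder for whatever command-line extractor is installed. The returned lines could then be handed to an RDF library for parsing.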

Most RDFa parsers struggle with invalid HTML. The any23 library includes an RDFa parser that can deal with invalid HTML. It parses any RDFa into full RDF, including namespace mappings and so on, and is under active development.

Use java-rdfa. It supports Jena and uses the validator.nu HTML5 parser, which parses HTML the way a browser does (i.e. it repairs broken markup).
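Whichever parser is chosen, a quick way to check the object-versus-data-property distinction asked about in the question is to serialize the extracted triples as N-Triples and look at the object term: an IRI or blank node object indicates a link to another resource, while a quoted object is a literal value. The following is a minimal sketch using only the Java standard library; the class `ObjectTermKind` and its method are hypothetical names for illustration, and the code assumes one well-formed triple per line with no comments.

```java
// Hypothetical helper for inspecting N-Triples output from any RDFa parser.
final class ObjectTermKind {

    enum Kind { IRI, BLANK_NODE, LITERAL }

    // Classifies the object term of a single N-Triples statement line.
    static Kind classifyObject(String ntriplesLine) {
        String line = ntriplesLine.trim();
        if (!line.endsWith(".")) {
            throw new IllegalArgumentException("Statement must end with '.': " + ntriplesLine);
        }
        // Drop the terminating '.' and any whitespace before it.
        line = line.substring(0, line.length() - 1).trim();

        // Subject and predicate never contain whitespace in N-Triples,
        // so everything after the second whitespace run is the object term.
        String[] parts = line.split("\\s+", 3);
        if (parts.length < 3) {
            throw new IllegalArgumentException("Expected subject, predicate, object: " + ntriplesLine);
        }
        String object = parts[2];

        if (object.startsWith("<")) {
            return Kind.IRI;        // link to a named resource
        }
        if (object.startsWith("_:")) {
            return Kind.BLANK_NODE; // link to an anonymous resource
        }
        if (object.startsWith("\"")) {
            return Kind.LITERAL;    // plain, language-tagged, or typed value
        }
        throw new IllegalArgumentException("Unrecognized object term: " + object);
    }
}
```

In a correctly parsed page, statements that come from RDFa `rel` or `href` attributes would normally be classified as `IRI` or `BLANK_NODE`, and statements from `property` with text content as `LITERAL`. If every object comes out as a literal, the parser is probably mishandling the markup.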
