
How to access OWL documents using XPath in Java?

I have an OWL document in the form of an XML file, and I want to extract elements from it. My code works for simple XML documents, but it does not work with OWL XML documents.

I was trying to get this element: /rdf:RDF/owl:Ontology/rdfs:label , for which I did this:

    DocumentBuilder builder = builderfactory.newDocumentBuilder();
    Document xmlDocument = builder.parse(
            new File(XpathMain.class.getResource("person.xml").getFile()));

    XPathFactory factory = javax.xml.xpath.XPathFactory.newInstance();
    XPath xPath = factory.newXPath();
    XPathExpression xPathExpression = xPath.compile("/rdf:RDF/owl:Ontology/rdfs:label/text()");
    String nameOfTheBook = xPathExpression.evaluate(xmlDocument, XPathConstants.STRING).toString();

I also tried extracting only the rdfs:label element this way:

    XPathExpression xPathExpression = xPath.compile("//rdfs:label");
    NodeList nodes = (NodeList) xPathExpression.evaluate(xmlDocument, XPathConstants.NODESET);

But this node list is empty.

Please let me know where I am going wrong. I am using the Java XPath API.

Don't query RDF (or OWL) with XPath

There's already an accepted answer, but I wanted to elaborate on @Michael's comment on the question. It's a very bad idea to try to work with RDF as XML (and hence with the RDF serialization of an OWL ontology), and the reason is simple: the same RDF graph can be serialized as many different XML documents. In the question, all that's being asked for is the rdfs:label of an owl:Ontology element, so how much could go wrong? Well, here are two serializations of the ontology.

The first is fairly human readable, and was generated by the OWL API when I saved the ontology using the Protégé ontology editor. The query in the accepted answer would work on this, I think.

<rdf:RDF xmlns="http://www.example.com/labelledOnt#"
     xml:base="http://www.example.com/labelledOnt"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:owl="http://www.w3.org/2002/07/owl#"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <owl:Ontology rdf:about="http://www.example.com/labelledOnt">
        <rdfs:label>Here is a label on the Ontology.</rdfs:label>
    </owl:Ontology>
</rdf:RDF>

Here is the same RDF graph using fewer of the fancy features available in the RDF/XML encoding. This is the same RDF graph, and thus the same OWL ontology. However, there is no owl:Ontology XML element here, and the XPath query will fail.

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns="http://www.example.com/labelledOnt#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" > 
  <rdf:Description rdf:about="http://www.example.com/labelledOnt">
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Ontology"/>
    <rdfs:label>Here is a label on the Ontology.</rdfs:label>
  </rdf:Description>
</rdf:RDF>

You cannot reliably query an RDF graph in RDF/XML serialization by using typical XML-processing techniques.
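The difference between the two serializations can be checked with the standard Java XPath API. The following is a minimal sketch, not the asker's code: the class name SerializationDemo and its constants are hypothetical, the two documents are trimmed copies of the serializations above, and the parser is assumed to be namespace-aware so that local-name() behaves predictably.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: runs one XPath against two serializations of the same graph.
class SerializationDemo {
    // The namespace-agnostic XPath suggested in the accepted answer.
    static final String QUERY =
        "/*[local-name()='RDF']/*[local-name()='Ontology']/*[local-name()='label']/text()";

    static final String NS =
        "xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#' "
        + "xmlns:owl='http://www.w3.org/2002/07/owl#' "
        + "xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'";

    // Abbreviated form: the typed node is written as an owl:Ontology element.
    static final String ABBREVIATED =
        "<rdf:RDF " + NS + ">"
        + "<owl:Ontology rdf:about='http://www.example.com/labelledOnt'>"
        + "<rdfs:label>Here is a label on the Ontology.</rdfs:label>"
        + "</owl:Ontology></rdf:RDF>";

    // Plain form: the same triples, written as rdf:Description plus rdf:type.
    static final String PLAIN =
        "<rdf:RDF " + NS + ">"
        + "<rdf:Description rdf:about='http://www.example.com/labelledOnt'>"
        + "<rdf:type rdf:resource='http://www.w3.org/2002/07/owl#Ontology'/>"
        + "<rdfs:label>Here is a label on the Ontology.</rdfs:label>"
        + "</rdf:Description></rdf:RDF>";

    // Parses the RDF/XML text and evaluates QUERY; returns "" when nothing matches.
    static String label(String rdfXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // assumption: namespace-aware parsing
            Document doc = dbf.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(rdfXml)));
            return XPathFactory.newInstance().newXPath().evaluate(QUERY, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("abbreviated: [" + label(ABBREVIATED) + "]");
        System.out.println("plain:       [" + label(PLAIN) + "]");
    }
}
```

On the abbreviated form the query returns the label text; on the plain form it returns an empty string, because the root has no child element whose local name is Ontology, even though both documents encode the same two triples.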

Query RDF with SPARQL

Well, if we cannot reliably query RDF with XPath, what are we supposed to use? The standard query language for RDF is SPARQL. RDF is a graph-based representation, and SPARQL queries include graph patterns that can match a graph.

In this case, the pattern that we want to match in a graph consists of two triples. A triple is a 3-tuple of the form [subject,predicate,object] . Both triples have the same subject.

  • The first triple says that the subject is of type owl:Ontology . The relationship “is of type” is rdf:type , so the first triple is [?something,rdf:type,owl:Ontology] .
  • The second triple says that the subject (now known to be an ontology) has an rdfs:label , and that's the value that we're interested in. The corresponding triple is [?something,rdfs:label,?label] .
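To make the two-triple pattern concrete, here is a minimal plain-Java sketch that matches it by hand against an in-memory list of triples. It only illustrates the idea of joining two triples on a shared subject; it is not how a SPARQL engine is implemented, and the names TriplePatternSketch, Triple, sampleGraph and ontologyLabels are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: hand-written matching of a two-triple graph pattern.
class TriplePatternSketch {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String OWL_ONTOLOGY = "http://www.w3.org/2002/07/owl#Ontology";
    static final String RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label";

    // A [subject, predicate, object] 3-tuple; plain strings stand in for IRIs and literals.
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    // The graph from the example ontology, plus one unrelated triple.
    static List<Triple> sampleGraph() {
        List<Triple> graph = new ArrayList<>();
        String ont = "http://www.example.com/labelledOnt";
        graph.add(new Triple(ont, RDF_TYPE, OWL_ONTOLOGY));
        graph.add(new Triple(ont, RDFS_LABEL, "Here is a label on the Ontology."));
        graph.add(new Triple("http://www.example.com/other", RDFS_LABEL, "Not an ontology."));
        return graph;
    }

    // Matches [?x, rdf:type, owl:Ontology] and [?x, rdfs:label, ?label] with the same ?x.
    static List<String> ontologyLabels(List<Triple> graph) {
        List<String> labels = new ArrayList<>();
        for (Triple typed : graph) {
            if (!typed.p.equals(RDF_TYPE) || !typed.o.equals(OWL_ONTOLOGY)) {
                continue; // first triple of the pattern does not match
            }
            for (Triple labelled : graph) {
                if (labelled.s.equals(typed.s) && labelled.p.equals(RDFS_LABEL)) {
                    labels.add(labelled.o); // binding for ?label
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(ontologyLabels(sampleGraph()));
    }
}
```

Because the match is made on triples rather than on XML elements, the result does not depend on how the graph happened to be serialized.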

In SPARQL, after defining the necessary prefixes, we can write the following query.

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?label WHERE {
  ?ontology a owl:Ontology ;
            rdfs:label ?label .
}

(Note that because rdf:type is so common, SPARQL includes the keyword a as an abbreviation for it. The notation s p1 o1 ; p2 o2 . is just shorthand for the two-triple pattern s p1 o1 . s p2 o2 .)

You can run SPARQL queries against your model in Jena either programmatically or using the command line tools. If you do it programmatically, it is fairly easy to get the results out. To confirm that this query gets the value we're interested in, we can use Jena's arq command line tool to test it.

$ arq  --data labelledOnt.owl --query getLabel.sparql
--------------------------------------
| label                              |
======================================
| "Here is a label on the Ontology." |
--------------------------------------

The queries fail because XPath does not know the namespaces you are using. Try using:

"/*[local-name()='RDF']/*[local-name()='Ontology']/*[local-name()='label']/text()"

local-name() ignores the namespaces, so this will work (for the first matching node that it finds).

You can use namespace prefixes in the query if you implement javax.xml.namespace.NamespaceContext yourself. Please have a look at this answer, which explains how to do it: https://stackoverflow.com/a/5466030/1443529
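A minimal sketch of the NamespaceContext approach is shown below. It is not taken from the linked answer: the class name NamespaceAwareXPath, the OWL_CONTEXT constant and the inline sample document are hypothetical. It assumes the parser is made namespace-aware with setNamespaceAware(true), which DocumentBuilderFactory does not do by default.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch: a prefixed XPath query resolved through a NamespaceContext.
class NamespaceAwareXPath {
    // Maps the prefixes used in the XPath expression to namespace URIs.
    static final NamespaceContext OWL_CONTEXT = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            if (prefix == null) {
                throw new IllegalArgumentException("prefix must not be null");
            }
            switch (prefix) {
                case "rdf":  return "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
                case "owl":  return "http://www.w3.org/2002/07/owl#";
                case "rdfs": return "http://www.w3.org/2000/01/rdf-schema#";
                default:     return XMLConstants.NULL_NS_URI;
            }
        }

        // Reverse lookups are not needed for evaluating this query.
        @Override
        public String getPrefix(String namespaceURI) {
            return null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            return Collections.emptyIterator();
        }
    };

    // Trimmed copy of the first serialization from the question's ontology.
    static final String SAMPLE =
        "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#' "
        + "xmlns:owl='http://www.w3.org/2002/07/owl#' "
        + "xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'>"
        + "<owl:Ontology rdf:about='http://www.example.com/labelledOnt'>"
        + "<rdfs:label>Here is a label on the Ontology.</rdfs:label>"
        + "</owl:Ontology></rdf:RDF>";

    // Evaluates the prefixed query from the question against the given RDF/XML text.
    static String label(String rdfXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // without this, prefixed names cannot match
            Document doc = dbf.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(rdfXml)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            xPath.setNamespaceContext(OWL_CONTEXT);
            return xPath.evaluate("/rdf:RDF/owl:Ontology/rdfs:label/text()", doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(label(SAMPLE));
    }
}
```

The prefixes in the expression are resolved through the context rather than through the document, so they only need to map to the same namespace URIs that the document declares.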
