Use the Jena API to create a large number of individuals based on an OWL ontology (created in Protege)

I have a large XML document (100 GB) and want to parse it, extract information, and store that information in an RDF triple store.

I have found out how to parse a large XML file in Java, and I know how to read and write RDF files with the Jena RDF API.

  1. How do I create instances of the classes that I defined in an OWL ontology created with Protege?
  2. Is it possible to load this OWL ontology with Jena, create instances of its classes as triples, and store them in an RDF file?

The main problem is the large number of instances (triples) that will be created.

Sample XML file:

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>
         <name>Gaella, Matt</name>
         <initial>MG</initial>
      </author>
      <title>User Guide</title>
      <price>45.95</price>
      <publish_date>2010-10-01</publish_date>
   </book>
   <book id="bk102">
      <author>
         <name>Rall, Kimiou</name>
         <initial>KR</initial>
      </author>
      <title>Midnight Scene</title>
      <price>5.75</price>
      <publish_date>2011-12-02</publish_date>
   </book>
   <book id="bk103">
      <author>
         <name>Colin, Evian</name>
         <initial>EC</initial>
      </author>
      <title>Cool Ascendant</title>
      <price>5.50</price>
      <publish_date>2012-11-03</publish_date>
   </book>
   <book id="bk104">
      <author>
         <name>Cortes, Smith</name>
         <initial>SC</initial>
      </author>
      <title>Farmer Legacy</title>
      <price>10.50</price>
      <publish_date>2013-03-04</publish_date>
   </book>
    . . .
</catalog>

OWL-DL ontology:

<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:swrlb="http://www.w3.org/2003/11/swrlb#"
    xmlns="http://www.owl-ontologies.com/OntologyBooks.owl#"
    xmlns:xsp="http://www.owl-ontologies.com/2005/08/07/xsp.owl#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:protege="http://protege.stanford.edu/plugins/owl/protege#"
    xmlns:swrl="http://www.w3.org/2003/11/swrl#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
  xml:base="http://www.owl-ontologies.com/OntologyBooks.owl">
  <owl:Ontology rdf:about=""/>
  <owl:Class rdf:ID="Book">
    <owl:disjointWith>
      <owl:Class rdf:ID="Author"/>
    </owl:disjointWith>
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:allValuesFrom>
          <owl:Class rdf:about="#Author"/>
        </owl:allValuesFrom>
        <owl:onProperty>
          <owl:ObjectProperty rdf:ID="hasAuthor"/>
        </owl:onProperty>
      </owl:Restriction>
    </rdfs:subClassOf>
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty>
          <owl:ObjectProperty rdf:about="#hasAuthor"/>
        </owl:onProperty>
        <owl:someValuesFrom>
          <owl:Class rdf:about="#Author"/>
        </owl:someValuesFrom>
      </owl:Restriction>
    </rdfs:subClassOf>
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:cardinality rdf:datatype="http://www.w3.org/2001/XMLSchema#int"
        >1</owl:cardinality>
        <owl:onProperty>
          <owl:DatatypeProperty rdf:ID="price"/>
        </owl:onProperty>
      </owl:Restriction>
    </rdfs:subClassOf>
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:cardinality rdf:datatype="http://www.w3.org/2001/XMLSchema#int"
        >1</owl:cardinality>
        <owl:onProperty>
          <owl:DatatypeProperty rdf:ID="publishDate"/>
        </owl:onProperty>
      </owl:Restriction>
    </rdfs:subClassOf>
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty>
          <owl:DatatypeProperty rdf:ID="title"/>
        </owl:onProperty>
        <owl:cardinality rdf:datatype="http://www.w3.org/2001/XMLSchema#int"
        >1</owl:cardinality>
      </owl:Restriction>
    </rdfs:subClassOf>
    <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
  </owl:Class>
  <owl:Class rdf:about="#Author">
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:cardinality rdf:datatype="http://www.w3.org/2001/XMLSchema#int"
        >1</owl:cardinality>
        <owl:onProperty>
          <owl:DatatypeProperty rdf:ID="initial"/>
        </owl:onProperty>
      </owl:Restriction>
    </rdfs:subClassOf>
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:cardinality rdf:datatype="http://www.w3.org/2001/XMLSchema#int"
        >1</owl:cardinality>
        <owl:onProperty>
          <owl:DatatypeProperty rdf:ID="name"/>
        </owl:onProperty>
      </owl:Restriction>
    </rdfs:subClassOf>
    <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
    <owl:disjointWith rdf:resource="#Book"/>
  </owl:Class>
  <owl:ObjectProperty rdf:ID="isAuthorOf">
    <rdfs:domain rdf:resource="#Author"/>
    <rdfs:range rdf:resource="#Book"/>
    <owl:inverseOf>
      <owl:ObjectProperty rdf:about="#hasAuthor"/>
    </owl:inverseOf>
  </owl:ObjectProperty>
  <owl:ObjectProperty rdf:about="#hasAuthor">
    <owl:inverseOf rdf:resource="#isAuthorOf"/>
    <rdfs:domain rdf:resource="#Book"/>
    <rdfs:range rdf:resource="#Author"/>
  </owl:ObjectProperty>
  <owl:DatatypeProperty rdf:about="#publishDate">
    <rdfs:domain rdf:resource="#Book"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#date"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:about="#price">
    <rdfs:domain rdf:resource="#Book"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#float"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:about="#initial">
    <rdfs:domain rdf:resource="#Author"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:about="#name">
    <rdfs:domain rdf:resource="#Author"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:about="#title">
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
    <rdfs:domain rdf:resource="#Book"/>
  </owl:DatatypeProperty>
</rdf:RDF>

Have you considered the disk-based models in Jena? I'm referring to TDB and Fuseki.

The TDB documentation says:

"If you wish to share a TDB dataset between multiple applications please use our Fuseki component which provides a SPARQL server that can use TDB for persistent storage and provides the SPARQL protocols for query, update and REST update over HTTP."

TDB supports very large datasets, and you can access the stored data through a Jena model. After loading the ontology into it, you can explore the classes and add individuals through that model.

Fuseki also supports SPARQL queries and SPARQL updates, which means you could add the individuals that way as well.
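As a rough sketch of the SPARQL route: the class below batches triples into an `INSERT DATA` request and posts it with the JDK's own HTTP client (Java 11+), so it does not use Jena at all. The class name `FusekiBatchInsert` and the endpoint URL in the usage note are made up for illustration; the actual update URL depends on how your Fuseki dataset is configured.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

// Hypothetical helper, not part of Jena or Fuseki.
class FusekiBatchInsert {

    // Wraps triples (each written as "<s> <p> <o> .") in one SPARQL INSERT DATA block.
    static String insertData(List<String> triples) {
        StringBuilder sb = new StringBuilder("INSERT DATA {\n");
        for (String t : triples) {
            sb.append("  ").append(t).append('\n');
        }
        return sb.append("}").toString();
    }

    // Sends the update using the SPARQL protocol (POST, application/sparql-update)
    // and returns the HTTP status code.
    static int post(String endpoint, String update)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "application/sparql-update")
                .POST(HttpRequest.BodyPublishers.ofString(update))
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.discarding())
                .statusCode();
    }
}
```

You would call something like `FusekiBatchInsert.post("http://localhost:3030/books/update", FusekiBatchInsert.insertData(batch))` once every few thousand triples. For 100 GB of input, one request per triple would probably be far too slow, so the batch size is worth tuning.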

There is also support for exporting the stored models back to RDF files, which would give you the output you are after.

Creating instances of classes defined in Protege is easy. You will find the classes declared in the RDF file, most likely in triples like

classIRI rdf:type owl:Class

You can then create instances with

instanceIRI rdf:type classIRI
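Since the number of triples is the main concern, here is a minimal sketch of that pattern applied to the sample catalog. It streams the XML with StAX and writes N-Triples line by line, so nothing is held in memory. It uses only the JDK, not Jena. The class name `CatalogToNTriples` and the way IRIs are minted (book id as local name, `author_` plus book id for the author) are my own assumptions, and the sketch does not check that ids are valid IRI characters or merge authors who wrote several books.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical converter: streams <book> elements and writes one N-Triples line per statement.
class CatalogToNTriples {
    // Namespace taken from the ontology's xml:base.
    static final String NS = "http://www.owl-ontologies.com/OntologyBooks.owl#";
    static final String RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>";
    static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    static String iri(String value) {
        return "<" + value + ">";
    }

    // Builds a typed literal, escaping the characters N-Triples does not allow raw.
    static String literal(String value, String xsdType) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"")
                .replace("\n", "\\n").replace("\r", "\\r");
        return "\"" + escaped + "\"^^" + iri(XSD + xsdType);
    }

    static void emit(Writer out, String s, String p, String o) throws IOException {
        out.write(s + " " + p + " " + o + " .\n");
    }

    // Returns the number of triples written.
    static long convert(Reader xml, Writer out) throws XMLStreamException, IOException {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader r = factory.createXMLStreamReader(xml);
        String book = null;
        String author = null;
        long count = 0;
        while (r.hasNext()) {
            if (r.next() != XMLStreamConstants.START_ELEMENT) {
                continue;
            }
            switch (r.getLocalName()) {
                case "book":
                    String id = r.getAttributeValue(null, "id");
                    book = iri(NS + id);
                    author = iri(NS + "author_" + id); // assumed IRI scheme
                    emit(out, book, RDF_TYPE, iri(NS + "Book"));
                    count++;
                    break;
                case "author":
                    emit(out, author, RDF_TYPE, iri(NS + "Author"));
                    emit(out, book, iri(NS + "hasAuthor"), author);
                    count += 2;
                    break;
                case "name":
                    emit(out, author, iri(NS + "name"), literal(r.getElementText(), "string"));
                    count++;
                    break;
                case "initial":
                    emit(out, author, iri(NS + "initial"), literal(r.getElementText(), "string"));
                    count++;
                    break;
                case "title":
                    emit(out, book, iri(NS + "title"), literal(r.getElementText(), "string"));
                    count++;
                    break;
                case "price":
                    emit(out, book, iri(NS + "price"), literal(r.getElementText(), "float"));
                    count++;
                    break;
                case "publish_date":
                    emit(out, book, iri(NS + "publishDate"), literal(r.getElementText(), "date"));
                    count++;
                    break;
                default:
                    break; // <catalog> and anything unexpected is skipped
            }
        }
        r.close();
        return count;
    }
}
```

For the real file you would pass a `BufferedReader` over the XML and a `BufferedWriter` over the output file. The resulting N-Triples file can then be bulk-loaded into TDB (the `tdbloader` command-line tool is meant for this) together with the ontology file.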
