
From Data mining to RDF

I have been going through the Apache Jena tutorials, and they are pretty straightforward. My question is this: suppose I am doing data mining on text, for example extracting people's names, places, key phrases, and so on from each paragraph of a textbook. What is the easiest way to transform these into RDF using an ontology?

Assuming you already have your entities extracted from the text as strings (e.g. <person name>, <organization name>, <keyword1...2>, etc.), you can use Jena's ModelFactory to create a model, fill it with resources using model.createResource(uri), and set properties on each resource using .addProperty(), as shown in the Jena examples and documentation. Those samples also show how to print the model out as RDF by iterating through its statements and calling stmt.getSubject(), stmt.getPredicate(), and stmt.getObject().

As far as the ontology goes, you can either invent your own or, preferably, use an existing vocabulary. Suppose, for example, you decide to use the Person class from schema.org. Then you would set the rdf:type of your resource to https://schema.org/Person. Likewise, you could use properties from that vocabulary, such as https://schema.org/name, which is defined on https://schema.org/Thing and inherited by Person (all of this can be found in the schema.org docs).

You do not necessarily need the ontology itself to be in your model or database, so long as you structure your instances properly, with URIs that identify classes and properties from the vocabularies or ontologies you use. If you do have a programmatic need for it, you can load the vocabulary into your model, but in that case I think you should look at the Jena docs on the Ontology API.
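A minimal sketch of the approach described above, assuming Apache Jena 3.x or later is on the classpath. The http://example.org/ base URI, the class name EntitiesToRdf, and the two sample entities are hypothetical stand-ins for whatever your text-mining step actually produces; only the schema.org URIs come from the answer.

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.rdf.model.Statement;
import org.apache.jena.rdf.model.StmtIterator;
import org.apache.jena.vocabulary.RDF;

public class EntitiesToRdf {
    static final String SCHEMA = "https://schema.org/";
    // Hypothetical namespace for the instances; replace with your own.
    static final String BASE = "http://example.org/";

    static Model buildModel() {
        Model model = ModelFactory.createDefaultModel();
        model.setNsPrefix("schema", SCHEMA);
        model.setNsPrefix("ex", BASE);

        // Classes and properties are referenced by URI only;
        // the schema.org vocabulary itself is not loaded into the model.
        Resource personClass = model.createResource(SCHEMA + "Person");
        Resource placeClass = model.createResource(SCHEMA + "Place");
        Property name = model.createProperty(SCHEMA + "name");

        // Sample entities (assumed output of the extraction step).
        model.createResource(BASE + "person/ada-lovelace")
             .addProperty(RDF.type, personClass)
             .addProperty(name, "Ada Lovelace");
        model.createResource(BASE + "place/london")
             .addProperty(RDF.type, placeClass)
             .addProperty(name, "London");
        return model;
    }

    public static void main(String[] args) {
        Model model = buildModel();

        // Walk the statements one by one, as in the Jena tutorial.
        StmtIterator it = model.listStatements();
        while (it.hasNext()) {
            Statement stmt = it.nextStatement();
            System.out.println(stmt.getSubject() + " "
                    + stmt.getPredicate() + " "
                    + stmt.getObject());
        }

        // Or let Jena serialize the whole model, here as Turtle.
        model.write(System.out, "TURTLE");
    }
}
```

In practice you would loop over the extracted strings and derive each instance URI from them (with proper escaping), rather than hard-coding the entities as done here.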


 