How to query data from different unknown structured rdf graphs?

Question

Over the past few years, the amount of Linked Data has grown fast. Many different graphs are published using RDF. Every graph has its own prefixes and vocabulary structures.

So, how is it possible to query specific entities and related data using these graphs?

Is it necessary to study the individual structure of each graph and implement it in the system's logic?

Or is there any good approach to query data with SPARQL without knowing the structure?

Answer 1

No, not really. You can't just blindly query a database; you have to know something about what's in it to come up with a sensible query that grabs the slice of data you're interested in.

But lacking any knowledge of a dataset, you can fire off some very general queries to start creating the building blocks of navigation, such as:

select distinct ?p where { ?s ?p ?o }

That will return every predicate used in the database. To get, roughly, all the classes:

select distinct ?t where { ?s a ?t }

Or you can combine these to get all the predicates used by each class:

select distinct ?p ?t where { ?s a ?t . ?s ?p ?o }

By issuing these sorts of queries, you can begin to get a feel for what's in the database. But they only approximate (i.e., guess at) the underlying schema of the data, so you're better off reviewing the RDF Schema or OWL ontology associated with the data, presuming there is one. Further, because these queries are so general, they can be expensive to run, depending on the optimizations the database provides. Consider that before firing them off at any random endpoint.
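As a rough sketch of a somewhat gentler exploration query, the class listing can be turned into a ranked summary. This assumes the endpoint supports SPARQL 1.1 aggregates, which not every endpoint does; the limit of 50 is an arbitrary choice.

```sparql
# Count how many subjects are typed with each class, most common first.
# LIMIT only caps the size of the result; the endpoint may still have to
# scan every rdf:type triple to compute the counts.
SELECT ?t (COUNT(?s) AS ?instances)
WHERE { ?s a ?t }
GROUP BY ?t
ORDER BY DESC(?instances)
LIMIT 50
```

The counts show which classes dominate the dataset, which is usually a better starting point than an unordered list of every class.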

Some datasets in the LOD cloud might provide a VoID description, which outlines some of what you'd get from the above queries or from a cursory skim of a schema, and would be enough to get you going.
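If the VoID description happens to be loaded into the endpoint itself (an assumption; it is often published as a separate document instead), a query along these lines might surface it. The properties used are from the VoID vocabulary, and both are optional because publishers fill in VoID descriptions to varying degrees.

```sparql
PREFIX void: <http://rdfs.org/ns/void#>

# List described datasets with the vocabularies they declare
# and their stated triple counts, where available.
SELECT ?dataset ?vocabulary ?triples
WHERE {
  ?dataset a void:Dataset .
  OPTIONAL { ?dataset void:vocabulary ?vocabulary }
  OPTIONAL { ?dataset void:triples ?triples }
}
```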

Generally, you don't want to just start traversing the graph. You're better off learning about the structure of the graph and coming up with precise queries that grab the subsets of the data you are most interested in for your application. One good thing about the LOD cloud is that a lot of the datasets overlap, to some degree, in the vocabularies they use. So armed with knowledge of common vocabularies, such as FOAF or Dublin Core, you can get some mileage out of exploring. If you then combine this with the more specific vocabularies used by the parts of the cloud you care about, you can begin to formulate queries that will work for your application.
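For instance, a query that relies only on FOAF could be tried against any endpoint without knowing its schema. This is a sketch that assumes the dataset describes people with foaf:Person and foaf:name; if it does not, the query simply returns nothing.

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# Fetch a small sample of people and their names using FOAF terms only.
SELECT ?person ?name
WHERE {
  ?person a foaf:Person ;
          foaf:name ?name .
}
LIMIT 20
```

An empty result is still informative: it tells you the dataset does not use that part of FOAF, and you can move on to another common vocabulary.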

To answer your initial question, if it is not clear by now: yes, you can query for a specific entity within the graph. All you need to know is its URI. In fact, once you know that:

describe <uri_of_the_interesting_entity>

That will get you the relevant subset of the graph for that entity. What comes back from a describe query depends on the algorithm the database uses, but generally it will at least include all the triples the entity is the subject of.
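If you would rather not depend on the database's describe algorithm, an explicit query can spell out what you want back. The sketch below collects both outgoing and incoming triples; `<http://example.org/entity>` is a placeholder URI, not a real resource.

```sparql
# Build a graph of every triple in which the entity appears
# as subject (outgoing) or as object (incoming).
CONSTRUCT {
  <http://example.org/entity> ?p ?o .
  ?s ?q <http://example.org/entity> .
}
WHERE {
  { <http://example.org/entity> ?p ?o }
  UNION
  { ?s ?q <http://example.org/entity> }
}
```

Each solution binds only one side of the union, so only the matching template triple is emitted for it.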

You might take some time to review the SPARQL spec if you are not already familiar with it. Good luck.

solution1
5 ACCPTED 2012-11-09 14:06:59
