
Exploratory SPARQL queries?

Whenever I start using SQL I tend to throw a couple of exploratory statements at the database to understand what is available and what form the data takes.

e.g.

show tables

describe table

select * from table

Could anyone help me understand the way to complete a similar exploration of an RDF datastore using a SPARQL endpoint?

Well, the obvious first step is to look at the classes and properties present in the data.

Here is how to see what classes are being used:

SELECT DISTINCT ?class
WHERE {
  ?s a ?class .
}
LIMIT 25
OFFSET 0

(LIMIT and OFFSET are there for paging. It is worth getting used to these, especially if you are sending your query over the Internet. I'll omit them in the other examples.)
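
If you script the paging instead of editing the query by hand, LIMIT and OFFSET can be computed from a page number. The following is a minimal Python sketch, not part of any library: the name paged_query and its parameters are made up for illustration, and it assumes the base query has no LIMIT or OFFSET of its own. Without an ORDER BY, SPARQL does not promise the same ordering from one request to the next, so the example query adds one to keep the pages consistent.

```python
def paged_query(base_query: str, page: int, page_size: int = 25) -> str:
    """Append LIMIT/OFFSET clauses for a zero-based page number.

    Hypothetical helper: assumes base_query has no LIMIT/OFFSET already.
    """
    if page < 0 or page_size < 1:
        raise ValueError("page must be >= 0 and page_size >= 1")
    offset = page * page_size
    return f"{base_query.rstrip()}\nLIMIT {page_size}\nOFFSET {offset}"


# The class-listing query from above, with ORDER BY so pages are stable.
CLASSES = """SELECT DISTINCT ?class
WHERE {
  ?s a ?class .
}
ORDER BY ?class"""

# Third page of 25 results: LIMIT 25, OFFSET 50.
print(paged_query(CLASSES, page=2))
```

Calling paged_query(CLASSES, page=0) gives the first page, page=1 the next, and so on until a page comes back with fewer than page_size rows.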

a is special SPARQL (and Notation3 / Turtle) syntax for the rdf:type predicate - this links individual instances to owl:Class / rdfs:Class types (roughly equivalent to tables in SQL RDBMSes).

Secondly, you want to look at the properties. You can do this either by using the classes you've found or by just looking for properties. Let's just get all the properties out of the store:

SELECT DISTINCT ?property
WHERE {
  ?s ?property ?o .
}

This returns every property in the store, which is probably more than you are interested in. It is equivalent to a list of all the columns in a SQL database, but without any grouping by table.

More useful is to see what properties are being used by instances that declare a particular class:

SELECT DISTINCT ?property
WHERE {
  ?s a <http://xmlns.com/foaf/0.1/Person>;
     ?property ?o .
}

This will get you back the properties used on any instances that satisfy the first triple pattern - namely, those that have an rdf:type of http://xmlns.com/foaf/0.1/Person.

Remember, because an rdfs:Resource can have multiple rdf:type properties - classes if you will - and because RDF's data model is additive, you don't have a diamond problem. The type is just another property - it's just a useful social agreement to say that some things are persons or dogs or genes or football teams. It doesn't mean that the data store is going to contain the properties usually associated with that type. The type guarantees nothing about which properties a resource actually has.

You need to familiarise yourself with the data model and with SPARQL's UNION and OPTIONAL syntax. The rough mapping of rdf:type to SQL tables is just that - rough.

You might want to know what kind of entity the property is pointing to. Firstly, you probably want to know about datatype properties - those whose values are literals, the equivalent of primitives. You know, strings, integers, etc. RDF represents each literal as a lexical form (a string) together with a datatype or language tag. We can keep just those properties whose values are literals using the SPARQL filter function isLiteral:

SELECT DISTINCT ?property
WHERE {
  ?s a <http://xmlns.com/foaf/0.1/Person>;
     ?property ?o .
  FILTER isLiteral(?o)
}

Here we only get properties that have a literal as their object - a string, date-time, boolean, or one of the other XSD datatypes.

But what about the non-literal objects? Consider this very simple pseudo-Java class definition as an analogy:

public class Person {
    int age;
    Person marriedTo;
}

Using the above query, we would get back the age property if it is bound to a literal. But marriedTo isn't a primitive (i.e. a literal in RDF terms) - it's a reference to another object - in RDF/OWL terminology, that's an object property. But we don't know what sort of objects are being referred to by those properties (predicates). This query will get you back properties with the accompanying types (the classes that the ?o values are members of).

SELECT DISTINCT ?property ?class
WHERE {
  ?s a <http://xmlns.com/foaf/0.1/Person>;
     ?property ?o .
  ?o a ?class .
  FILTER(!isLiteral(?o))
}

That should be enough to orient yourself in a particular dataset. Of course, I'd also recommend that you just pull out some individual resources and inspect them. You can do that using the DESCRIBE query:

DESCRIBE <http://example.org/resource>

There are some SPARQL tools - SNORQL, for instance - that let you do this in a browser. The SNORQL instance I've linked to has a sample query for exploring the possible named graphs, which I haven't covered here.
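
If you would rather run these queries from a script than from a browser tool, the sketch below shows one way to do it with only the Python standard library. It assumes the endpoint follows the SPARQL Protocol (query passed as the query parameter of a GET request) and can return the standard SPARQL JSON results format. The function names and any endpoint URL you pass in are placeholders of mine, not something from a particular library or from the tools mentioned above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL Protocol GET request asking for JSON results.

    Assumes `endpoint` has no query string of its own.
    """
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows(results: dict) -> list[dict]:
    """Flatten SPARQL JSON results into dicts of variable name -> value.

    Variables left unbound (for example by OPTIONAL) become None.
    """
    variables = results["head"]["vars"]
    return [
        {v: binding.get(v, {}).get("value") for v in variables}
        for binding in results["results"]["bindings"]
    ]


def run_select(endpoint: str, query: str) -> list[dict]:
    """Send a SELECT query and return its rows (needs network access)."""
    with urlopen(build_request(endpoint, query)) as response:
        return rows(json.load(response))
```

For example, run_select(endpoint, "SELECT DISTINCT ?class WHERE { ?s a ?class } LIMIT 25") would return a list like [{"class": "http://xmlns.com/foaf/0.1/Person"}, ...] for whatever endpoint URL you supply. A DESCRIBE query returns an RDF graph rather than a result table, so it needs a different Accept header and parser.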

If you are unfamiliar with SPARQL, honestly, the best resource if you get stuck is the specification. It's a W3C spec but a pretty good one (they built a decent test suite so you can actually see whether implementations have done it properly or not), and if you can get over the complicated language, it is pretty helpful.

I find the following set of exploratory queries useful:

Seeing the classes:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select distinct ?type ?label
where {
    ?s a ?type .
    OPTIONAL { ?type rdfs:label ?label }
}

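Seeing the declared object properties:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select distinct ?objprop ?label
where {
    ?objprop a owl:ObjectProperty .
    OPTIONAL { ?objprop rdfs:label ?label }
}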

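Seeing the declared datatype properties:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select distinct ?dataprop ?label
where {
    ?dataprop a owl:DatatypeProperty .
    OPTIONAL { ?dataprop rdfs:label ?label }
}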

Seeing which properties are actually used:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select distinct ?p ?label
where {
    ?s ?p ?o .
    OPTIONAL { ?p rdfs:label ?label }
}

Seeing what entities are asserted:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select distinct ?entity ?elabel ?type ?tlabel
where {
    ?entity a ?type .
    OPTIONAL { ?entity rdfs:label ?elabel } .
    OPTIONAL { ?type rdfs:label ?tlabel }
}

Seeing the distinct graphs in use:

select distinct ?g where { 
    graph ?g { 
        ?s ?p ?o 
    } 
}

SELECT DISTINCT * WHERE {
  ?s ?p ?o
}
LIMIT 10

I often refer to this list of queries from the voiD project. They are mainly, though not exclusively, statistical in nature. It shouldn't be hard to remove the COUNTs from some statements to get the actual values.

Especially with large datasets, it is important to distinguish the pattern from the noise and to understand which structures are used a lot and which are rare. Instead of SELECT DISTINCT, I use aggregation queries to count the major classes, predicates etc. For example, here's how to see the most important predicates in your dataset:

SELECT ?pred (COUNT(*) as ?triples)
WHERE {
    ?s ?pred ?o .
}
GROUP BY ?pred
ORDER BY DESC(?triples)
LIMIT 100

I usually start by listing the graphs in a repository and their sizes, then look at classes (again with counts) in the graph(s) of interest, then the predicates of the class(es) I am interested in, etc.
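
The first two steps of that workflow (graph sizes, then class counts within one graph) could be written along the following lines. This is only a sketch under my own assumptions: the queries follow the same COUNT / GROUP BY pattern as the predicate query above, the Python wrapper and the names GRAPH_SIZES and class_counts are illustrative, and the graph IRI is a made-up example. The IRI is pasted into the query unescaped, so only use it with IRIs you trust.

```python
# Size of every named graph, largest first.
GRAPH_SIZES = """SELECT ?g (COUNT(*) AS ?triples)
WHERE {
    GRAPH ?g { ?s ?p ?o }
}
GROUP BY ?g
ORDER BY DESC(?triples)"""


def class_counts(graph_iri: str) -> str:
    """Return a query counting rdf:type assertions per class in one graph.

    Hypothetical helper; graph_iri is inserted as-is, without escaping.
    """
    return f"""SELECT ?class (COUNT(?s) AS ?instances)
WHERE {{
    GRAPH <{graph_iri}> {{ ?s a ?class }}
}}
GROUP BY ?class
ORDER BY DESC(?instances)"""


print(GRAPH_SIZES)
print(class_counts("http://example.org/graph/people"))  # example IRI only
```

Both queries can take a long time on a large store, so adding a LIMIT while exploring is a reasonable precaution.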

Of course these selectors can be combined and restricted if appropriate. To see what predicates are used on instances of type foaf:Person, and break this down by graph, you could use this:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?g ?pred (COUNT(*) as ?triples)
WHERE {
    GRAPH ?g {
       ?s a foaf:Person .
       ?s ?pred ?o .
    }
}
GROUP BY ?g ?pred
ORDER BY ?g DESC(?triples)

This will list each graph with the predicates in it, in descending order of frequency.

Now (about 10 years later :) ) there is a very nice tool that lists the most common classes, properties and individuals and lets you browse and query with just a click: http://www.irisa.fr/LIS/ferre/sparklis/osparklis.html
