
Order of expressions in a SPARQL query

Is there any difference between the two queries below?

select distinct ?i 
where{
    ?i rdf:type <http://foo/bar#A>. 
    FILTER EXISTS {
        ?i <http://foo/bar#hasB> ?b.
        ?b rdf:type <http://foo/bar#B1>.
    }            
}


select distinct ?i
where{
    FILTER EXISTS {
        ?i <http://foo/bar#hasB> ?b.
        ?b rdf:type <http://foo/bar#B1>.
    }
    ?i rdf:type <http://foo/bar#A>.
}

Are there differences regarding performance or results?

First, you do not need FILTER EXISTS. You can rewrite your query with a basic graph pattern (a set of regular triple patterns). But let's suppose you are using FILTER NOT EXISTS or something similar.
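As a rough sketch of that rewrite, the two triple patterns from the EXISTS block can be moved into the main basic graph pattern. The `foobar:` prefix is an abbreviation assumed here for `http://foo/bar#`; it does not appear in the original queries.

```sparql
PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foobar: <http://foo/bar#>   # assumed shorthand for the question's IRIs

SELECT DISTINCT ?i
WHERE {
    ?i rdf:type    foobar:A .
    ?i foobar:hasB ?b .          # formerly inside FILTER EXISTS
    ?b rdf:type    foobar:B1 .
}
```

DISTINCT still matters in this form, because one ?i can match several ?b values and would otherwise be returned once per match.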

Results

In general, order matters.

However, top-down evaluation semantics plays a role mostly in the case of OPTIONAL, and that is not your case. Thus, the results should be the same.
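To illustrate where order does matter, here is a hypothetical pair of queries (not from the question) that differ only in the position of OPTIONAL. The `foobar:` prefix is again an assumed shorthand for `http://foo/bar#`.

```sparql
PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foobar: <http://foo/bar#>   # assumed shorthand

# Query 1: every instance of A is kept; ?b is bound only where hasB exists.
SELECT ?i ?b
WHERE {
    ?i rdf:type foobar:A .
    OPTIONAL { ?i foobar:hasB ?b }
}
```

```sparql
PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foobar: <http://foo/bar#>   # assumed shorthand

# Query 2: OPTIONAL is left-joined to the empty pattern first,
# and its solutions are then joined with the type pattern.
SELECT ?i ?b
WHERE {
    OPTIONAL { ?i foobar:hasB ?b }
    ?i rdf:type foobar:A .
}
```

Assuming the data contains at least one hasB triple, an instance of A that has no hasB value shows up in the first result (with ?b unbound) but not in the second, since the second query has already bound ?i to subjects of hasB before the type pattern is joined.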

Top-down evaluation semantics can be overridden by bottom-up evaluation semantics. Fortunately, bottom-up semantics doesn't prescribe evaluating FILTER logically first, though that is possible in the case of FILTER EXISTS and FILTER NOT EXISTS.

The SPARQL algebra representation is the same for both queries:

(prefix ((rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>)
         (foobar: <http://foo/bar#>))
  (distinct
    (project (?i)
      (filter (exists
                 (bgp
                   (triple ?i foobar:hasB ?b)
                   (triple ?b rdf:type foobar:B1)
                 ))
        (bgp (triple ?i rdf:type foobar:A))))))
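The same reasoning carries over to the FILTER NOT EXISTS case mentioned earlier. A FILTER restricts the solutions of the whole group in which it appears, so the two placements sketched below should give the same results. The queries are illustrative variants of the question's queries, with `foobar:` as an assumed shorthand for `http://foo/bar#`.

```sparql
PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foobar: <http://foo/bar#>   # assumed shorthand

# Filter written after the triple pattern
SELECT DISTINCT ?i
WHERE {
    ?i rdf:type foobar:A .
    FILTER NOT EXISTS {
        ?i foobar:hasB ?b .
        ?b rdf:type foobar:B1 .
    }
}
```

```sparql
PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foobar: <http://foo/bar#>   # assumed shorthand

# Filter written before the triple pattern: same group, same scope
SELECT DISTINCT ?i
WHERE {
    FILTER NOT EXISTS {
        ?i foobar:hasB ?b .
        ?b rdf:type foobar:B1 .
    }
    ?i rdf:type foobar:A .
}
```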

Performance

Naively following top-down semantics, an engine should evaluate ?i a foobar:A first.

  • You are lucky if there is only one binding for ?i.
  • You are not so lucky if there are millions of bindings for ?i while the subpattern is much more selective.

Fortunately, optimizers try to reorder patterns according to their selectivity. However, their predictions can be wrong.

By the way, the rdf:type predicate is said to be a performance killer in Virtuoso.

Results vs Performance

Results can differ if an endpoint has a query execution time limit and returns partial results when the timeout is reached.
