Order of expressions in a SPARQL query
Is there any difference between the two queries below?
select distinct ?i
where {
  ?i rdf:type <http://foo/bar#A> .
  FILTER EXISTS {
    ?i <http://foo/bar#hasB> ?b .
    ?b rdf:type <http://foo/bar#B1> .
  }
}

select distinct ?i
where {
  FILTER EXISTS {
    ?i <http://foo/bar#hasB> ?b .
    ?b rdf:type <http://foo/bar#B1> .
  }
  ?i rdf:type <http://foo/bar#A> .
}
Are there differences regarding performance or results?
First, you do not need FILTER EXISTS. You can rewrite your query with a basic graph pattern (a set of regular triple patterns). But let's suppose you are using FILTER NOT EXISTS or something similar.
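As a minimal sketch of that rewrite, the FILTER EXISTS query from the question can be expressed as a single basic graph pattern. The `rdf:` prefix declaration is an assumption (the question omits it). Because only `?i` is projected and `DISTINCT` is applied, the extra `?b` bindings introduced by the join do not change the result.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT DISTINCT ?i
WHERE {
  # Same three conditions as in the question, now as plain triple patterns
  ?i rdf:type <http://foo/bar#A> .
  ?i <http://foo/bar#hasB> ?b .
  ?b rdf:type <http://foo/bar#B1> .
}
```

This rewrite only works for the positive case. With FILTER NOT EXISTS there is no equivalent basic graph pattern, which is why the rest of the answer keeps the filter.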
In general, order matters.
However, top-down evaluation semantics plays a role mostly in the case of OPTIONAL, and that is not your case. Thus, the results should be the same.
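For contrast, here is a sketch of the OPTIONAL case, where moving a clause can change the results. It reuses the IRIs from the question; the `rdf:` prefix declaration is again an assumption. In the first query, OPTIONAL extends the solutions of the preceding triple pattern, so every instance of A is returned, with `?b` bound only where a hasB value exists.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?i ?b
WHERE {
  ?i rdf:type <http://foo/bar#A> .
  OPTIONAL { ?i <http://foo/bar#hasB> ?b }
}
```

In the second query, nothing precedes OPTIONAL, so it is left-joined against the empty group pattern. If the data contains any hasB triples, that step yields all (`?i`, `?b`) pairs, and the following triple pattern then acts as a plain join: instances of A without a hasB value drop out.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?i ?b
WHERE {
  OPTIONAL { ?i <http://foo/bar#hasB> ?b }
  ?i rdf:type <http://foo/bar#A> .
}
```

A FILTER, by comparison, applies to its whole group regardless of where it is written inside the braces.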
Top-down evaluation semantics can be overridden by bottom-up evaluation semantics. Fortunately, bottom-up semantics does not prescribe evaluating FILTER logically first, though this is possible in the case of FILTER EXISTS and FILTER NOT EXISTS.
The SPARQL Algebra representation is the same for both queries:
(prefix ((rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>)
(foobar: <http://foo/bar#>))
(distinct
(project (?i)
(filter (exists
(bgp
(triple ?i foobar:hasB ?b)
(triple ?b rdf:type foobar:B1)
))
(bgp (triple ?i rdf:type foobar:A))))))
Naively following top-down semantics, an engine should evaluate ?i a foobar:A first. If this pattern returns only a few bindings for ?i, you are lucky. If it returns millions of bindings for ?i, whereas the subpattern is much more selective, you are less lucky. Fortunately, optimizers try to reorder patterns depending on their selectivity. However, their predictions can be erroneous.
By the way, the rdf:type predicate is said to be a performance killer in Virtuoso.
Results can differ if an endpoint has a query execution time limit and flushes partial results when the timeout is reached: an example.