
Order of expressions in a SPARQL query

Is there any difference between the two queries below?

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

select distinct ?i
where {
    ?i rdf:type <http://foo/bar#A>.
    FILTER EXISTS {
        ?i <http://foo/bar#hasB> ?b.
        ?b rdf:type <http://foo/bar#B1>.
    }
}


PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

select distinct ?i
where {
    FILTER EXISTS {
        ?i <http://foo/bar#hasB> ?b.
        ?b rdf:type <http://foo/bar#B1>.
    }
    ?i rdf:type <http://foo/bar#A>.
}

Are there differences regarding performance or results?

First, you do not need FILTER EXISTS. You can rewrite your query as a basic graph pattern (a set of regular triple patterns). But let's suppose you are using FILTER NOT EXISTS or something similar.
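As a sketch of that rewrite (the foobar: prefix is only shorthand for http://foo/bar#), the FILTER EXISTS body can be moved into the main pattern. Because the query projects only ?i and uses DISTINCT, the extra ?b bindings should not change the result:

```sparql
PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foobar: <http://foo/bar#>

# Plain basic graph pattern: no FILTER EXISTS needed.
SELECT DISTINCT ?i
WHERE {
    ?i rdf:type    foobar:A .
    ?i foobar:hasB ?b .
    ?b rdf:type    foobar:B1 .
}
```

The negated form has no such basic-graph-pattern equivalent, which is why FILTER NOT EXISTS is the more interesting case.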

Results

In general, order matters.

However, top-down evaluation semantics plays a role mostly in the case of OPTIONAL, and that is not your case. Thus, the results should be the same.
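For contrast, here is a minimal sketch of a case where order can change the results. It is not taken from the question: the OPTIONAL and the projected ?b are illustrative assumptions, and the actual difference depends on the data.

```sparql
PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foobar: <http://foo/bar#>

# Variant 1: every instance of foobar:A is returned,
# with ?b bound only where a foobar:hasB triple exists.
SELECT ?i ?b
WHERE {
    ?i rdf:type foobar:A .
    OPTIONAL { ?i foobar:hasB ?b }
}
```

```sparql
PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foobar: <http://foo/bar#>

# Variant 2: the OPTIONAL is left-joined with the empty pattern first,
# so if the data contains any foobar:hasB triples, instances of
# foobar:A without one are dropped by the subsequent join.
SELECT ?i ?b
WHERE {
    OPTIONAL { ?i foobar:hasB ?b }
    ?i rdf:type foobar:A .
}
```

A FILTER, by comparison, applies to the whole group it appears in, which is why moving it within the group does not have this effect.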

Top-down evaluation semantics can be overridden by bottom-up evaluation semantics. Fortunately, bottom-up semantics does not require a FILTER to be evaluated first, although that is possible for FILTER EXISTS and FILTER NOT EXISTS.

The SPARQL algebra representation is the same for both queries:

(prefix ((rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>)
         (foobar: <http://foo/bar#>))
  (distinct
    (project (?i)
      (filter (exists
                 (bgp
                   (triple ?i foobar:hasB ?b)
                   (triple ?b rdf:type foobar:B1)
                 ))
        (bgp (triple ?i rdf:type foobar:A))))))

Performance

Naively following top-down semantics, an engine would evaluate ?i a foobar:A first.

  • You are lucky if there is only one binding for ?i.
  • You are less lucky if there are millions of bindings for ?i while the subpattern is much more selective.

Fortunately, optimizers try to reorder patterns depending on their selectivity. However, predictions can be erroneous.

By the way, the rdf:type predicate is said to be a performance killer in Virtuoso.

Results vs Performance

Results can differ if an endpoint enforces a query execution time limit and returns partial results when the timeout is reached.
