
How can I make my SPARQL query with REGEX faster?

I have built a SPARQL query for DBpedia that uses a regex, and it is very slow:

PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl:<http://dbpedia.org/ontology/>

select ?label where {
    ?s rdfs:label ?label.
    ?s dbpedia-owl:thumbnail ?photo.
    ?s dbpedia-owl:abstract ?abstract.
    FILTER langMatches( lang(?label), "FR" ).
    FILTER langMatches( lang(?abstract), "FR" ).
    FILTER regex(?label, "^Jules V", "i").

}
LIMIT 10

You can try it on the public endpoint http://fr.dbpedia.org/sparql and see that it takes several seconds to answer.

Is there a way to get better performance, even if the results are less precise?

Thanks, Samuel

Any query using REGEX will almost certainly be slow unless the rest of the query restricts it to a small enough portion of the dataset. Evaluating a REGEX filter basically requires the store to do a linear scan over the candidate results and check each one against the regular expression.

If your regular expression is as simple as the one in your case, you should try one of two things:

Solution 1 - Use a lighter-weight string function

In your case you're looking for strings that start with a certain substring, so it will almost certainly be more efficient to use the STRSTARTS function instead, since it doesn't require full regex evaluation. This assumes your SPARQL engine supports SPARQL 1.1, which introduced STRSTARTS (SPARQL 1.1 is now a W3C Recommendation rather than a draft). Note that STRSTARTS is case-sensitive, unlike your regex with the "i" flag.
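Below is a minimal sketch of the question's query rewritten with STRSTARTS. It assumes the endpoint supports the SPARQL 1.1 string functions. To approximate the case-insensitive "i" flag, the label is lower-cased with LCASE before the prefix test. STR reduces the language-tagged literal to its plain string value. This variant has not been timed here. Each check is cheaper than a regex match, but the engine may still have to test every candidate label.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT ?label WHERE {
    ?s rdfs:label ?label .
    ?s dbpedia-owl:thumbnail ?photo .
    ?s dbpedia-owl:abstract ?abstract .
    FILTER langMatches(lang(?label), "FR")
    FILTER langMatches(lang(?abstract), "FR")
    # Approximates regex(?label, "^Jules V", "i") with a cheaper prefix test
    FILTER STRSTARTS(LCASE(STR(?label)), "jules v")
}
LIMIT 10
```

If case does not matter for your data, dropping LCASE and STR and comparing against "Jules V" directly is a simpler (case-sensitive) option.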

Solution 2 - Use Full Text Search

Many stores include full-text search extensions that can be used in place of REGEX. These often perform significantly better, because the store looks the terms up in a full-text index instead of doing a linear scan over the potential results.

DBpedia runs on the Virtuoso store, which supports the following syntax:

?label bif:contains "Jules"

Note that the Virtuoso full-text syntax is somewhat limited, so you can't use "Jules V" as is: each search term must be at least 4 characters long (possibly 3). You can, however, combine bif:contains with a further FILTER to narrow the results down to the ones you want, like so:

PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl:<http://dbpedia.org/ontology/>

select ?label where {
    ?s rdfs:label ?label.
    ?s dbpedia-owl:thumbnail ?photo.
    ?s dbpedia-owl:abstract ?abstract.
    FILTER langMatches( lang(?label), "FR" ).
    FILTER langMatches( lang(?abstract), "FR" ).
    # Virtuoso full-text lookup; the bif: prefix is built into Virtuoso
    ?label bif:contains "Jules" .
    FILTER (CONTAINS(?label, "V"))

}
LIMIT 10

This query runs almost instantaneously.
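The `FILTER (CONTAINS(?label, "V"))` check above is loose. It keeps any label that contains a capital V anywhere, not only labels starting with "Jules V". The sketch below is a stricter variant. It still uses the Virtuoso-specific `bif:contains` to hit the full-text index. It then applies the case-insensitive prefix test from the original regex to the candidates that lookup returns. This variant has not been timed here. Because the extra filter should only need to run on labels already matched by the index, it is expected to stay fast, but that depends on the query plan Virtuoso chooses.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT ?label WHERE {
    ?s rdfs:label ?label .
    # Full-text index lookup (Virtuoso only; bif: needs no PREFIX there)
    ?label bif:contains "Jules" .
    ?s dbpedia-owl:thumbnail ?photo .
    ?s dbpedia-owl:abstract ?abstract .
    FILTER langMatches(lang(?label), "FR")
    FILTER langMatches(lang(?abstract), "FR")
    # Keep only labels starting with "Jules V", ignoring case
    FILTER STRSTARTS(LCASE(STR(?label)), "jules v")
}
LIMIT 10
```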
