Custom functions in SPARQL with the Jena API

This is my first post here. I was hoping someone could help me write custom SPARQL functions for use with the Jena (ARQ) API. I need SPARQL to do some aggregation. I know that it already implements avg, count, min, max, and sum, but I also need standard deviation and median (and range, though that can be derived from min and max).

I was hoping the query could be similar to what you use for the already implemented functions:

PREFIX example: <http://www.examples.com/functions#>  
PREFIX core: <http://www.core.com/values#>  
SELECT (stddev(?price) as ?stddev)  
WHERE {  
    ?s core:hasPrice ?price  
}  

I don't know if that is possible or not, but if I need to use it like other custom functions that would be fine too, as long as it still gets the standard deviation of the results.

All I know is that the functions would be written in Java, which I already know pretty well. So, I was wondering if anyone knew of a good way to go about this or where to start looking for some guidance. I've tried looking for documentation on it, but there doesn't seem to be anything. Any help would be greatly appreciated.

Thanks in advance.

I am not sure you can do what you want without actually changing the grammar.

SUM(...), for example, is a keyword defined by the SPARQL grammar.

A filter function or a property function is probably not what you are looking for.

By the way, you do not get STDDEV in SQL either. Is it because two passes over the data are necessary?

Aggregate functions are a special case among SPARQL (and hence ARQ) functions. I think that in ARQ it is not easy to extend the set of aggregate functions, whereas extending the set of filter functions or property functions is easy (and documented). You could nevertheless calculate the standard deviation with something like this:

PREFIX afn: <http://jena.hpl.hp.com/ARQ/function#>
PREFIX core: <http://www.core.com/values#>  
SELECT ( afn:sqrt( sum( (?price - ?avg) * (?price - ?avg) ) / (?count - 1) ) as ?stddev )  
WHERE {
  ?s core:hasPrice ?price .
  {  
    SELECT (avg(?price) as ?avg) (count(*) as ?count) 
    WHERE {  
      ?s core:hasPrice ?price
    }
  }
}
GROUP BY ?count

I am forced to use afn:sqrt, however, which is an ARQ "proprietary" function that is not in the SPARQL 1.1 draft, so this query would not work on frameworks other than Jena.
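To sanity-check the arithmetic the query performs, the same two-pass computation can be sketched in plain Java: first the average and the count (the inner SELECT), then the sum of squared deviations divided by count - 1 (the outer SELECT). The class name, method name, and sample prices are illustrative assumptions and are not part of the query or of Jena.

```java
// Minimal sketch of the two-pass sample standard deviation used by the query.
// TwoPassStdDev and the sample prices are hypothetical, for illustration only.
final class TwoPassStdDev {

    static double sampleStdDev(double[] prices) {
        if (prices.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        // First pass: corresponds to avg(?price) and count(*) in the subquery.
        double sum = 0.0;
        for (double p : prices) {
            sum += p;
        }
        double count = prices.length;
        double avg = sum / count;

        // Second pass: corresponds to sum((?price - ?avg) * (?price - ?avg)).
        double squaredDeviations = 0.0;
        for (double p : prices) {
            squaredDeviations += (p - avg) * (p - avg);
        }
        // Dividing by (count - 1) gives the sample standard deviation.
        return Math.sqrt(squaredDeviations / (count - 1));
    }

    public static void main(String[] args) {
        double[] prices = {13, 15, 20, 30, 32, 11, 16};
        System.out.println(sampleStdDev(prices));
    }
}
```

For these seven sample prices the result is roughly 8.30, which is the value the query should produce for the same data.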

Yes, ARQ is extensible in a variety of ways. The ARQ extensions page would be the best place to start.

ARQ allows you to add your own aggregate functions by registering them in the AggregateRegistry. The example code shows how this is done, and the same approach can be used to add the custom standard deviation aggregate requested in the question. The example below uses Commons Math to do the calculation.

import org.apache.commons.math3.stat.descriptive.SummaryStatistics;
import org.apache.jena.graph.Graph;
import org.apache.jena.query.*;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.sparql.engine.binding.Binding;
import org.apache.jena.sparql.expr.ExprList;
import org.apache.jena.sparql.expr.NodeValue;
import org.apache.jena.sparql.expr.aggregate.Accumulator;
import org.apache.jena.sparql.expr.aggregate.AccumulatorFactory;
import org.apache.jena.sparql.expr.aggregate.AggCustom;
import org.apache.jena.sparql.expr.aggregate.AggregateRegistry;
import org.apache.jena.sparql.function.FunctionEnv;
import org.apache.jena.sparql.graph.NodeConst;
import org.apache.jena.sparql.sse.SSE;

public class StandardDeviationAggregate {
    /**
     * Custom aggregates use accumulators. One accumulator is created for each group in a query execution.
     */
    public static AccumulatorFactory factory = (agg, distinct) -> new StatsAccumulator(agg);

    private static class StatsAccumulator implements Accumulator {
        private AggCustom agg;
        private SummaryStatistics summaryStatistics = new SummaryStatistics();

        StatsAccumulator(AggCustom agg) { this.agg = agg; }

        @Override
        public void accumulate(Binding binding, FunctionEnv functionEnv) {
            // Add values to summaryStatistics
            final ExprList exprList = agg.getExprList();
            final NodeValue value = exprList.get(0).eval(binding, functionEnv) ;
            summaryStatistics.addValue(value.getDouble());
        }

        @Override
        public NodeValue getValue() {
            // Get the standard deviation
            return NodeValue.makeNodeDouble(summaryStatistics.getStandardDeviation());
        }
    }

    public static void main(String[] args) {
        // Register the aggregate function
        AggregateRegistry.register("http://example/stddev", factory, NodeConst.nodeMinusOne);

        // Add data
        Graph g = SSE.parseGraph("(graph " +
                "(:item1 :hasPrice 13) " +
                "(:item2 :hasPrice 15) " +
                "(:item3 :hasPrice 20) " +
                "(:item4 :hasPrice 30) " +
                "(:item5 :hasPrice 32) " +
                "(:item6 :hasPrice 11) " +
                "(:item7 :hasPrice 16))");

        Model m = ModelFactory.createModelForGraph(g);
        String qs = "PREFIX : <http://example/> " +
                    "SELECT (:stddev(?price) AS ?stddev) " +
                    "WHERE { ?item :hasPrice ?price }";

        // Execute query and print results; try-with-resources closes the execution
        Query q = QueryFactory.create(qs);
        try (QueryExecution qexec = QueryExecutionFactory.create(q, m)) {
            ResultSet rs = qexec.execSelect();
            ResultSetFormatter.out(rs);
        }
    }
}
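If the Commons Math dependency is not wanted, the state an accumulator has to keep can be sketched in plain Java. The class below is a hypothetical helper, not part of Jena or Commons Math: it updates a running mean and sum of squared deviations one value at a time (Welford's method, so a single pass is enough for the standard deviation) and also keeps the raw values, because a median, which the question also asks for, cannot be computed from running totals alone. Returning NaN for too few values is an assumption about edge-case handling.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical plain-Java accumulator state; an Accumulator's accumulate()
// could call add(), and getValue() could call sampleStdDev() or median().
final class RunningStats {
    private long n = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations from the mean
    private final List<Double> values = new ArrayList<>(); // only needed for the median

    void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean); // Welford update: uses the old and the new mean
        values.add(x);
    }

    // Sample standard deviation (divides by n - 1); NaN if fewer than two values.
    double sampleStdDev() {
        return n < 2 ? Double.NaN : Math.sqrt(m2 / (n - 1));
    }

    // Median of all values seen so far; NaN if no values have been added.
    double median() {
        if (values.isEmpty()) {
            return Double.NaN;
        }
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int mid = sorted.size() / 2;
        return sorted.size() % 2 == 1
                ? sorted.get(mid)
                : (sorted.get(mid - 1) + sorted.get(mid)) / 2.0;
    }

    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        for (double price : new double[] {13, 15, 20, 30, 32, 11, 16}) {
            stats.add(price);
        }
        System.out.println("stddev = " + stats.sampleStdDev());
        System.out.println("median = " + stats.median());
    }
}
```

A median aggregate could be registered under its own URI in the same way as the standard deviation aggregate, with one such object per group; the trade-off is that it has to hold every value of the group in memory.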

I hope this example helps someone at least, even though the question was posted a few years back.
