
Custom functions in SPARQL with the Jena API

This is my first post here. I was hoping someone could help me write custom SPARQL functions for use with the Jena (ARQ) API. I need SPARQL to do some aggregation. I know it already implements avg, count, min, max, and sum, but I also need standard deviation and median (and range, though that can be computed from min and max alone).

I was hoping the query could look like the ones that use the built-in aggregates:

PREFIX example: <http://www.examples.com/functions#>  
PREFIX core: <http://www.core.com/values#>  
SELECT (stddev(?price) as ?stddev)  
WHERE {  
    ?s core:hasPrice ?price  
}  

I don't know whether that is possible. If it has to be called like other custom functions, that would be fine too, as long as it still returns the standard deviation of the results.

All I know is that the functions would be written in Java, which I already know pretty well. Does anyone know a good way to go about this, or where to start looking for guidance? I have looked for documentation on it, but there doesn't seem to be any. Any help would be greatly appreciated.

Thanks in advance.

I am not sure you can do what you want without actually changing the grammar.

SUM(...), for example, is a keyword defined by the SPARQL grammar.

A filter function or a property function is probably not what you are looking for.

By the way, you do not get STDDEV in SQL either. Is that because two passes over the data are necessary?
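On the two-pass question: a standard deviation does not strictly need two passes over the data. A minimal sketch, assuming plain Java with no Jena dependency, of Welford's single-pass update follows. The class name RunningStdDev and its methods are hypothetical, chosen only for illustration.

```java
// Sketch only: RunningStdDev is a hypothetical helper, not part of Jena or SPARQL.
final class RunningStdDev {
    private long count;   // number of values seen so far
    private double mean;  // running mean
    private double m2;    // running sum of squared deviations from the mean

    /** Folds one value into the running statistics (Welford's update). */
    void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    /** Sample standard deviation (n - 1 denominator); NaN for fewer than two values. */
    double sampleStdDev() {
        return count < 2 ? Double.NaN : Math.sqrt(m2 / (count - 1));
    }

    public static void main(String[] args) {
        RunningStdDev stats = new RunningStdDev();
        for (double price : new double[] {13, 15, 20, 30, 32, 11, 16}) {
            stats.add(price);
        }
        System.out.println(stats.sampleStdDev());
    }
}
```

Each value is seen once and only three numbers are kept, so this shape would suit an aggregate that is fed one row at a time.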

Aggregate functions are a special case of SPARQL (and hence of ARQ) functions. I think that in ARQ it is not easy to extend the set of aggregate functions, while it is easy (and documented) to extend the set of filter functions and the set of property functions. You could still calculate the standard deviation with something like this:

PREFIX afn: <http://jena.hpl.hp.com/ARQ/function#>
PREFIX core: <http://www.core.com/values#>  
SELECT ( afn:sqrt( sum( (?price - ?avg) * (?price - ?avg) ) / (?count - 1) ) as ?stddev )  
WHERE {
  ?s core:hasPrice ?price .
  {  
    SELECT (avg(?price) as ?avg) (count(*) as ?count) 
    WHERE {  
      ?s core:hasPrice ?price
    }
  }
}
# ?count is used outside an aggregate, so it has to be a group key
GROUP BY ?count

I am forced to use afn:sqrt anyway, which is an ARQ "proprietary" function that is not in the SPARQL 1.1 draft, so this query would not work on frameworks other than Jena.
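To make the arithmetic of that query explicit, here is a rough plain-Java equivalent. It is a sketch under the assumption that the prices are already available as a double array. The class name TwoPassStdDev is hypothetical, and the sample prices in main are the ones used in the Jena example elsewhere on this page.

```java
// Sketch only: mirrors the query's arithmetic in plain Java (hypothetical class name).
final class TwoPassStdDev {
    /** Sample standard deviation computed the same way as the query above. */
    static double sampleStdDev(double[] prices) {
        int count = prices.length;
        if (count < 2) {
            return Double.NaN; // (count - 1) would be zero or negative
        }
        // Pass 1: what the inner SELECT computes, avg(?price) over count(*) rows.
        double sum = 0;
        for (double price : prices) {
            sum += price;
        }
        double avg = sum / count;
        // Pass 2: what the outer SELECT computes, the sum of squared deviations.
        double squares = 0;
        for (double price : prices) {
            squares += (price - avg) * (price - avg);
        }
        return Math.sqrt(squares / (count - 1));
    }

    public static void main(String[] args) {
        double[] prices = {13, 15, 20, 30, 32, 11, 16};
        System.out.println(sampleStdDev(prices));
    }
}
```

The (count - 1) denominator makes this the sample standard deviation, matching the query.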

Yes, ARQ is extensible in a variety of ways. The ARQ extensions page would be the best place to start.

ARQ allows you to add your own aggregate functions by registering them in the AggregateRegistry. The example code shows how this is done. This can be used to add the custom standard deviation aggregate function requested in the question. In the example below, Commons Math is used to do the calculation.

import org.apache.commons.math3.stat.descriptive.SummaryStatistics;
import org.apache.jena.graph.Graph;
import org.apache.jena.query.*;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.sparql.engine.binding.Binding;
import org.apache.jena.sparql.expr.ExprList;
import org.apache.jena.sparql.expr.NodeValue;
import org.apache.jena.sparql.expr.aggregate.Accumulator;
import org.apache.jena.sparql.expr.aggregate.AccumulatorFactory;
import org.apache.jena.sparql.expr.aggregate.AggCustom;
import org.apache.jena.sparql.expr.aggregate.AggregateRegistry;
import org.apache.jena.sparql.function.FunctionEnv;
import org.apache.jena.sparql.graph.NodeConst;
import org.apache.jena.sparql.sse.SSE;

public class StandardDeviationAggregate {
    /**
     * Custom aggregates use accumulators. One accumulator is created for each group in a query execution.
     */
    public static AccumulatorFactory factory = (agg, distinct) -> new StatsAccumulator(agg);

    private static class StatsAccumulator implements Accumulator {
        private AggCustom agg;
        private SummaryStatistics summaryStatistics = new SummaryStatistics();

        StatsAccumulator(AggCustom agg) { this.agg = agg; }

        @Override
        public void accumulate(Binding binding, FunctionEnv functionEnv) {
            // Add values to summaryStatistics
            final ExprList exprList = agg.getExprList();
            final NodeValue value = exprList.get(0).eval(binding, functionEnv) ;
            summaryStatistics.addValue(value.getDouble());
        }

        @Override
        public NodeValue getValue() {
            // Get the standard deviation
            return NodeValue.makeNodeDouble(summaryStatistics.getStandardDeviation());
        }
    }

    public static void main(String[] args) {
        // Register the aggregate function
        AggregateRegistry.register("http://example/stddev", factory, NodeConst.nodeMinusOne);

        // Add data
        Graph g = SSE.parseGraph("(graph " +
                "(:item1 :hasPrice 13) " +
                "(:item2 :hasPrice 15) " +
                "(:item3 :hasPrice 20) " +
                "(:item4 :hasPrice 30) " +
                "(:item5 :hasPrice 32) " +
                "(:item6 :hasPrice 11) " +
                "(:item7 :hasPrice 16))");

        Model m = ModelFactory.createModelForGraph(g);
        String qs = "PREFIX : <http://example/> " +
                    "SELECT (:stddev(?price) AS ?stddev) " +
                    "WHERE { ?item :hasPrice ?price }";

        // Execute the query and print the results; try-with-resources closes the execution
        Query q = QueryFactory.create(qs);
        try (QueryExecution qexec = QueryExecutionFactory.create(q, m)) {
            ResultSet rs = qexec.execSelect();
            ResultSetFormatter.out(rs);
        }
    }
}
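
The question also asked for a median. Assuming the same accumulator pattern as above, accumulate() could hand each value to a small collector and getValue() could wrap its result with NodeValue.makeNodeDouble. The value-collecting part might look like the sketch below. MedianCollector is a hypothetical name, the Jena wiring is deliberately left out, and every value of a group is held in memory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch only: MedianCollector is a hypothetical helper, not a Jena or Commons Math class.
final class MedianCollector {
    private final List<Double> values = new ArrayList<>();

    /** Remembers one value; a median needs all values of the group. */
    void add(double x) {
        values.add(x);
    }

    /** Median of the collected values; NaN if nothing was added. */
    double median() {
        if (values.isEmpty()) {
            return Double.NaN;
        }
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int mid = sorted.size() / 2;
        if (sorted.size() % 2 == 1) {
            return sorted.get(mid); // odd count: the middle element
        }
        return (sorted.get(mid - 1) + sorted.get(mid)) / 2.0; // even count: mean of the two middle elements
    }

    public static void main(String[] args) {
        MedianCollector collector = new MedianCollector();
        for (double price : new double[] {13, 15, 20, 30, 32, 11, 16}) {
            collector.add(price);
        }
        System.out.println(collector.median());
    }
}
```

Since Commons Math is already on the classpath in the example above, its DescriptiveStatistics class with getPercentile(50) may be an alternative to a hand-written collector.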

I hope the AggregateRegistry example helps someone at least, even though the question was posted a few years back.
