How to group this kind of data in SPARQL

Question

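Because I was worried that my situation might be hard to understand, I made this visual illustration for you.

I know that a user (which user doesn't matter here) likes item i1.

We want to suggest other items:

i1 is similar to i2 according to a specific criterion (so there is a similarity value; let's call it s1).

i1 is also similar to the same i2, but according to another criterion (so there is a similarity value; let's call it s2).

i1 is also similar to the same i2, but according to a third criterion (so there is a similarity value; let's call it s3).

Now, i2 belongs to two categories, and each category affects the similarity through a specific weight.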

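My problem

I want to compute the final similarity between i1 and i2. I have done almost all of it, except for the category-specific weights.

My problem is that this weight must not be applied once per criterion that led to i2 being selected. In other words, if i2 was selected 1000 times through 1000 criteria and i2 belongs to a certain category, that category's weight should be applied only once, not 1000 times. Likewise, if i2 belongs to two categories, each of the two weights should be applied only once, regardless of how many criteria led to i2 being selected.

Now

To make it easier for you to help me, I wrote this query. It works, but it has to be long to show you the situation. I could also make your job simpler by having my query select only the information you need, so that you only have to add another layer of selection on top of it.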

prefix : <http://example.org/rs#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>


select  ?item ?similarityValue ?finalWeight where {
  values ?i1 {:i1}
  ?i1 ?similaryTo ?item .
  ?similaryTo :hasValue ?similarityValue .
  optional{
    ?item :hasContextValue ?weight .
  }
  bind (if(bound(?weight), ?weight, 1) as ?finalWeight)
}

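So in the result of this query, item i2 appears six times, as expected: there are three different similarity values (as expected, because of the three different criteria), and the finalWeight (i.e., the weight) is repeated for each criterion.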
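That row multiplication can be reproduced with a small plain-Python model of the query over the question's data. This is a sketch, not SPARQL: the tuples and dictionaries are hypothetical stand-ins for the triples, decimal.Decimal is used because Turtle numerals such as 0.5 are xsd:decimal, and a missing :hasContextValue falls back to 1, as the bind(if(bound(...))) line does.

```python
from decimal import Decimal

# Hypothetical in-memory copy of the question's triples.
# Each entry is (subject, similarity property, similar item).
SIMILAR = [
    (":i1", ":similaryTo1", ":i2"),
    (":i1", ":similaryTo2", ":i2"),
    (":i1", ":similaryTo3", ":i2"),
    (":i1", ":similaryTo4", ":i3"),
]
HAS_VALUE = {
    ":similaryTo1": Decimal("0.5"),
    ":similaryTo2": Decimal("0.6"),
    ":similaryTo3": Decimal("0.7"),
    ":similaryTo4": Decimal("0.5"),
}
CONTEXT_VALUES = {":i2": [Decimal("0.1"), Decimal("0.4")]}


def joined_rows(source):
    """Return (item, similarityValue, finalWeight) rows like the query:
    each similarity row is paired with every context value of the item,
    or with 1 when the OPTIONAL part does not match."""
    rows = []
    for subject, prop, item in SIMILAR:
        if subject != source:
            continue
        for weight in CONTEXT_VALUES.get(item, [Decimal(1)]):
            rows.append((item, HAS_VALUE[prop], weight))
    return rows


i2_rows = [row for row in joined_rows(":i1") if row[0] == ":i2"]
print(len(i2_rows))                   # 6 = 3 similarities x 2 weights
print(sum(w for _, _, w in i2_rows))  # 1.5 = 3 x (0.1 + 0.4)
```

Summing finalWeight over these rows counts each weight once per criterion, which is exactly the over-counting the question wants to avoid.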

Finally

Here is the data:

@prefix : <http://example.org/rs#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

:i1 :similaryTo1 :i2 .
:similaryTo1 :hasValue 0.5 .
:i1 :similaryTo2 :i2 .
:similaryTo2 :hasValue 0.6 .
:i1 :similaryTo3 :i2 .
:similaryTo3 :hasValue 0.7 .
:i2 :hasContextValue 0.1 .
:i2 :hasContextValue 0.4 .
:i1 :similaryTo4 :i3 .
:similaryTo4 :hasValue 0.5 .

I hope you can help me; I would really appreciate it.

What I want to do

Imagine there were no weights at all. Then my query would be:

prefix : <http://example.org/rs#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select  ?item ?similarityValue  where {
  values ?i1 {:i1}
  ?i1 ?similaryTo ?item .
  ?similaryTo :hasValue ?similarityValue .

}

The result would be:

Then I aggregate by summing the similarities for each item, like this:

prefix : <http://example.org/rs#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select  ?item (SUM(?similarityValue) as ?sumSimilarities)  where {
  values ?i1 {:i1}
  ?i1 ?similaryTo ?item .
  ?similaryTo :hasValue ?similarityValue .
}
group by ?item

The result is:
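The grouped result can be reproduced with a small plain-Python model of the GROUP BY query. It is a sketch over hypothetical tuples that mirror the question's triples, using decimal.Decimal to match the xsd:decimal literals in the data.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical in-memory copy of the question's triples:
# (subject, similarity property, similar item) and each property's value.
SIMILAR = [
    (":i1", ":similaryTo1", ":i2"),
    (":i1", ":similaryTo2", ":i2"),
    (":i1", ":similaryTo3", ":i2"),
    (":i1", ":similaryTo4", ":i3"),
]
HAS_VALUE = {
    ":similaryTo1": Decimal("0.5"),
    ":similaryTo2": Decimal("0.6"),
    ":similaryTo3": Decimal("0.7"),
    ":similaryTo4": Decimal("0.5"),
}


def sum_similarities(source):
    """Model of SELECT ?item (SUM(?similarityValue) ...) GROUP BY ?item."""
    totals = defaultdict(Decimal)  # Decimal() is Decimal("0")
    for subject, prop, item in SIMILAR:
        if subject == source:
            totals[item] += HAS_VALUE[prop]
    return dict(totals)


print(sum_similarities(":i1"))
# {':i2': Decimal('1.8'), ':i3': Decimal('0.5')}
```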

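What I want is to multiply each row of this result by the sum of the two weights associated with ?item, i.e. (0.1 + 0.4) for i2 and 1 for i3.

Note that some items have two weights, some have only one, and some have none. Also note that even for items that have two weights, the two values may be identical, so be careful if you use DISTINCT here.

Finally, I used two weights only as an example; in real life, this number comes from a dynamic system.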
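One way to read this requirement is to aggregate the similarities and the weights in two separate passes per item and only then multiply, so the number of criteria never changes how often a weight is applied. The plain-Python sketch below assumes the weights are combined by summing (as "the sum of the two weights" says) and that an item without weights uses 1; the data structures are hypothetical stand-ins for the triples.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical in-memory copy of the question's triples.
SIMILAR = [
    (":i1", ":similaryTo1", ":i2"),
    (":i1", ":similaryTo2", ":i2"),
    (":i1", ":similaryTo3", ":i2"),
    (":i1", ":similaryTo4", ":i3"),
]
HAS_VALUE = {
    ":similaryTo1": Decimal("0.5"),
    ":similaryTo2": Decimal("0.6"),
    ":similaryTo3": Decimal("0.7"),
    ":similaryTo4": Decimal("0.5"),
}
# Weights are kept in a list, not a set, so two equal weights on the same
# item are both counted (the question warns against DISTINCT here).
CONTEXT_VALUES = {":i2": [Decimal("0.1"), Decimal("0.4")]}


def final_scores(source):
    # Pass 1: sum of similarity values per item (one row per criterion).
    similarity_sum = defaultdict(Decimal)
    for subject, prop, item in SIMILAR:
        if subject == source:
            similarity_sum[item] += HAS_VALUE[prop]

    # Pass 2: sum of weights per item, computed once per item however
    # many criteria selected it; items without weights default to 1.
    scores = {}
    for item, total in similarity_sum.items():
        weights = CONTEXT_VALUES.get(item)
        weight_sum = sum(weights) if weights else Decimal(1)
        scores[item] = total * weight_sum
    return scores


print(final_scores(":i1"))
# {':i2': Decimal('0.90'), ':i3': Decimal('0.5')}
```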

Update: after @Joshua Taylor's answer, here is how I understand his sample data:

Answer 1

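Some data

First, it helps to have some data. The item :a has a number of similarity links, each of which specifies an item and a reason. :a may be similar to an item for several different reasons, and there may even be duplicate similarities with the same item and reason. I think this matches your use case so far. (Sample data in the question would have made this clearer, but I think this is close to what you have.) Each item also has context values, and each reason has an optional weight.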

@prefix : <urn:ex:> .

:a :similarTo [ :item :b ; :reason :p ] ,
              [ :item :b ; :reason :p ] , # a duplicate
              [ :item :b ; :reason :q ] ,
              [ :item :b ; :reason :r ] ,
              [ :item :c ; :reason :p ] ,
              [ :item :c ; :reason :q ] ,
              [ :item :d ; :reason :r ] ,
              [ :item :d ; :reason :s ] .

:b :context 0.01 .
:b :context 0.02 .
:c :context 0.04 .
:d :context 0.05 .
:e :context 0.06 . # not used

:p :weight 0.1 .
:q :weight 0.3 .
:r :weight 0.5 .
# no weight for :s
:t :weight 0.9 . # not used

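It sounds like what you want is, for each similar item, the sum of its context values together with the sum of its reason weights, where each distinct reason is counted only once. If that is the right understanding, then I think you need something like the following.

Weights for the reasons

The first step is to get, for each similar item, the sum of the weights of its distinct reasons.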

prefix : <urn:ex:>

select * where {
  values ?i { :a }

  #-- get the sum of weights of distinct reasons
  #-- for each item that is similar to ?i.
  { select ?item (sum(?weight) as ?propertyWeight) {
      #-- get the distinct properties for each ?item
      #-- along with their weights.
      { select distinct ?item ?property ?weight {
          ?i :similarTo [ :item ?item ; :reason ?property ] .
          optional { ?property :weight ?weight_ }
          bind(if(bound(?weight_), ?weight_, 0.0) as ?weight)
        } }
    }
    group by ?item
  }
}

------------------------------
| i  | item | propertyWeight |
==============================
| :a | :b   | 0.9            |
| :a | :c   | 0.4            |
| :a | :d   | 0.5            |
------------------------------
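The innermost SELECT DISTINCT plus the grouped sum can be modeled in plain Python to check the numbers in this table. This is a sketch using hypothetical (item, reason) tuples for the blank-node similarities of :a; a Python set plays the role of DISTINCT, and a reason without a weight contributes 0.0, as the bind does.

```python
from collections import defaultdict
from decimal import Decimal

# (item, reason) pairs from ":a :similarTo [ :item ... ; :reason ... ]".
# The duplicate (:b, :p) is kept on purpose.
SIMILAR_TO = [
    (":b", ":p"), (":b", ":p"), (":b", ":q"), (":b", ":r"),
    (":c", ":p"), (":c", ":q"),
    (":d", ":r"), (":d", ":s"),
]
REASON_WEIGHT = {
    ":p": Decimal("0.1"), ":q": Decimal("0.3"),
    ":r": Decimal("0.5"), ":t": Decimal("0.9"),  # :s has no weight
}


def property_weights(pairs):
    """Sum the weights of the distinct reasons of each item."""
    totals = defaultdict(Decimal)
    for item, reason in set(pairs):  # set() acts like SELECT DISTINCT
        totals[item] += REASON_WEIGHT.get(reason, Decimal("0.0"))
    return dict(totals)


for item, weight in sorted(property_weights(SIMILAR_TO).items()):
    print(item, weight)
# :b 0.9
# :c 0.4
# :d 0.5
```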

Getting the item weights

Now you still need the sum of the context values for each item. So we extend the query:

prefix : <urn:ex:>

select * where {
  values ?i { :a }

  #-- get the sum of weights of distinct reasons
  #-- for each item that is similar to ?i.
  { select ?item (sum(?weight) as ?propertyWeight) {
      #-- get the distinct properties for each ?item
      #-- along with their weights.
      { select distinct ?item ?property ?weight {
          ?i :similarTo [ :item ?item ; :reason ?property ] .
          optional { ?property :weight ?weight_ }
          bind(if(bound(?weight_), ?weight_, 0.0) as ?weight)
        } }
    }
    group by ?item
  }

  #-- get the sum of the context values
  #-- for each item.
  { select ?item (sum(?context_) as ?context) {
      ?item :context ?context_ .
    }
    group by ?item
  }
}

----------------------------------------
| i  | item | propertyWeight | context |
========================================
| :a | :b   | 0.9            | 0.03    |
| :a | :c   | 0.4            | 0.04    |
| :a | :d   | 0.5            | 0.05    |
----------------------------------------

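Note that it is fine that the pattern ?item :context ?context_ in the second subquery does not check that ?item is one of the similar items. Because the results of the two subqueries are joined, we only get results for the ?item values that the first subquery also returns.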
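The effect of that join can be seen in a small plain-Python model: the context sums are computed for every item, including the unused :e, and the join on ?item keeps only the items the first subquery returned. The dictionaries below are hypothetical stand-ins, with the property weights copied from the first table.

```python
from collections import defaultdict
from decimal import Decimal

# Result of the first subquery (item -> propertyWeight), from the table.
PROPERTY_WEIGHT = {":b": Decimal("0.9"), ":c": Decimal("0.4"), ":d": Decimal("0.5")}

# All ":item :context value" triples, including the unused :e.
CONTEXT = [
    (":b", Decimal("0.01")), (":b", Decimal("0.02")),
    (":c", Decimal("0.04")), (":d", Decimal("0.05")),
    (":e", Decimal("0.06")),
]


def context_sums(triples):
    """Second subquery: sum of context values grouped by item."""
    totals = defaultdict(Decimal)
    for item, value in triples:
        totals[item] += value
    return dict(totals)


def join_on_item(left, right):
    """Inner join of two item-keyed results, as SPARQL joins the subqueries."""
    return {item: (left[item], right[item]) for item in left if item in right}


joined = join_on_item(PROPERTY_WEIGHT, context_sums(CONTEXT))
print(sorted(joined))  # [':b', ':c', ':d'] (:e is dropped by the join)
```

The same inner join would also drop a similar item that has no :context value at all, since the second subquery returns no row for it; if that case matters (the question mentions items without weights), it would need extra handling, for example an OPTIONAL around that subquery plus a default value.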

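Putting it together

Now you can add, multiply, or do whatever else you want to combine the sum of the reason weights with the sum of the context values. For example, if you want to add them:

prefix : <urn:ex:>

select ?i ?item ((?propertyWeight + ?context) as ?similarity) where {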
  values ?i { :a }

  #-- get the sum of weights of distinct reasons
  #-- for each item that is similar to ?i.
  { select ?item (sum(?weight) as ?propertyWeight) {
      #-- get the distinct properties for each ?item
      #-- along with their weights.
      { select distinct ?item ?property ?weight {
          ?i :similarTo [ :item ?item ; :reason ?property ] .
          optional { ?property :weight ?weight_ }
          bind(if(bound(?weight_), ?weight_, 0.0) as ?weight)
        } }
    }
    group by ?item
  }

  #-- get the sum of the context values
  #-- for each item.
  { select ?item (sum(?context_) as ?context) {
      ?item :context ?context_ .
    }
    group by ?item
  }
}

--------------------------
| i  | item | similarity |
==========================
| :a | :b   | 0.93       |
| :a | :c   | 0.44       |
| :a | :d   | 0.55       |
--------------------------
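The whole query corresponds to a short plain-Python pipeline: distinct reasons per item, a separate per-item context sum, an inner join on the item, and then the addition. It is a sketch with hypothetical data structures mirroring the Turtle sample, mainly useful for checking the similarity column; replacing the final + with * would model the multiplication variant instead.

```python
from collections import defaultdict
from decimal import Decimal

SIMILAR_TO = [
    (":b", ":p"), (":b", ":p"), (":b", ":q"), (":b", ":r"),
    (":c", ":p"), (":c", ":q"), (":d", ":r"), (":d", ":s"),
]
REASON_WEIGHT = {":p": Decimal("0.1"), ":q": Decimal("0.3"),
                 ":r": Decimal("0.5"), ":t": Decimal("0.9")}
CONTEXT = [(":b", Decimal("0.01")), (":b", Decimal("0.02")),
           (":c", Decimal("0.04")), (":d", Decimal("0.05")),
           (":e", Decimal("0.06"))]


def similarity(pairs, reason_weight, context):
    # First subquery: weights of distinct reasons, 0.0 when missing.
    property_weight = defaultdict(Decimal)
    for item, reason in set(pairs):
        property_weight[item] += reason_weight.get(reason, Decimal("0.0"))

    # Second subquery: every context value of every item.
    context_sum = defaultdict(Decimal)
    for item, value in context:
        context_sum[item] += value

    # Join on ?item, then combine the two sums.
    return {item: weight + context_sum[item]
            for item, weight in property_weight.items()
            if item in context_sum}


for item, value in sorted(similarity(SIMILAR_TO, REASON_WEIGHT, CONTEXT).items()):
    print(item, value)
# :b 0.93
# :c 0.44
# :d 0.55
```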

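Final cleanup

Looking at the last query, two things bother me a little. The first is that the inner subquery retrieves the reason weight for every solution, whereas we only need to retrieve it once per property of each item. That is, we can move the optional part out of the innermost subquery and into the enclosing one. The second is that we have a bind that sets a variable which is only used in the aggregation. We can replace it by summing coalesce(?weight, 0.0), which uses ?weight if it is bound and 0.0 otherwise. After making these changes, we end up with:

prefix : <urn:ex:>

select ?i ?item ((?propertyWeight + ?context) as ?similarity) where {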
  values ?i { :a }

  #-- get the sum of weights of distinct properties
  #-- using 0.0 as the weight for a property that doesn't
  #-- actually specify a weight.
  { select ?item (sum(coalesce(?weight,0.0)) as ?propertyWeight) {

      #-- get the distinct properties for each ?item.
      { select distinct ?item ?property {
          ?i :similarTo [ :item ?item ; :reason ?property ] .
        } }

      #-- then get each property's optional weight.
      optional { ?property :weight ?weight }
    }
    group by ?item
  }

  #-- get the sum of the context values
  #-- for each item.
  { select ?item (sum(?context_) as ?context) {
      ?item :context ?context_ .
    }
    group by ?item
  }
}

I don't think it's a huge change, but it makes things a bit tidier and easier to understand.
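The cleaned-up shape (distinct (item, property) pairs first, then an optional weight lookup, then COALESCE inside the sum) can be mirrored in plain Python. In this sketch each property maps to a hypothetical list of weights, None stands for an unbound ?weight, and the coalesce helper only models the "unbound" case, not SPARQL's handling of expression errors.

```python
from collections import defaultdict
from decimal import Decimal

SIMILAR_TO = [
    (":b", ":p"), (":b", ":p"), (":b", ":q"), (":b", ":r"),
    (":c", ":p"), (":c", ":q"), (":d", ":r"), (":d", ":s"),
]
# property -> list of weights; :s has none.
WEIGHTS = {":p": [Decimal("0.1")], ":q": [Decimal("0.3")],
           ":r": [Decimal("0.5")], ":t": [Decimal("0.9")]}


def coalesce(*values):
    """Return the first value that is not None (None models 'unbound')."""
    return next((v for v in values if v is not None), None)


def property_weights(pairs, weights):
    totals = defaultdict(Decimal)
    for item, prop in set(pairs):                  # SELECT DISTINCT ?item ?property
        for weight in weights.get(prop, [None]):   # OPTIONAL { ?property :weight ?weight }
            totals[item] += coalesce(weight, Decimal("0.0"))
    return dict(totals)


print(sorted(property_weights(SIMILAR_TO, WEIGHTS).items()))
# [(':b', Decimal('0.9')), (':c', Decimal('0.4')), (':d', Decimal('0.5'))]
```

With this shape, a property that has two weights contributes both of them, because the OPTIONAL yields one row per weight.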

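At this point it has almost become my mantra, but these kinds of questions are much easier to answer when sample data is provided. In this case, most of the actual mechanics of how you get these values in the first place don't matter; what matters is how you aggregate them afterwards. That's why we could use very simple data, like the data I recreated at the beginning of this answer.

I think the biggest takeaway is that one of the important techniques in SPARQL (and, I expect, in other query languages too) is to have separate subqueries and combine their results. In this case we ended up with two subqueries because we really needed to group in a few different ways. This might have been simpler if SPARQL provided a "distinct by" operator, so that we could write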

sum(distinct by(?property) ?weight)

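But there is a problem with that: if a distinct property could have several weights, which one would you pick? So the solution really does seem to be several subqueries, so that we can group in several different ways. That's why I was asking about the actual formula you want to compute.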
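To make that ambiguity concrete, here is a plain-Python sketch of what the hypothetical sum(distinct by(?property) ?weight) might do. SPARQL has no such operator, and sum_distinct_by is an illustrative name: it keeps one value per key and refuses to guess when a key has conflicting values.

```python
from decimal import Decimal


def sum_distinct_by(rows, key, value):
    """Hypothetical 'sum(distinct by(key) value)': sum one value per key.

    Raises ValueError if the same key appears with different values,
    because there is no principled way to choose between them."""
    chosen = {}
    for row in rows:
        k, v = key(row), value(row)
        if k in chosen and chosen[k] != v:
            raise ValueError(f"ambiguous weights for {k}: {chosen[k]} vs {v}")
        chosen[k] = v
    return sum(chosen.values(), Decimal(0))


rows = [(":p", Decimal("0.1")), (":p", Decimal("0.1")), (":q", Decimal("0.3"))]
print(sum_distinct_by(rows, key=lambda r: r[0], value=lambda r: r[1]))  # 0.4
```

Raising an error is only one possible policy; picking the first, the largest, or the average weight would each be a different formula, which is why the intended formula has to be stated explicitly.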

Solution 1 (1 vote, accepted, 2016-04-06 12:34:40)

一些数据

权重的原因

获取物品的重量

放在一起

最终清理

评论

Sparql如何对这类数据进行分组

问题描述

我的问题

现在

最后

所以我想做什么

1 个解决方案

解决方案1 1 已采纳 2016-04-06 12:34:40

一些数据

权重的原因

获取物品的重量

放在一起

最终清理

评论

解决方案1
1 已采纳 2016-04-06 12:34:40