How to group this kind of data in SPARQL

Question

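Because I was worried that my situation would be hard to follow, I made a visual illustration of it.

I know that a user (which user does not matter here) likes item i1.

We want to suggest other items:

According to a particular criterion, i1 is similar to i2 (so there is a similarity value; call it s1).

i1 is also similar to the same i2, but according to another criterion (so there is a similarity value; call it s2).

i1 is also similar to the same i2, but according to a third criterion (so there is a similarity value; call it s3).

Now i2 belongs to two classes, and each class affects the similarity through a specific weight.

My problem

I want to compute the final similarity between i1 and i2. I have almost all of it done, except for the weights of the specific classes.

My problem is that this weight should not be applied once per criterion that led to i2 being selected. In other words, if i2 was selected 1000 times through 1000 criteria and i2 belongs to a specific class, the weight of that class should be applied only once, not 1000 times. Likewise, if i2 belongs to two classes, each of the two weights should be applied only once, regardless of the number of criteria that led to i2 being selected.

Now

To make it easier for you to help me, I wrote this query (it is long, but it has to be in order to show the situation). I could also simplify your work by making my query select only the required information, so that you only need to add another layer of selection on top of it.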

prefix : <http://example.org/rs#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>


select  ?item ?similarityValue ?finalWeight where {
  values ?i1 {:i1}
  ?i1 ?similaryTo ?item .
  ?similaryTo :hasValue ?similarityValue .
  optional{
    ?item :hasContextValue ?weight .
  }
  bind (if(bound(?weight), ?weight, 1) as ?finalWeight)
}

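So, in the result of this query (look at item i2), the item is repeated 6 times (as expected), with three different similarities (as expected, because of the three different criteria), and finalWeight (that is, the weight) is repeated for each criterion: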
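The six rows for i2 arise because the pattern match pairs every similarity with every weight. The following is a plain-Python sketch of that pairing (it is not SPARQL, the variable names are illustrative, and the values are copied from the sample data in this question); it only shows why each weight appears once per criterion.

```python
from itertools import product

# Values copied from the sample data for :i2 (the names here are illustrative).
similarities = [("similaryTo1", "0.5"), ("similaryTo2", "0.6"), ("similaryTo3", "0.7")]
weights = ["0.1", "0.4"]

# A graph pattern followed by an OPTIONAL that matches twice behaves like a
# cross product: every similarity row is paired with every weight.
rows = [(pred, sim, w) for (pred, sim), w in product(similarities, weights)]

for row in rows:
    print(row)
print(len(rows))  # 3 similarities x 2 weights = 6 rows
```

Any aggregate computed directly over these rows therefore sees each weight three times, which is the effect described above.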

Finally

Here is the data

@prefix : <http://example.org/rs#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

:i1 :similaryTo1 :i2 .
:similaryTo1 :hasValue 0.5 .
:i1 :similaryTo2 :i2 .
:similaryTo2 :hasValue 0.6 .
:i1 :similaryTo3 :i2 .
:similaryTo3 :hasValue 0.7 .
:i2 :hasContextValue 0.1 .
:i2 :hasContextValue 0.4 .
:i1 :similaryTo4 :i3 .
:similaryTo4 :hasValue 0.5 .

I hope you can help me; I would really appreciate it.

So what I want to do

Imagine that there were no weights at all. My query would then be:

prefix : <http://example.org/rs#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select  ?item ?similarityValue  where {
  values ?i1 {:i1}
  ?i1 ?similaryTo ?item .
  ?similaryTo :hasValue ?similarityValue .

}

The result would be:

Then I aggregate the similarities into a sum per item, like this:

prefix : <http://example.org/rs#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select  ?item (SUM(?similarityValue) as ?sumSimilarities)  where {
  values ?i1 {:i1}
  ?i1 ?similaryTo ?item .
  ?similaryTo :hasValue ?similarityValue .
}
group by ?item

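The result is:

What I want is to multiply each row of this result by the sum of the two weights associated with ?item, which is (0.1 * 0.4) for i2 and (1) for i3.

Note that some items have two weights, some have one, and some have none. Note also that even for items with two weights, the two values may be the same, so be careful if you use distinct here.

Finally, I used two only as an example; in real life that number comes from a dynamic system.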
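To make the intended arithmetic concrete, here is a plain-Python sketch (not SPARQL) of "group twice, then combine once per item". It assumes the weights are combined by summing, as the text says; the question also writes 0.1 * 0.4, so a product could be substituted in the marked loop. The function name final_scores and the default weight of 1 for items without weights are illustrative choices taken from the description above.

```python
from collections import defaultdict
from decimal import Decimal

# (item, similarity value) and (item, weight) pairs copied from the sample data.
similarities = [("i2", "0.5"), ("i2", "0.6"), ("i2", "0.7"), ("i3", "0.5")]
weights = [("i2", "0.1"), ("i2", "0.4")]

def final_scores(similarities, weights, default_weight=Decimal("1")):
    """Aggregate similarities and weights separately, then combine per item."""
    sim_sum = defaultdict(Decimal)
    for item, value in similarities:
        sim_sum[item] += Decimal(value)

    # Assumption: weights are summed. They are kept as a list rather than a
    # set, so two equal weights on the same item are both counted.
    weight_sum = defaultdict(Decimal)
    for item, value in weights:
        weight_sum[item] += Decimal(value)

    # Each item's combined weight is applied exactly once.
    return {
        item: total * weight_sum.get(item, default_weight)
        for item, total in sim_sum.items()
    }

scores = final_scores(similarities, weights)
for item, score in scores.items():
    print(item, score)
```

Under the summing assumption, i2 gets 1.8 multiplied by 0.5 and i3 gets 0.5 multiplied by 1, no matter how many criteria selected each item.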

Update: after @Joshua Taylor's answer, I understand his sample data to be:

Answer 1

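Some data

First, some data that we can work with. The item :a has a number of similarity links, each of which specifies an item and a reason. :a might be similar to an item for several different reasons, and there may even be duplicate similarities with the same item and reason. I think that this matches your use case so far. (Sample data in the question would make this clearer, but I think this is close to what you have.) Then, each item has context values, and each reason has an optional weight.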

@prefix : <urn:ex:> .

:a :similarTo [ :item :b ; :reason :p ] ,
              [ :item :b ; :reason :p ] , # a duplicate
              [ :item :b ; :reason :q ] ,
              [ :item :b ; :reason :r ] ,
              [ :item :c ; :reason :p ] ,
              [ :item :c ; :reason :q ] ,
              [ :item :d ; :reason :r ] ,
              [ :item :d ; :reason :s ] .

:b :context 0.01 .
:b :context 0.02 .
:c :context 0.04 .
:d :context 0.05 .
:e :context 0.06 . # not used

:p :weight 0.1 .
:q :weight 0.3 .
:r :weight 0.5 .
# no weight for :s
:t :weight 0.9 . # not used

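It sounds like what you want to do is compute the sum of the context values for the similar items, counting the context value of every occurrence, but sum the reason weights only over distinct occurrences. If that is the right understanding, then I think you want something like the following.

The reason weights

The first step is to be able to get the sum of the weights of the distinct reasons for each similar item.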

prefix : <urn:ex:>

select * where {
  values ?i { :a }

  #-- get the sum of weights of distinct reasons
  #-- for each item that is similar to ?i.
  { select ?item (sum(?weight) as ?propertyWeight) {
      #-- get the distinct properties for each ?item
      #-- along with their weights.
      { select distinct ?item ?property ?weight {
          ?i :similarTo [ :item ?item ; :reason ?property ] .
          optional { ?property :weight ?weight_ }
          bind(if(bound(?weight_), ?weight_, 0.0) as ?weight)
        } }
    }
    group by ?item
  }
}

------------------------------
| i  | item | propertyWeight |
==============================
| :a | :b   | 0.9            |
| :a | :c   | 0.4            |
| :a | :d   | 0.5            |
------------------------------
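The figures in this table can be checked by hand. The following plain-Python sketch (not SPARQL, with illustrative variable names) mirrors the two steps of the query on the sample data: take the distinct (item, reason) pairs, then sum the reason weights per item with 0.0 for a reason that has no weight.

```python
from decimal import Decimal

# (item, reason) occurrences for :a, including the duplicate (:b, :p).
occurrences = [("b", "p"), ("b", "p"), ("b", "q"), ("b", "r"),
               ("c", "p"), ("c", "q"), ("d", "r"), ("d", "s")]
reason_weight = {"p": "0.1", "q": "0.3", "r": "0.5", "t": "0.9"}  # no weight for s

property_weight = {}
# set() plays the role of SELECT DISTINCT over ?item and ?property.
for item, reason in sorted(set(occurrences)):
    # A missing weight counts as 0.0, like the if(bound(...)) in the query.
    weight = Decimal(reason_weight.get(reason, "0.0"))
    property_weight[item] = property_weight.get(item, Decimal("0")) + weight

print(property_weight)
```

The duplicate (:b, :p) contributes 0.1 only once, which is why :b ends up at 0.9 rather than 1.0.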

Getting the item weights

Now you still need the sum of the context values for each item, counting every occurrence. So we extend the query:

prefix : <urn:ex:>

select * where {
  values ?i { :a }

  #-- get the sum of weights of distinct reasons
  #-- for each item that is similar to ?i.
  { select ?item (sum(?weight) as ?propertyWeight) {
      #-- get the distinct properties for each ?item
      #-- along with their weights.
      { select distinct ?item ?property ?weight {
          ?i :similarTo [ :item ?item ; :reason ?property ] .
          optional { ?property :weight ?weight_ }
          bind(if(bound(?weight_), ?weight_, 0.0) as ?weight)
        } }
    }
    group by ?item
  }

  #-- get the sum of the context values
  #-- for each item.
  { select ?item (sum(?context_) as ?context) {
      ?item :context ?context_ .
    }
    group by ?item
  }
}

----------------------------------------
| i  | item | propertyWeight | context |
========================================
| :a | :b   | 0.9            | 0.03    |
| :a | :c   | 0.4            | 0.04    |
| :a | :d   | 0.5            | 0.05    |
----------------------------------------

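Note that it is fine for the second subquery to search for ?item :context ?context_ without even ensuring that ?item is one of the similar items. Because the results of the two subqueries are joined, we only get results for the ?item values that the first subquery also returns.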
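The effect of that join can be sketched in plain Python (not SPARQL; the dictionaries below stand in for the two subquery result sets, and their names are illustrative). On its own, the context sum also covers :e, which disappears once the two results are joined on the item.

```python
from decimal import Decimal

# Per-item results of the first subquery (sum of distinct reason weights).
property_weight = {"b": Decimal("0.9"), "c": Decimal("0.4"), "d": Decimal("0.5")}

# The second subquery on its own sums :context for every item, including :e.
context_triples = [("b", "0.01"), ("b", "0.02"), ("c", "0.04"),
                   ("d", "0.05"), ("e", "0.06")]
context = {}
for item, value in context_triples:
    context[item] = context.get(item, Decimal("0")) + Decimal(value)

# Joining on the item keeps only items present in both result sets.
joined = {item: (property_weight[item], context[item])
          for item in property_weight if item in context}

print(sorted(context))  # includes 'e'
print(sorted(joined))   # 'e' is gone after the join
```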

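Putting it together

Now you can add, multiply, or do whatever else you want in order to combine the sum of the reason weights with the sum of the context values. For example, if you want to add them: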

prefix : <urn:ex:>

select ?i ?item ((?propertyWeight + ?context) as ?similarity) where {
  values ?i { :a }

  #-- get the sum of weights of distinct reasons
  #-- for each item that is similar to ?i.
  { select ?item (sum(?weight) as ?propertyWeight) {
      #-- get the distinct properties for each ?item
      #-- along with their weights.
      { select distinct ?item ?property ?weight {
          ?i :similarTo [ :item ?item ; :reason ?property ] .
          optional { ?property :weight ?weight_ }
          bind(if(bound(?weight_), ?weight_, 0.0) as ?weight)
        } }
    }
    group by ?item
  }

  #-- get the sum of the context values
  #-- for each item.
  { select ?item (sum(?context_) as ?context) {
      ?item :context ?context_ .
    }
    group by ?item
  }
}

--------------------------
| i  | item | similarity |
==========================
| :a | :b   | 0.93       |
| :a | :c   | 0.44       |
| :a | :d   | 0.55       |
--------------------------

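Final cleanup

Looking at that last query, two things bother me a little. The first is that the innermost subquery retrieves the reason weight for every solution, whereas we only need to retrieve it once per property per item. That is, we can move the optional part out of the innermost subquery and into the subquery that encloses it. The second is that we have a bind that sets a variable used only in the aggregation. We can replace it by summing coalesce(?weight, 0.0), which uses ?weight if it is bound and 0.0 otherwise. After making these changes, we end up with: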

prefix : <urn:ex:>

select ?i ?item ((?propertyWeight + ?context) as ?similarity) where {
  values ?i { :a }

  #-- get the sum of weights of distinct properties
  #-- using 0.0 as the weight for a property that doesn't
  #-- actually specify a weight.
  { select ?item (sum(coalesce(?weight,0.0)) as ?propertyWeight) {

      #-- get the distinct properties for each ?item.
      { select distinct ?item ?property {
          ?i :similarTo [ :item ?item ; :reason ?property ] .
        } }

       #-- then get each property's optional weight.
       optional { ?property :weight ?weight }
    }
    group by ?item
  }

  #-- get the sum of the context values
  #-- for each item.
  { select ?item (sum(?context_) as ?context) {
      ?item :context ?context_ .
    }
    group by ?item
  }
}

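I don't think it is a huge change, but it makes things a bit tidier and easier to understand.

This is almost becoming my mantra at this point, but these kinds of questions are much easier to answer when sample data is provided. In this case, most of the actual mechanics of how you get these values in the first place do not matter; what matters is how you aggregate them afterwards. That is why we can use very simple data, like the data I created at the beginning of this answer.

I think the big takeaway is that one of the important techniques in using SPARQL (and, I expect, other query languages too) is having separate subqueries and joining their results. In this case we ended up with a couple of subqueries because we really needed to group in a few different ways. It might have been simpler if SPARQL provided a distinct by operator, so that we could say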

sum(distinct by(?property) ?weight)

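but that has a problem: if a distinct property could have more than one weight, which of those weights would you pick? So the solution really does seem to be several subqueries, so that we can do several different kinds of grouping. This is why I was asking about the actual formula that you are trying to compute.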
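The ambiguity is easy to see with a small plain-Python sketch (not SPARQL). The data is hypothetical: reason p is given two weights, which does not occur in the sample data above, and sum_distinct_by_reason is an illustrative helper, not an existing operator. Depending on which weight is picked for p, the "distinct by" sum changes.

```python
from decimal import Decimal

# Hypothetical data: reason p carries two different weights.
weights = [("p", "0.1"), ("p", "0.2"), ("q", "0.3")]

def sum_distinct_by_reason(weights, pick):
    """Sum one weight per reason, using `pick` to choose among candidates."""
    by_reason = {}
    for reason, value in weights:
        by_reason.setdefault(reason, []).append(Decimal(value))
    return sum((pick(values) for values in by_reason.values()), Decimal("0"))

print(sum_distinct_by_reason(weights, min))  # 0.4
print(sum_distinct_by_reason(weights, max))  # 0.5
```

Writing the grouping out as an explicit subquery forces that choice (minimum, maximum, sum, or something else) to be stated rather than left implicit.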

Solution 1
1 Accepted 2016-04-06 12:34:40
