SPARQL: an aggregate over a group aggregate

Question

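I have an ontology in which a user can use one of five predicates to express how much they like an item.

The ontology contains specific predicates that have a property called hasSimilarityValue.

I am trying to do the following:

1. Take a user, say rs:ania.
2. Extract all the items this user has already rated. (This is easy, because the ontology already contains triples from users to items.)
3. Extract the items similar to those extracted in step 2 and calculate their similarity. (Here we use our own method to calculate the similarity.)

The problem is this: step 2 gives us many items the user has rated, and step 3 extracts and scores the items similar to them. So an item from step 3 may well be similar to two (or more) items from step 2. We therefore end up with something like this:

user :ania rated item x1
user :ania rated item x2
item y is similar to x1 with value y1
item y is similar to x2 with value y2
item z is similar to x1 with value z1

y1, y2 and z1 are values between 0 and 1.

The problem is that we need to normalize these values to know the final similarity of item y and item z.

The normalization is simple: group by item and divide by the maximum number of items.

So to get the similarity for y, I should compute (y1 + y2) / 2.

To get the similarity for z, I should compute z1 / 2.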
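
Written out as a formula, the normalization described above looks roughly like this. This is only a sketch: the symbols N, R_s and w are introduced here for illustration and do not appear in the ontology.

```latex
\mathrm{sim}(s) = \frac{1}{N} \sum_{x \in R_s} w(s, x),
\qquad
N = \max_{s'} \lvert R_{s'} \rvert,
\qquad
\mathrm{sim}(y) = \frac{y_1 + y_2}{2},
\qquad
\mathrm{sim}(z) = \frac{z_1}{2}
```

Here R_s is the set of rated items that candidate s is similar to, w(s, x) is the similarity value between s and x, and N is the largest number of rated items any candidate is similar to (2 in the example, because y is similar to both x1 and x2).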

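My question

As you can see, I need to count the items and then find the maximum of that count.

This is the query that calculates everything except the normalization part: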

select  ?s  (sum(?weight * ?factor) as ?similarity) ( sum(?weight * ?factor * ?ratings) as ?similarityWithRating) (count(distinct ?x) as ?countOfItemsUsedInDeterminingTheSimilarities) where {

    values (?user) { (rs:ania) }
    values (?ratingPredict) {(rs:ratedBy4Stars)  (rs:ratedBy5Stars)}
    ?user ?ratingPredict ?x.
    ?ratingPredict rs:hasRatingValue ?ratings.
    {
      ?s ?p ?o .
      ?x ?p ?o .
      bind(4/7 as ?weight)
    }
    union
    {
      ?s ?a ?b . ?b ?p ?o .
      ?x ?c ?d . ?d ?p ?o .
      bind(1/7 as ?weight)
    }
    ?p rs:hasSimilarityValue ?factor .
      filter (?s != ?x)
  }
  group by ?s

order by ?s

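The result has one row per ?s, with the columns ?similarity, ?similarityWithRating and ?countOfItemsUsedInDeterminingTheSimilarities.

Now I need to divide each row by the maximum value of the count column.

The solution I came up with is to repeat the exact same query twice, once to get the similarities and once to get the maximum, then join the two and do the division (the normalization). It works, but it is ugly, and because the same query runs twice the performance will be poor. It is a stupid solution, and I would like to ask you for a better one.

Here is my stupid solution: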

 PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rs: <http://www.musicontology.com/rs#>
PREFIX pdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

#select 
#?s   ?similarityWithRating (max(?countOfItemsUsedInDeterminingTheSimilarities) as ?maxNumberOfItemsUsedInDeterminingTheSimilarities)
#where {
 # {
select ?s ?similarity ?similarityWithRating ?countOfItemsUsedInDeterminingTheSimilarities ?maxCountOfItemsUsedInDeterminingTheSimilarities ?finalSimilarity where {
{
select  ?s  (sum(?weight * ?factor) as ?similarity) ( sum(?weight * ?factor * ?ratings) as ?similarityWithRating) (count(distinct ?x) as ?countOfItemsUsedInDeterminingTheSimilarities) where {

    values (?user) { (rs:ania) }
    values (?ratingPredict) {(rs:ratedBy4Stars)  (rs:ratedBy5Stars)}
    ?user ?ratingPredict ?x.
    ?ratingPredict rs:hasRatingValue ?ratings.
    {
      ?s ?p ?o .
      ?x ?p ?o .
      bind(4/7 as ?weight)
    }
    union
    {
      ?s ?a ?b . ?b ?p ?o .
      ?x ?c ?d . ?d ?p ?o .
      bind(1/7 as ?weight)
    }
    ?p rs:hasSimilarityValue ?factor .
      filter (?s != ?x)
  }
  group by ?s
#}

#}
#group by ?s 
order by ?s
} #end first part
{
select (Max(?countOfItemsUsedInDeterminingTheSimilarities) as ?maxCountOfItemsUsedInDeterminingTheSimilarities) where {
select  ?s  (sum(?weight * ?factor) as ?similarity) ( sum(?weight * ?factor * ?ratings) as ?similarityWithRating) (count(distinct ?x) as ?countOfItemsUsedInDeterminingTheSimilarities) where {

    values (?user) { (rs:ania) }
    values (?ratingPredict) {(rs:ratedBy4Stars)  (rs:ratedBy5Stars)}
    ?user ?ratingPredict ?x.
    ?ratingPredict rs:hasRatingValue ?ratings.
    {
      ?s ?p ?o .
      ?x ?p ?o .
      bind(4/7 as ?weight)
    }
    union
    {
      ?s ?a ?b . ?b ?p ?o .
      ?x ?c ?d . ?d ?p ?o .
      bind(1/7 as ?weight)
    }
    ?p rs:hasSimilarityValue ?factor .
      filter (?s != ?x)
  }
  group by ?s
#}

#}
#group by ?s 
order by ?s
}
}#end second part
  bind (?similarityWithRating/?maxCountOfItemsUsedInDeterminingTheSimilarities as ?finalSimilarity)
}
order by desc(?finalSimilarity)

Finally

If you want to try it yourself, here is the data: http://www.mediafire.com/view/r4qlu3uxijs4y30/musicontology

Answer 1

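It would be very helpful if you could provide minimal data that we can work with in examples like this. That means data that contains nothing we do not need to solve the problem, and that is as simple as possible. I think How to create a Minimal, Complete, and Verifiable example might be useful for your Stack Overflow questions.

At any rate, here is some simple data that is enough for us to work with. There are two users who have rated some pieces, and some similarities between pieces. Note that I made the similarities directed. You might want them to be bidirectional, but that is not really the main part of the question.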

@prefix : <urn:ex:> .

:user1 :rated :a , :b .

:user2 :rated :b , :c , :d .

:a :similarTo [ :piece :c ; :value 0.1 ] ,
              [ :piece :d ; :value 0.2 ] .

:b :similarTo [ :piece :d ; :value 0.3 ] ,
              [ :piece :e ; :value 0.4 ] .

:c :similarTo [ :piece :e ; :value 0.5 ] ,
              [ :piece :f ; :value 0.6 ] .

:d :similarTo [ :piece :f ; :value 0.7 ] ,
              [ :piece :g ; :value 0.8 ] .

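Now, the query just needs to retrieve a user and the pieces they have rated, along with the similar pieces and the actual similarity values. If you then group by the user and the similar piece, you end up with groups that each have a single similar piece, a single user, and a bunch of rated pieces with their similarity to the similar piece. Since all the similarity ratings are in a fixed range (0, 1), you can just average them to get an overall similarity. In this query I have also added a group_concat to show which rated pieces each similarity value is based on.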

prefix : <urn:ex:>

select
    ?user
    (group_concat(?piece) as ?ratedPieces)
    ?similarPiece
    (avg(?similarity_) as ?similarity)
where {
  #-- Find ?pieces that ?user has rated.
  ?user :rated ?piece .

  #-- Find other pieces (?similarPiece) that are
  #-- similar to ?piece, along with the
  #-- similarity value (?similarity_)
  ?piece :similarTo [ :piece ?similarPiece ; :value ?similarity_ ] .
}
group by ?user ?similarPiece

------------------------------------------------------------
| user   | ratedPieces         | similarPiece | similarity |
============================================================
| :user1 | "urn:ex:a"          | :c           | 0.1        | ; a-c[0.1]
| :user1 | "urn:ex:b urn:ex:a" | :d           | 0.25       | ; b-d[0.3], a-d[0.2]
| :user1 | "urn:ex:b"          | :e           | 0.4        | ; b-e[0.4]
| :user2 | "urn:ex:b"          | :d           | 0.3        | ; b-d[0.3]
| :user2 | "urn:ex:c urn:ex:b" | :e           | 0.45       | ; c-e[0.5], b-e[0.4]
| :user2 | "urn:ex:d urn:ex:c" | :f           | 0.65       | ; d-f[0.7], c-f[0.6]
| :user2 | "urn:ex:d"          | :g           | 0.8        | ; d-g[0.8]
------------------------------------------------------------
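
If you want the normalization from the question (divide by a fixed number of items rather than average within each group), one possible variation is sketched below. It assumes the divisor is the number of distinct pieces each user has rated, computed once in a small subquery; ?ratedCount, ?rated and ?normalizedSimilarity are names made up for this sketch.

```sparql
prefix : <urn:ex:>

select
    ?user
    ?similarPiece
    (sum(?similarity_) / ?ratedCount as ?normalizedSimilarity)
where {
  #-- Assumption: the normalizer is the number of distinct
  #-- pieces each ?user has rated, computed once per user.
  {
    select ?user (count(distinct ?rated) as ?ratedCount) where {
      ?user :rated ?rated .
    }
    group by ?user
  }

  #-- Same pattern as before: rated pieces, the pieces
  #-- similar to them, and the similarity values.
  ?user :rated ?piece .
  ?piece :similarTo [ :piece ?similarPiece ; :value ?similarity_ ] .
}
group by ?user ?similarPiece ?ratedCount
order by ?user ?similarPiece
```

With the sample data, :user1 has rated two pieces, so :c would come out as 0.1 / 2 = 0.05 instead of 0.1, while :d stays at (0.3 + 0.2) / 2 = 0.25. Note that this divisor matches the "maximum count" from the question only when at least one candidate is similar to every rated piece. If that holds for your data, the same idea would replace the second copy of the big query with a cheap count of the items rs:ania rated.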
