SPARQL: group subjects based on predicates

In a semantic web graph, I have a set of subjects (S1, S2, ..., Sn) and a set of predicates (P1, P2, ..., Pn). I want to group instances based on their predicates (i.e., select all instances that have the same set of predicates, regardless of the object values).

For example, if I have

S1 P1 v1.
S1 P2 v2.
S2 P3 v3.
S2 P4 v4.
S3 P1 v5.
S3 P2 v6.

I would expect to get two groups, {S1, S3} and {S2}. I am generating the graph myself, so I can change its structure if that helps meet this requirement.

This is a bit more complex than it might sound, and I'm not entirely sure it's possible in a completely general way, but I think you can achieve it on most endpoints. To group subjects by the set of predicates they have, you first need to get each subject's set of predicates in a form that can be compared with other sets of predicates. SPARQL has no set-valued datatype, but with group_concat and distinct you can build a string containing all of a subject's predicates. If you use order by when you select them, most endpoints will keep that order intact, so the group_concat strings are essentially canonical. However, as far as I can tell, that behavior isn't guaranteed by the spec.
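To see why the ordering matters, here is a minimal Python sketch of the canonicalization idea. It is an illustration only, not SPARQL, and the helper name `canonical_key` is hypothetical. It deduplicates and sorts the predicates before joining them, which is the effect that distinct plus order by are meant to have on the group_concat string.

```python
def canonical_key(predicates):
    """Join the distinct predicates in sorted order, so equal sets give equal strings."""
    return " ".join(sorted(set(predicates)))

# The same set of predicates, encountered in different orders (and with a duplicate).
seen_first = ["urn:ex:P2", "urn:ex:P1", "urn:ex:P1"]
seen_second = ["urn:ex:P1", "urn:ex:P2"]

# Naive concatenation depends on the order in which the predicates were encountered.
naive_equal = " ".join(seen_first) == " ".join(seen_second)

# Canonical keys compare equal regardless of that order.
canonical_equal = canonical_key(seen_first) == canonical_key(seen_second)

print(naive_equal, canonical_equal)
```

If an endpoint did not preserve the order by ordering inside group_concat, it would behave like the naive concatenation here, and subjects with the same predicates could end up with different strings.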

@prefix : <urn:ex:> .

:S1 :P1 :v1 .
:S1 :P2 :v2 .
:S2 :P3 :v3 .
:S2 :P4 :v4 .
:S3 :P1 :v5 .
:S3 :P2 :v6 .

prefix : <urn:ex:>

#-- The behavior in most (all?) endpoints seems to be
#-- to preserve the order during the group_concat
#-- operation, so you'll get "normalized" values
#-- for ?preds.  I don't think this is *guaranteed*, though.
select ?s (group_concat(?p) as ?preds) where {
  #-- Get the distinct values of ?s and ?p and ensure that
  #-- they're in some kind of standardized order.
  #-- Ordering by ?p alone should suffice, since the
  #-- outer query groups by ?s.
  { select distinct ?s ?p {
      ?s ?p ?o
    }
    order by ?p
  }
}
group by ?s

-------------------------------
| s   | preds                 |
===============================
| :S2 | "urn:ex:P3 urn:ex:P4" |
| :S3 | "urn:ex:P1 urn:ex:P2" |
| :S1 | "urn:ex:P1 urn:ex:P2" |
-------------------------------

Now you just need to go one step further and group these results by ?preds:

prefix : <urn:ex:>

select (group_concat(?s) as ?subjects) {
  select ?s (group_concat(?p) as ?preds) where {
    { select distinct ?s ?p {
        ?s ?p ?o
      }
      order by ?p
    }
  }
  group by ?s
}
group by ?preds

-------------------------
| subjects              |
=========================
| "urn:ex:S1 urn:ex:S3" |
| "urn:ex:S2"           |
-------------------------
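
As a sanity check on the expected grouping, the same logic can be sketched outside SPARQL. The following Python is an illustration only: the `triples` list hard-codes the sample data above with the prefixes dropped, and `group_by_predicate_set` is a hypothetical helper name. Because Python has a real set type, it uses a `frozenset` of predicates as the grouping key instead of a concatenated string, so no ordering assumption is needed.

```python
from collections import defaultdict

# Sample data from above as (subject, predicate, object) tuples.
triples = [
    ("S1", "P1", "v1"), ("S1", "P2", "v2"),
    ("S2", "P3", "v3"), ("S2", "P4", "v4"),
    ("S3", "P1", "v5"), ("S3", "P2", "v6"),
]

def group_by_predicate_set(triples):
    # Step 1: collect the set of predicates per subject (objects are ignored).
    preds_by_subject = defaultdict(set)
    for s, p, _o in triples:
        preds_by_subject[s].add(p)

    # Step 2: group subjects whose predicate sets are equal.
    subjects_by_preds = defaultdict(list)
    for s, preds in preds_by_subject.items():
        subjects_by_preds[frozenset(preds)].append(s)

    # Sort within and across groups so the result is deterministic.
    return sorted(sorted(group) for group in subjects_by_preds.values())

groups = group_by_predicate_set(triples)
print(groups)  # [['S1', 'S3'], ['S2']]
```

The two steps correspond to the inner query (group by ?s) and the outer query (group by ?preds) respectively.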
