querying for subjects or predicates by regex

Question

Given this RDF:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE rdf:RDF [<!ENTITY rdf 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
<!ENTITY rdfs 'http://www.w3.org/2000/01/rdf-schema#'>
<!ENTITY xsd 'http://www.w3.org/2001/XMLSchema#'>]>

<rdf:RDF xmlns:xsd="http://www.w3.org/2001/XMLSchema#" 
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="Fadi" xml:startTime="00:01:38" xml:endTime="00:01:39">
    <ns0:eat xmlns:ns0="http://example.org/">Apple</ns0:eat>
  </rdf:Description>
</rdf:RDF>

when I execute this SPARQL query

SELECT *
WHERE {
  ?s ?p ?o . 
  FILTER (regex(?o, 'Apple','i'))
}

I get the subject and predicate:

s: http://example.org/Fadi , p: http://example.org/eat .

but when I execute

SELECT *
WHERE {
  ?s ?p ?o .
  FILTER (regex(?s, 'Fadi','i'))
}

or

SELECT *
WHERE {
  ?s ?p ?o .
  FILTER (regex(?s, 'http://example.org/Fadi','i'))
}

I get nothing. How can I query for a subject or predicate? How can I query for startTime and endTime?

Answer 1

REGEX is for querying text values, not for matching against resource IRIs. You could use the str function to get the IRI of a resource, so your filter would look like

FILTER (regex( str( ?s ), 'http://example.org/Fadi','i'))

but that's really not what you want to do here. Since you are looking to retrieve triples of the form

<http://example.org/Fadi> ?p ?o

ask for them with a query like this:

SELECT *
WHERE {
  <http://example.org/Fadi> ?p ?o .
}
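To see why naming the IRI directly is safer than filtering with a regex, here is a rough analogy in Python. It is only a sketch: Python's re module is not the regex dialect SPARQL uses, and the extra IRIs in the list are made up for illustration. The two behaviours that matter here (an unanchored match, and '.' matching any character) are the same in both.

```python
import re

# Hypothetical subject IRIs, used only for illustration.
subjects = [
    "http://example.org/Fadi",
    "http://example.org/Fadil",   # a different resource that merely starts the same way
    "http://exampleXorg/fadi",    # '.' in the pattern matches any character
]

pattern = re.compile("http://example.org/Fadi", re.IGNORECASE)

# Unanchored, case-insensitive search, roughly like regex(str(?s), '...', 'i').
regex_hits = [s for s in subjects if pattern.search(s)]

# Exact comparison, roughly like writing the IRI directly in the triple pattern.
exact_hits = [s for s in subjects if s == "http://example.org/Fadi"]

print(regex_hits)
print(exact_hits)
```

The regex version accepts all three strings, while the exact comparison accepts only the first one.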

You can define prefixes in SPARQL queries, too, so if you're using a bunch of terms from one namespace, you can save some typing, e.g.:

PREFIX ex: <http://example.org/>
SELECT * 
WHERE {
 ex:Fadi ?p ?o .
}

However, there's still another problem with your example. Your RDF document doesn't have an XML base, so the IRI for Fadi in <rdf:Description rdf:about="Fadi" ... is unpredictable. A parser might resolve it against the location of the file, producing, for instance, something like file:///home/user/Fadi . Either specify an XML base, or use full IRIs for the rdf:about attribute. Assuming we add xml:base="http://www.example.org/" to the rdf:RDF element (note that this makes the subject <http://www.example.org/Fadi>, with www, unlike the http://example.org/ namespace used for eat), we can query for all the triples using the Jena ARQ command line tools. We get output containing the triple we expect, but also some warnings about the startTime and endTime attributes:

$ arq --data fadi.rdf --query fadi.sparql 
12:13:21 WARN  riot                 :: {W118} XML attribute: xml:startTime is not known and is being discarded.
12:13:21 WARN  riot                 :: {W118} XML attribute: xml:endTime is not known and is being discarded.
----------------------------------------------------
| s                             | p      | o       |
====================================================
| <http://www.example.org/Fadi> | ex:eat | "Apple" |
----------------------------------------------------

The startTime and endTime values need to be specified by child elements of the rdf:Description element, not by XML attributes on it. I don't think that xml:startTime and xml:endTime are meaningful properties; whatever start time and end time mean here, they should probably be specified by different properties, but that's a modeling issue, not a syntax issue. At any rate, we can adjust the input file accordingly (adding the xml:base and turning xml:startTime and xml:endTime into elements) to get:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE rdf:RDF [<!ENTITY rdf 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
<!ENTITY rdfs 'http://www.w3.org/2000/01/rdf-schema#'>
<!ENTITY xsd 'http://www.w3.org/2001/XMLSchema#'>]>

<rdf:RDF xmlns:xsd="http://www.w3.org/2001/XMLSchema#" 
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xml:base="http://www.example.org/">
  <rdf:Description rdf:about="Fadi">
    <ns0:eat xmlns:ns0="http://example.org/">Apple</ns0:eat>
    <xml:startTime>00:01:38</xml:startTime>
    <xml:endTime>00:01:39</xml:endTime>
  </rdf:Description>
</rdf:RDF>
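As a rough illustration of what xml:base changes, relative-reference resolution can be tried out with Python's standard library. This is only a sketch: urljoin is not an RDF/XML parser, and the file location below is a made-up stand-in for wherever the data happens to be read from. Under standard resolution rules the last path segment of the base is replaced by the relative reference.

```python
from urllib.parse import urljoin

relative = "Fadi"  # the value of rdf:about in the document

# With an explicit base, the result is predictable.
with_base = urljoin("http://www.example.org/", relative)

# Hypothetical file location, standing in for a base the parser would
# otherwise derive from where it read the data.
with_file = urljoin("file:///home/user/input.rdf", relative)

print(with_base)
print(with_file)
```

The first result is http://www.example.org/Fadi, the same subject IRI that ARQ reports once the base is set. The second depends entirely on where the file lives, which is why a query naming a fixed IRI can't be relied on to match it.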

Now when we run the query, we get

$ /usr/local/lib/apache-jena-2.10.0/bin/arq --data fadi.rdf --query fadi.sparql 
------------------------------------------------------------------------------------------------
| s                             | p                                               | o          |
================================================================================================
| <http://www.example.org/Fadi> | <http://www.w3.org/XML/1998/namespaceendTime>   | "00:01:39" |
| <http://www.example.org/Fadi> | <http://www.w3.org/XML/1998/namespacestartTime> | "00:01:38" |
| <http://www.example.org/Fadi> | ex:eat                                          | "Apple"    |
------------------------------------------------------------------------------------------------

which seems like what you wanted. More specific queries, e.g., for Fadi's start and end times, are easy to construct too. Using the startTime and endTime properties as they appear so far (even though they should be refactored into a different namespace later), we have:

PREFIX ex: <http://www.example.org/>
PREFIX xml: <http://www.w3.org/XML/1998/namespace>
SELECT *
WHERE {
  ex:Fadi xml:startTime ?start ;
          xml:endTime ?end .
}

which produces

$ /usr/local/lib/apache-jena-2.10.0/bin/arq --data fadi.rdf --query fadi.sparql 
---------------------------
| start      | end        |
===========================
| "00:01:38" | "00:01:39" |
---------------------------
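If the predicate IRIs ending in namespacestartTime and namespaceendTime look odd, it may help to remember that a prefixed name is expanded by plain concatenation, and the XML namespace IRI has no trailing / or #. A minimal Python sketch of that expansion (the expand helper is hypothetical, written just for this illustration):

```python
# Prefix declarations as they appear in the query above.
prefixes = {
    "ex": "http://www.example.org/",
    "xml": "http://www.w3.org/XML/1998/namespace",
}


def expand(prefixed_name: str) -> str:
    """Expand 'prefix:local' by concatenating the namespace IRI and the local part."""
    prefix, local = prefixed_name.split(":", 1)
    return prefixes[prefix] + local


print(expand("ex:Fadi"))
print(expand("xml:startTime"))
```

This is also why the xml: prefix in the query has to be declared without a trailing separator: adding one would produce a different IRI from the one in the data, and the pattern would not match.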

Answer 2

?s is a URI and regex works on strings. Use the str function to get a string:

FILTER (regex(str(?s), 'Fadi','i'))

2 answers

solution1
11 2013-04-27 16:24:00

solution2
5 ACCEPTED 2013-04-27 16:22:39
