
Illegal escape sequence value when executing a SPARQL query against Jena TDB via Jena's Java API

I am running a SPARQL* query against Jena's TDB where the result set (DBpedia logs) contains escape characters. To run the query I use org.apache.jena.query.QueryExecution as follows:

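import org.apache.jena.query.QueryExecution;
import org.apache.jena.rdfconnection.RDFConnection;

// The PREFIX declarations for vers: and xsd: are not shown in the original post.
String query = String.join("\n",
    "SELECT  *",
    "WHERE",
    "  { << ?s <http://dbpedia.org/property/cover> ?o >>",
    "              vers:valid_from   ?valid_from ;",
    "              vers:valid_until  ?valid_until",
    "    BIND(\"2022-01-05T21:42:11.803+02:00\"^^xsd:dateTime AS ?TimeOfExecution)",
    "    FILTER ( ( ?valid_from <= ?TimeOfExecution ) && ( ?TimeOfExecution < ?valid_until ) )",
    "  }");
// Connect to the SPARQL endpoint of the locally running server and run the query.
RDFConnection conn = RDFConnection.connect(String.format("http://localhost:%d/in_memory_server/sparql", server.getHttpPort()));
QueryExecution qExec = conn.query(query);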

and I get the following exception:

Exception in thread "main" org.apache.jena.atlas.json.JsonParseException: illegal escape sequence value: f (0x66)
    at org.apache.jena.atlas.json.io.parser.TokenizerJSON.exception(TokenizerJSON.java:757)
    at org.apache.jena.atlas.json.io.parser.TokenizerJSON.exception(TokenizerJSON.java:749)
    at org.apache.jena.atlas.json.io.parser.TokenizerJSON.readLiteralEscape(TokenizerJSON.java:669)
    at org.apache.jena.atlas.json.io.parser.TokenizerJSON.allBetween(TokenizerJSON.java:559)
    at org.apache.jena.atlas.json.io.parser.TokenizerJSON.parseToken(TokenizerJSON.java:138)
    at org.apache.jena.atlas.json.io.parser.TokenizerJSON.hasNext(TokenizerJSON.java:75)
    at org.apache.jena.atlas.iterator.PeekIterator.fill(PeekIterator.java:50)
    at org.apache.jena.atlas.iterator.PeekIterator.next(PeekIterator.java:92)
    at org.apache.jena.atlas.json.io.parser.JSONParserBase.nextToken(JSONParserBase.java:102)
    at org.apache.jena.atlas.json.io.parser.JSONP.parseObject(JSONP.java:75)
    at org.apache.jena.atlas.json.io.parser.JSONP.parseAny(JSONP.java:97)
    at org.apache.jena.atlas.json.io.parser.JSONP.parseObject(JSONP.java:79)
    at org.apache.jena.atlas.json.io.parser.JSONP.parseAny(JSONP.java:97)
    at org.apache.jena.atlas.json.io.parser.JSONP.parseArray(JSONP.java:143)
    at org.apache.jena.atlas.json.io.parser.JSONP.parseAny(JSONP.java:98)
    at org.apache.jena.atlas.json.io.parser.JSONP.parseObject(JSONP.java:79)
    at org.apache.jena.atlas.json.io.parser.JSONP.parseAny(JSONP.java:97)
    at org.apache.jena.atlas.json.io.parser.JSONP.parseObject(JSONP.java:79)
    at org.apache.jena.atlas.json.io.parser.JSONP.parse(JSONP.java:50)
    at org.apache.jena.atlas.json.io.parser.JSONParser.parse(JSONParser.java:58)
    at org.apache.jena.atlas.json.io.parser.JSONParser.parse(JSONParser.java:40)
    at org.apache.jena.atlas.json.JSON._parse(JSON.java:126)
    at org.apache.jena.atlas.json.JSON.parse(JSON.java:38)
    at org.apache.jena.riot.resultset.rw.ResultSetReaderJSON$RS_JSON.parse(ResultSetReaderJSON.java:103)
    at org.apache.jena.riot.resultset.rw.ResultSetReaderJSON.process(ResultSetReaderJSON.java:74)
    at org.apache.jena.riot.resultset.rw.ResultSetReaderJSON.readAny(ResultSetReaderJSON.java:67)
    at org.apache.jena.riot.resultset.rw.ResultsReader.readAny(ResultsReader.java:167)
    at org.apache.jena.riot.resultset.rw.ResultsReader.readAny(ResultsReader.java:152)
    at org.apache.jena.riot.ResultSetMgr.readAny(ResultSetMgr.java:191)
    at org.apache.jena.riot.ResultSetMgr.read(ResultSetMgr.java:113)
    at org.apache.jena.sparql.exec.http.QueryExecHTTP.execRowSet(QueryExecHTTP.java:195)
    at org.apache.jena.sparql.exec.http.QueryExecHTTP.select(QueryExecHTTP.java:156)
    at org.apache.jena.sparql.exec.QueryExecutionAdapter.execSelect(QueryExecutionAdapter.java:117)
    at org.apache.jena.sparql.exec.QueryExecutionCompat.execSelect(QueryExecutionCompat.java:97)
    at org.ai.wu.ac.at.tdbArchive.core.JenaTDBArchive_TB_star_f.materializeQuery(JenaTDBArchive_TB_star_f.java:295)
    at org.ai.wu.ac.at.tdbArchive.core.JenaTDBArchive_TB_star_f.bulkAllMatQuerying(JenaTDBArchive_TB_star_f.java:258)
    at org.ai.wu.ac.at.tdbArchive.tools.JenaTDBArchive_query.main(JenaTDBArchive_query.java:266)

The ?o of one row contains the following problematic string:

"{\rtf1\ansi\ansicpg1252{\fonttbl}\n{\colortbl;\red255\green255\blue255;"@en

Is there any property I can set to circumvent these escape characters or to tell Jena or ARQ to use a different parser?

For some reason I do not have this problem when it is a SPARQL and not a SPARQL* query. Can this make a difference? For example, when I run the following SPARQL query, which delivers exactly the same result, just from a .ng RDF dataset (quads), I get no exception:

Select * WHERE
  { GRAPH <http://example.org/versions>
      { ?graph  <http://www.w3.org/2002/07/owl#versionInfo>  92 }
    GRAPH ?graph
      { ?s  <http://dbpedia.org/property/cover>  ?o }
  }

UPDATE 1: The issue lies within the RDF dataset serialized as .ttl. To create the RDF dataset I use Python. The script takes an initial snapshot and changesets as input and builds a new RDF dataset with all the changes/versions included. I use the following snippet to parse and serialize the changesets:

from rdflib import Graph

cs_add = Graph()
cs_add.parse("path_to_changeset")

The issue seems to be in the parser. The string that gets parsed is:

"{\\rtf1\\ansi\\ansicpg1252{\\fonttbl}\n{\\colortbl;\\red255\\green255\\blue255;"@en

Now I want to serialize this string as-is. I want to preserve all the special characters, and they should not be escaped. This is what I get when I iterate through the triples and print the object Literal:

for s, p, o in cs_add:
    if "ansicpg1252" in o: # just to catch the string
        print(o.encode('utf-8'))
        print()
        print(o.n3())

Output

b'{\\\rtf1\\ansi\\ansicpg1252{\\\x0conttbl}\n{\\colortbl;\\\red255\\green255\\\x08lue255;'
    
"""{\\\rtf1\\ansi\\ansicpg1252{\\onttbl}
{\\colortbl;\\\red255\\green255\lue255;"""@en

So we see that, for example, a third backslash is added. I now need to find an encoding that preserves the string as it is.
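One way to read the output above is that the "third backslash" is only how Python displays a control character: the byte sequence `\\\r` is a single backslash followed by a carriage return. The sketch below is an illustration only, using the standard library. It compares a left-to-right decoder for the Turtle string escapes with a hypothetical naive decoder that replaces each escape independently. The function names and the naive strategy are assumptions made for this illustration and are not taken from rdflib's source. The naive variant happens to reproduce exactly the bytes printed above, which would be consistent with the escapes being decoded in the wrong order somewhere in the parsing step.

```python
import re

# Turtle string escapes (ECHAR) and the characters they denote.
ECHAR = {"t": "\t", "b": "\b", "n": "\n", "r": "\r", "f": "\f",
         '"': '"', "'": "'", "\\": "\\"}

# The literal body as written in the Turtle file (raw string: no Python escaping).
turtle_text = r'{\\rtf1\\ansi\\ansicpg1252{\\fonttbl}\n{\\colortbl;\\red255\\green255\\blue255;'


def unescape_single_pass(text):
    """Decode escapes left to right, so each '\\\\' pair is consumed as a unit."""
    return re.sub(r"\\(.)", lambda m: ECHAR.get(m.group(1), m.group(0)), text)


def unescape_naive(text):
    """Hypothetical buggy decoder: replaces every escape independently, one after another."""
    for esc, char in ECHAR.items():
        text = text.replace("\\" + esc, char)
    return text


# Backslashes survive; only the \n between the two RTF groups becomes a newline.
print(repr(unescape_single_pass(turtle_text)))
# The r, f and b after a doubled backslash are turned into control characters.
print(unescape_naive(turtle_text).encode("utf-8"))
```

If the single-pass result is what should end up in the dataset, then the literal needs to reach the triple store with its backslashes intact, and the doubled backslashes in the Turtle source are already the correct way to write it.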

The problem seems not to be related to the Java Jena API but to python's rdflib. I will open another question about the specific issue with rdflib.
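To connect this back to the Jena exception: the stack trace shows the JSON tokenizer failing in readLiteralEscape on the character f. In JSON, `\f` is the standard escape for a form feed (U+000C), so a literal that contains a real form feed would be written into a JSON result set with exactly that sequence. The following stdlib-only sketch, with a hypothetical helper name `control_chars`, lists the stray control characters in the value printed by rdflib and shows how Python's `json` module encodes them. It says nothing about which escapes a given Jena version accepts; it only shows that the damaged literal carries characters the intended RTF text does not.

```python
import json

# Literal value as printed by rdflib (a backslash followed by CR, FF or BS).
observed = (
    b'{\\\rtf1\\ansi\\ansicpg1252{\\\x0conttbl}\n'
    b'{\\colortbl;\\\red255\\green255\\\x08lue255;'
).decode("utf-8")


def control_chars(text):
    """Return the C0 control characters found in text, ignoring newlines."""
    return sorted({ch for ch in text if ord(ch) < 0x20 and ch != "\n"})


found = control_chars(observed)
print([hex(ord(ch)) for ch in found])  # ['0x8', '0xc', '0xd']

# JSON encodes these as \b, \f and \r; Python's own parser reads them back.
encoded = json.dumps(observed)
print(encoded)
print(json.loads(encoded) == observed)  # True
```

A check like `control_chars` could be run over the literals of a changeset before it is loaded into TDB, so that damaged strings are caught at the point where they are produced.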
