RDF4J: Parse query result from endpoint and store it as ntriples file (malformed query)

In this endpoint there is an option to get the result of a query in N-Triples format. I want to do the same with the RDF4J library when connecting to the endpoint, and save the result to an N-Triples file.

So far, I've used a graphQuery (CONSTRUCT):

        .....
        String queryString = prefixes +
                " CONSTRUCT { ?sub ?hasProp ?prop } WHERE { ?sub ?hasProp ?prop FILTER(?sub = yago:Naples) } ";
        GraphQuery graphQuery = con.prepareGraphQuery(QueryLanguage.SPARQL, queryString);
        RDFWriter writer = new NTriplesWriter(System.out);
        graphQuery.evaluate(writer);

Unfortunately, I get a "Malformed query result from server" error (Expected '.', found '–'). On the endpoint itself the result is returned just fine in N-Triples format. Could this be a bug in RDF4J?

> <http://yago-knowledge.org/resource/Naples> <http://yago-knowledge.org/resource/linksTo> <http://yago-knowledge.org/resource/S.S.C._Napoli> .
> <http://yago-knowledge.org/resource/Naples> <http://yago-knowledge.org/resource/linksTo> <http://yago-knowledge.org/resource/Treno_Alta_Velocit\u00E0> .
> <http://yago-know18:50:57.014 [main] ERROR o.e.r.rio.helpers.ParseErrorLogger - [Rio fatal] Expected '.', found '–' (386, -1)
> org.eclipse.rdf4j.query.QueryEvaluationException: Malformed query result from server
>     at org.eclipse.rdf4j.repository.sparql.query.SPARQLGraphQuery.evaluate(SPARQLGraphQuery.java:69)
>     at org.example.Connect.main(Connect.java:60)
> Caused by: org.eclipse.rdf4j.repository.RepositoryException: Malformed query result from server
>     at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.getRDF(SPARQLProtocolSession.java:934)
>     at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.sendGraphQuery(SPARQLProtocolSession.java:463)
>     at org.eclipse.rdf4j.repository.sparql.query.SPARQLGraphQuery.evaluate(SPARQLGraphQuery.java:62)
>     ... 1 more
> Caused by: org.eclipse.rdf4j.rio.RDFParseException: Expected '.', found '–' [line 386]
>     at org.eclipse.rdf4j.rio.helpers.RDFParserHelper.reportFatalError(RDFParserHelper.java:403)
>     at org.eclipse.rdf4j.rio.helpers.AbstractRDFParser.reportFatalError(AbstractRDFParser.java:755)
>     at org.eclipse.rdf4j.rio.turtle.TurtleParser.reportFatalError(TurtleParser.java:1318)
>     at org.eclipse.rdf4j.rio.turtle.TurtleParser.verifyCharacterOrFail(TurtleParser.java:1153)
>     at org.eclipse.rdf4j.rio.turtle.TurtleParser.parseStatement(TurtleParser.java:241)
>     at org.eclipse.rdf4j.rio.turtle.TurtleParser.parse(TurtleParser.java:201)
>     at org.eclipse.rdf4j.rio.turtle.TurtleParser.parse(TurtleParser.java:143)
>     at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.getRDF(SPARQLProtocolSession.java:931)
>     ... 3 more

When RDF4J's SPARQLRepository executes a SPARQL query request against this endpoint, the endpoint sends back its response in Turtle format. Unfortunately that response contains a syntax error. What happens is the following:

  1. RDF4J does a query request, indicating several acceptable result formats (including Turtle and N-Triples);
  2. The endpoint executes the query, picks Turtle as the response format, and serializes the query result in Turtle;
  3. RDF4J receives the Turtle data and parses it;
  4. The parsed result is passed to the NTriplesWriter, which then writes it out.

However, the query result document that the endpoint sends back is not syntactically valid Turtle, which causes RDF4J's Turtle parser to abort with an error, in step 3.

The problem is this line in the response (line 386):

    yago:Italian_War_of_1494–98 ,

Specifically, the problem is the character between 1494 and 98. Although it looks like a hyphen-minus (-), which would be perfectly legal, it is in fact an en dash (Unicode character U+2013). This is not a legal character in a prefixed name in Turtle.
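Because the two characters are hard to tell apart by eye, it can help to print the code point and Unicode name of every non-ASCII character in the offending line. The following is a minimal sketch using only the JDK; the class and method names are made up for illustration and are not part of RDF4J.

```java
import java.util.ArrayList;
import java.util.List;

class NonAsciiFinder {

    // Returns one "U+XXXX NAME" entry for each non-ASCII code point in the line,
    // in order of appearance.
    static List<String> describeNonAscii(String line) {
        List<String> found = new ArrayList<>();
        line.codePoints().forEach(cp -> {
            if (cp > 0x7F) {
                found.add(String.format("U+%04X %s", cp, Character.getName(cp)));
            }
        });
        return found;
    }

    public static void main(String[] args) {
        // Line 386 of the response, with the en dash written as a Java escape.
        String line = "yago:Italian_War_of_1494\u201398 ,";
        System.out.println(describeNonAscii(line));
    }
}
```

For the line quoted above, this reports a single entry, U+2013 EN DASH, whereas an ordinary hyphen-minus (U+002D) is plain ASCII and would not be listed at all.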

The endpoint's Turtle writer should serialize the value correctly by changing to a full URI instead of a prefixed name, and using a Unicode escape sequence, like so:

    <http://yago-knowledge.org/resource/Italian_War_of_1494\u201398>
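To make that rule concrete, here is a rough sketch of how a Turtle writer could choose between the two forms. It is not the endpoint's or RDF4J's actual code, and all names in it are hypothetical. It is also deliberately conservative: it only accepts local names made up entirely of characters from the Turtle grammar's PN_CHARS production, so some legal local names (for example ones containing a dot or a colon) fall back to a full IRI too. It assumes the input is already a valid IRI and only escapes non-ASCII characters.

```java
class TurtleTermSketch {

    // PN_CHARS_BASE from the Turtle grammar.
    static boolean isPnCharsBase(int cp) {
        return (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z')
            || (cp >= 0x00C0 && cp <= 0x00D6) || (cp >= 0x00D8 && cp <= 0x00F6)
            || (cp >= 0x00F8 && cp <= 0x02FF) || (cp >= 0x0370 && cp <= 0x037D)
            || (cp >= 0x037F && cp <= 0x1FFF) || (cp >= 0x200C && cp <= 0x200D)
            || (cp >= 0x2070 && cp <= 0x218F) || (cp >= 0x2C00 && cp <= 0x2FEF)
            || (cp >= 0x3001 && cp <= 0xD7FF) || (cp >= 0xF900 && cp <= 0xFDCF)
            || (cp >= 0xFDF0 && cp <= 0xFFFD) || (cp >= 0x10000 && cp <= 0xEFFFF);
    }

    // PN_CHARS: the base set plus underscore, hyphen-minus, digits and a few marks.
    static boolean isPnChars(int cp) {
        return isPnCharsBase(cp) || cp == '_' || cp == '-' || (cp >= '0' && cp <= '9')
            || cp == 0x00B7 || (cp >= 0x0300 && cp <= 0x036F)
            || (cp >= 0x203F && cp <= 0x2040);
    }

    // Conservative test: true only if the local name is certainly a legal PN_LOCAL.
    static boolean isSafeLocalName(String local) {
        if (local.isEmpty()) {
            return false;
        }
        int first = local.codePointAt(0);
        boolean firstOk = isPnCharsBase(first) || first == '_' || (first >= '0' && first <= '9');
        return firstOk && local.codePoints().allMatch(TurtleTermSketch::isPnChars);
    }

    // Writes a full IRI in angle brackets, replacing each non-ASCII code point
    // with a numeric escape (backslash-u with 4 hex digits, or backslash-U with 8).
    static String toIriRef(String iri) {
        StringBuilder sb = new StringBuilder("<");
        iri.codePoints().forEach(cp -> {
            if (cp <= 0x7F) {
                sb.appendCodePoint(cp);
            } else if (cp <= 0xFFFF) {
                sb.append(String.format("\\u%04X", cp));
            } else {
                sb.append(String.format("\\U%08X", cp));
            }
        });
        return sb.append('>').toString();
    }

    // Uses the prefixed name when it is safe, otherwise falls back to the full IRI.
    static String serialize(String prefix, String namespace, String local) {
        return isSafeLocalName(local) ? prefix + ":" + local : toIriRef(namespace + local);
    }

    public static void main(String[] args) {
        String ns = "http://yago-knowledge.org/resource/";
        System.out.println(serialize("yago", ns, "Naples"));
        System.out.println(serialize("yago", ns, "Italian_War_of_1494\u201398"));
    }
}
```

With this sketch, `Naples` is still written as `yago:Naples`, while the name containing the en dash is written as the escaped full IRI shown above.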

It might be worth logging a bug report with the endpoint maintainers with a proposed fix to this effect.

As a workaround, the endpoint's N-Triples output does seem to be syntactically correct. You can force the server to respond with N-Triples instead of Turtle by overriding the standard Accept header that RDF4J's SPARQLRepository sends, like so:

    import java.util.HashMap;
    import java.util.Map;
    import org.eclipse.rdf4j.repository.sparql.SPARQLRepository;

    // endpoint: the SPARQL endpoint URL, as a String
    SPARQLRepository repo = new SPARQLRepository(endpoint);

    // create a new map of additional HTTP headers
    Map<String, String> headers = new HashMap<>();

    // Set the Accept header to _only_ accept text/plain, forcing the endpoint
    // to use N-Triples as the response format. This overrides the standard
    // Accept header that RDF4J sends.
    headers.put("Accept", "text/plain");
    repo.setAdditionalHttpHeaders(headers);

Once you do that, the rest of your code should work.
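To see what the header override amounts to at the HTTP level, here is a small sketch that builds the same kind of request with the JDK's own HTTP client (Java 11+) instead of RDF4J. This is a different technique from the SPARQLRepository approach above, shown only for illustration. The endpoint URL `https://example.org/sparql` is a placeholder, the class and method names are hypothetical, and it assumes the endpoint accepts queries via GET with a `query` parameter, as the SPARQL protocol describes.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

class NTriplesRequestSketch {

    // Builds a GET request whose Accept header allows only text/plain,
    // the media type the endpoint is assumed to map to N-Triples.
    static HttpRequest buildRequest(String endpoint, String sparql) {
        String url = endpoint + "?query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "text/plain")
                .GET()
                .build();
    }

    // Sends the request and streams the response body straight into a file.
    // Not called from main, because it needs a reachable endpoint.
    static Path fetchToFile(HttpRequest request, Path target)
            throws IOException, InterruptedException {
        HttpResponse<Path> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofFile(target));
        return response.body();
    }

    public static void main(String[] args) {
        // Placeholder endpoint and a trivial query, just to show the request shape.
        HttpRequest request = buildRequest("https://example.org/sparql", "ASK {}");
        System.out.println(request.uri());
        System.out.println(request.headers().map());
    }
}
```

The answer uses `text/plain` because that is what this endpoint responds to. The media type registered for N-Triples is `application/n-triples`, but whether this endpoint honours it is not something the post confirms.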
