Jena's getLocalName doesn't return numeric localname with Turtle

According to the changelog, the Turtle RDF serialization has supported numeric local names since August 2011. In the following Jena code, calling getLocalName() on the URI http://www.foo.com/123456 does not return 123456. Is this a bug in Jena?

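// Model, ModelFactory and Resource come from Jena's rdf.model package.
// A Turtle triple must be terminated with " ."
String turtle = "<http://www.foo.com/123456>  a  <http://www.foo.com/number> .";
Model model = ModelFactory.createDefaultModel()
    .read(new ByteArrayInputStream(turtle.getBytes(StandardCharsets.UTF_8)), null, "TURTLE");

// The model holds a single triple, so the only subject is the numeric IRI.
Resource foo = model.listSubjects().next();
String localName = foo.getLocalName();
// This assertion fails: the local name is not "123456".
assert localName.equals("123456");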

The code does not indicate a bug. The Turtle serialization may allow numeric local names, but Jena's getLocalName() "returns the name of this resource within its namespace." That is underspecified, since it doesn't say what "its namespace" is. There is some historical context, too. When the earlier RDF standards were published in 2004, RDF/XML was the most common format. In light of that, it's not surprising that the implementation of getLocalName() in Node_URI uses Util.splitNamespace, which is based on XML's concept of local names. The documentation of Util.splitNamespace refers to finding an NCName, which is an XML concept, and an NCName cannot begin with a digit:

Given an absolute URI, determine the split point between the namespace part and the localname part. If there is no valid localname part then the length of the string is returned.

The algorithm tries to find the longest NCName at the end of the uri, not immediately preceded by the first colon in the string.

@param uri
@return the index of the first character of the localname
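
The rule described in that documentation can be illustrated with a small standalone sketch. This is not Jena's code: the class NcNameSplitSketch and its methods are hypothetical names, the NCName rules are reduced to ASCII (a name may contain letters, digits, '_', '-' and '.', but must start with a letter or '_'), and the special handling of the first colon is left out. Under those assumptions, a purely numeric tail never yields a local name, because no digit can start an NCName.

```java
// Simplified sketch, NOT Jena's implementation: ASCII-only NCName rules,
// and no special handling of the first colon in the URI.
final class NcNameSplitSketch {

    // Characters allowed at the start of an (ASCII) NCName: a letter or '_'.
    static boolean isNcNameStart(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
    }

    // Characters allowed anywhere in an (ASCII) NCName.
    static boolean isNcNameChar(char c) {
        return isNcNameStart(c) || (c >= '0' && c <= '9') || c == '-' || c == '.';
    }

    // Index of the first character of the local name,
    // or uri.length() if there is no valid local name.
    static int splitPoint(String uri) {
        int i = uri.length();
        // Walk backwards over the trailing run of NCName characters.
        while (i > 0 && isNcNameChar(uri.charAt(i - 1))) {
            i--;
        }
        // Walk forwards until a legal NCName start character is found.
        while (i < uri.length() && !isNcNameStart(uri.charAt(i))) {
            i++;
        }
        return i;
    }

    // The part of the URI from the split point to the end (possibly empty).
    static String localName(String uri) {
        return uri.substring(splitPoint(uri));
    }

    public static void main(String[] args) {
        // Brackets make an empty local name visible in the output.
        System.out.println("[" + localName("http://www.foo.com/123456") + "]");
        System.out.println("[" + localName("http://www.foo.com/number") + "]");
    }
}
```

With these simplified rules, the sketch gives an empty local name for http://www.foo.com/123456 and number for http://www.foo.com/number, which matches the kind of result the question observed.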

Now, there is one other possible confusion here that it's important to address. RDF is an abstract data representation. An RDF graph (or model, as Jena calls them) is just a set of triples. Turtle, N3, N-Triples, and RDF/XML are just serialization formats for RDF. A Jena model can be serialized in lots of different formats, but it doesn't keep track of what serialization format its contents were read in from. (Indeed, you can populate a model without reading triples from any file at all.) That means that even though Jena will be able to read a Turtle file containing content like:

@prefix : <http://example.org/>.
:123456 a :number .

the model won't know that the IRI http://example.org/123456 appeared in the file as :123456. It's worth noting, as AndyS pointed out in the comments, that Jena's Turtle serialization will recognize that the IRI http://example.org/123456 can be written as :123456, and will use that shortened version.
