
How to retrieve XML/RDF data from a dbpedia link or URL?

Original post 2015-05-16 19:37:47 4 2 java/ xml/ rdf/ semantic-web

Recently I have been trying to learn the Semantic Web. For a project I need to retrieve data from a given DBpedia link, e.g. http://dbpedia.org/page/Berlin . But when I retrieve the data using java.net.URLConnection, I get HTML. How can I get the XML from the same link? I know that every DBpedia page has a link to download the XML, but that is not what I want to do. Thanks in advance.

2 Answers

Note that the URI of the resource is actually http://dbpedia.org/resource/Berlin (with resource, not page). Ideally, you could request that URI with an Accept header of application/rdf+xml and get the RDF/XML representation of the resource. That's how the BBC publishes their data (e.g., see this answer), but DBpedia doesn't do that. Even if you request application/rdf+xml, you end up getting a redirect. You can see this if you try with an HTTP client. E.g., using Advanced Rest Client in Chrome, we get this 303 redirect:

[Screenshot: Advanced Rest Client]

In a web browser, you get redirected to the page version by a 303 See Other response code. Ideally, you could request the resource URI with the Accept header set to application/rdf+xml and get the data, but DBpedia doesn't play quite so nicely.
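The redirect described above can also be observed from Java. The following is a minimal sketch, assuming the standard java.net.HttpURLConnection; the class name DbpediaRedirectCheck, the helper prepare, and the 5-second timeouts are made up for illustration. It sends the Accept header, switches off automatic redirect following, and prints whatever status code and Location header the server returns.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper class, not part of the original answer.
class DbpediaRedirectCheck {

    // Builds (but does not yet send) a GET request for the resource URI that
    // asks for RDF/XML and does not follow redirects automatically.
    static HttpURLConnection prepare(String resourceUri) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(resourceUri).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "application/rdf+xml");
        conn.setInstanceFollowRedirects(false); // keep any 3xx response visible
        conn.setConnectTimeout(5000);           // assumed timeout values
        conn.setReadTimeout(5000);
        return conn;
    }

    public static void main(String[] args) {
        try {
            HttpURLConnection conn = prepare("http://dbpedia.org/resource/Berlin");
            // getResponseCode() is the call that actually sends the request.
            System.out.println("Status:   " + conn.getResponseCode());
            System.out.println("Location: " + conn.getHeaderField("Location"));
            conn.disconnect();
        } catch (IOException e) {
            System.out.println("Request failed: " + e);
        }
    }
}
```

Printing the Location header shows where the server wants the client to go next, which is the same information the HTTP client screenshot conveys.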

So the easiest way is to note that at the bottom of http://dbpedia.org/page/Berlin there is some text with download links:

RDF ( N-Triples N3/Turtle JSON XML )

The URL of the last link is http://dbpedia.org/data/Berlin.rdf . Thus, you can get the RDF/XML by changing page or resource to data, and appending .rdf to the end of the URL. It's not the most RESTful solution, but it seems to be what's available.
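The rewrite rule just described is easy to apply in code. Here is a rough sketch using java.net.URLConnection, as in the question; the class DbpediaRdfFetcher and the methods toRdfXmlUrl and fetch are hypothetical names, and the UTF-8 decoding and timeouts are assumptions. Note that URLConnection does not follow a redirect from http to https, so the scheme may need adjusting if the server answers that way.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original answer.
class DbpediaRdfFetcher {

    // Turns .../page/Berlin or .../resource/Berlin into .../data/Berlin.rdf.
    static String toRdfXmlUrl(String dbpediaUrl) {
        return dbpediaUrl.replaceFirst("/(page|resource)/", "/data/") + ".rdf";
    }

    // Downloads the body at the given URL as text (UTF-8 is assumed).
    static String fetch(String url) throws IOException {
        URLConnection conn = URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(5000); // assumed timeout values
        conn.setReadTimeout(5000);
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        }
        return body.toString();
    }

    public static void main(String[] args) {
        String rdfUrl = toRdfXmlUrl("http://dbpedia.org/page/Berlin");
        System.out.println("Fetching " + rdfUrl);
        try {
            System.out.println(fetch(rdfUrl));
        } catch (IOException e) {
            System.out.println("Download failed: " + e);
        }
    }
}
```

Keeping the URL rewrite in its own method means it can be checked without touching the network.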

A good way to access data from DBpedia is through SPARQL. You can use Apache Jena to run SPARQL queries against http://dbpedia.org/sparql .
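As an illustration of querying the endpoint, here is a sketch that does not use Jena at all: it sends the query over plain HTTP using the standard SPARQL protocol (a GET request with a query parameter), which avoids an extra dependency. The class DbpediaSparqlClient, its method names, the sample query, and the timeouts are all assumptions; readAllBytes requires Java 9 or later. With Jena, the same query string would be handed to its query execution API instead.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; a plain-HTTP stand-in for a Jena-based client.
class DbpediaSparqlClient {

    // SPARQL protocol: the query travels URL-encoded in the "query" parameter.
    static String buildQueryUrl(String endpoint, String sparql) throws IOException {
        return endpoint + "?query=" + URLEncoder.encode(sparql, "UTF-8");
    }

    // Runs the query and returns the raw response body as text.
    static String run(String endpoint, String sparql) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(buildQueryUrl(endpoint, sparql)).toURL().openConnection();
        // Ask for the standard XML result format for SELECT queries.
        conn.setRequestProperty("Accept", "application/sparql-results+xml");
        conn.setConnectTimeout(5000);  // assumed timeout values
        conn.setReadTimeout(10000);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        // Sample query: a few properties of the Berlin resource.
        String sparql = "SELECT ?p ?o WHERE { "
                + "<http://dbpedia.org/resource/Berlin> ?p ?o } LIMIT 10";
        try {
            System.out.println(run("http://dbpedia.org/sparql", sparql));
        } catch (IOException e) {
            System.out.println("Query failed: " + e);
        }
    }
}
```

Unlike the .rdf download, a query lets you ask for only the properties you need.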