How to retrieve XML/RDF data from a DBpedia link or URL?

Recently I have been trying to learn the Semantic Web. For a project I need to retrieve data from a given DBpedia link, e.g. http://dbpedia.org/page/Berlin. But when I retrieve the data using java.net.URLConnection, I get the HTML. How can I get the XML from the same link? I know that every DBpedia page has a link to download the XML, but that is not what I want to do. Thanks in advance.

Note that the URI of the resource is actually http://dbpedia.org/resource/Berlin (with resource, not page). Ideally, you could request that URI with an Accept header of application/rdf+xml and get the RDF/XML representation of the resource. That's how the BBC publishes their data (e.g., see this answer), but DBpedia doesn't do that. Even if you request application/rdf+xml, you end up getting a redirect. You can see this if you try with an HTTP client. E.g., using Advanced Rest Client in Chrome, we get this 303 redirect:

[Screenshot: Advanced Rest Client]

In a web browser, you get redirected to the page version by a 303 See Other response code. Ideally, you could request the resource URI with the Accept header set to application/rdf+xml and get the data, but DBpedia doesn't play quite so nicely.

So the easiest way is to note that at the bottom of http://dbpedia.org/page/Berlin there is a line of text with some download links:

RDF ( N-Triples N3/Turtle JSON XML )

The URL of the last link is http://dbpedia.org/data/Berlin.rdf. Thus, you can get the RDF/XML by changing page or resource to data, and appending .rdf to the end of the URL. It's not the most RESTful solution, but it seems to be what's available.
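The URL rewrite described above can be sketched in Java roughly as follows. This is a minimal sketch, not a definitive implementation: the class name DbpediaRdf and the methods toRdfXmlUrl and fetch are hypothetical names chosen for illustration, and it assumes the http://dbpedia.org/data/<name>.rdf pattern seen on the Berlin page holds for other resources too. The main method only prints the rewritten URL unless it is started with --fetch, so it can be tried without network access.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class DbpediaRdf {

    // Assumption: all DBpedia URLs of interest start with this base.
    static final String BASE = "http://dbpedia.org/";

    // Rewrites a /page/ or /resource/ URL to the /data/<name>.rdf form.
    static String toRdfXmlUrl(String url) {
        for (String kind : new String[] {"page/", "resource/"}) {
            String prefix = BASE + kind;
            if (url.startsWith(prefix)) {
                return BASE + "data/" + url.substring(prefix.length()) + ".rdf";
            }
        }
        throw new IllegalArgumentException("Not a DBpedia page or resource URL: " + url);
    }

    // Downloads the body of the given URL as a UTF-8 string.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        // Not strictly needed for the .rdf URL, but states what we expect back.
        conn.setRequestProperty("Accept", "application/rdf+xml");
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String rdfUrl = toRdfXmlUrl("http://dbpedia.org/page/Berlin");
        System.out.println(rdfUrl); // prints http://dbpedia.org/data/Berlin.rdf
        if (args.length > 0 && args[0].equals("--fetch")) {
            System.out.println(fetch(rdfUrl)); // requires network access
        }
    }
}
```

The returned string can then be handed to whatever RDF/XML parser the project already uses.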

A good way to access data from DBpedia is through SPARQL. You can use Apache Jena to run SPARQL queries against http://dbpedia.org/sparql.
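To show what such a request looks like on the wire, here is a rough sketch that uses only the JDK rather than Jena. It builds the GET URL for a SPARQL query against http://dbpedia.org/sparql, relying on the standard SPARQL protocol convention of passing the query text in a query parameter. The class name SparqlUrl, the method queryUrl, and the choice of a DESCRIBE query are assumptions made for illustration; in a real project Jena would normally build and execute the query for you.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SparqlUrl {

    // Endpoint named in the answer above.
    static final String ENDPOINT = "http://dbpedia.org/sparql";

    // Builds a GET URL carrying the query in the "query" parameter.
    static String queryUrl(String sparql) {
        return ENDPOINT + "?query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumption: DESCRIBE is used here to ask for the triples about Berlin.
        String sparql = "DESCRIBE <http://dbpedia.org/resource/Berlin>";
        System.out.println(queryUrl(sparql));
    }
}
```

Requesting the resulting URL with an Accept header of application/rdf+xml should ask the endpoint for RDF/XML, though which formats it actually returns is up to the server.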
