Retrieving data from geonames using SPARQL

I am trying to get linked data from geonames with the following SPARQL query, but obviously I'm doing something wrong.

prefix oxprop: <http://ophileon.com/ox/property#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl:  <http://www.w3.org/2002/07/owl#>
prefix wgs84_pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>

select ?poi ?poiname ?geonames ?latitude


from  <http://www.ophileon.com/ox/poi.rdf>
# from  <http://sws.geonames.org/ >

where
{

   ?poi rdfs:label ?poiname.
   ?poi owl:sameAs ?geonames.
#   ?geonames wgs84_pos:lat ?latitude.


  FILTER(langMatches(lang(?poiname), "EN")).

}

which, using sparql.org's JSON output, returns:

{
  "head": {
    "vars": [ "poi" , "poiname" , "geonames" , "latitude" ]
  } ,
  "results": {
    "bindings": [
      {
        "poi": { "type": "uri" , "value": "http://ophileon.com/ox/poi/2" } ,
        "poiname": { "type": "literal" , "xml:lang": "en" , "value": "Wageningen" } ,
        "geonames": { "type": "uri" , "value": "http://sws.geonames.org/2745088" }
      } ,
      {
        "poi": { "type": "uri" , "value": "http://ophileon.com/ox/poi/3" } ,
        "poiname": { "type": "literal" , "xml:lang": "en" , "value": "Netherlands" } ,
        "geonames": { "type": "uri" , "value": "http://sws.geonames.org/2750405" }
      } ,
      {
        "poi": { "type": "uri" , "value": "http://ophileon.com/ox/poi/1" } ,
        "poiname": { "type": "literal" , "xml:lang": "en" , "value": "Amsterdam" } ,
        "geonames": { "type": "uri" , "value": "http://sws.geonames.org/2759794" }
      }
    ]
  }
}

What I want to achieve is to retrieve the latitude of each node using the geonames RDF service, with addresses like "http://sws.geonames.org/2745088/about.rdf".

The lines starting with "#" are the ones I suspect to be incorrect.

Next iteration

After adding "/" after the geonames ID and running this:

prefix oxprop: <http://ophileon.com/ox/property#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl:  <http://www.w3.org/2002/07/owl#>
prefix wgs84_pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>

select *

from <http://www.ophileon.com/ox/poi.rdf>
from <http://sws.geonames.org/2745088/about.rdf>    
from <http://sws.geonames.org/2750405/about.rdf>    
from <http://sws.geonames.org/2759794/about.rdf>
where
{
   ?poi rdfs:label ?poiname.
   ?poi owl:sameAs ?geonames.
   ?geonames wgs84_pos:lat ?latitude.
   FILTER(langMatches(lang(?poiname), "EN")).
}

I get this:

-------------------------------------------------------------------------------------------------------
| poi                            | poiname          | geonames                           | latitude   |
=======================================================================================================
| <http://ophileon.com/ox/poi/2> | "Wageningen"@en  | <http://sws.geonames.org/2745088/> | "51.97"    |
| <http://ophileon.com/ox/poi/3> | "Netherlands"@en | <http://sws.geonames.org/2750405/> | "52.5"     |
| <http://ophileon.com/ox/poi/1> | "Amsterdam"@en   | <http://sws.geonames.org/2759794/> | "52.37403" |
-------------------------------------------------------------------------------------------------------

Next iteration: using the "SERVICE" keyword

prefix oxprop: <http://ophileon.com/ox/property#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl:  <http://www.w3.org/2002/07/owl#>
prefix wgs84_pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>


select ?poi ?poiname ?geonameuri ?latitude

from <http://www.ophileon.com/ox/poi.rdf>

where
{
   ?poi rdfs:label ?poiname.
   ?poi owl:sameAs ?geonameuri.
   SERVICE <http://factforge.net/sparql>{
   ?geonameuri wgs84_pos:lat ?latitude.
   }
   FILTER(langMatches(lang(?poiname), "EN")).
}

which results in what I wanted, except that factforge returns multiple values in various datatypes.
This resource, http://wifo5-03.informatik.uni-mannheim.de/latc/www2012/Session%201.html, proved to be very useful.
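
One way to cope with the same latitude coming back several times in different datatypes is to post-process the bindings on the client. The following is only a sketch: it assumes the results arrive in the standard SPARQL JSON results format, and the function name and sample bindings are hypothetical, since the actual factforge response is not shown here.

```python
from collections import defaultdict

XSD = "http://www.w3.org/2001/XMLSchema#"


def distinct_latitudes(bindings):
    """Group latitude bindings by POI and collapse numerically equal values."""
    by_poi = defaultdict(set)
    for row in bindings:
        try:
            value = float(row["latitude"]["value"])
        except (KeyError, ValueError):
            continue  # skip rows without a numeric latitude
        by_poi[row["poi"]["value"]].add(value)
    return {poi: sorted(values) for poi, values in by_poi.items()}


# Hypothetical bindings: one latitude reported with three different datatypes.
poi = {"type": "uri", "value": "http://ophileon.com/ox/poi/2"}
sample = [
    {"poi": poi, "latitude": {"type": "literal", "value": "51.97"}},
    {"poi": poi, "latitude": {"type": "literal", "datatype": XSD + "float",
                              "value": "51.97"}},
    {"poi": poi, "latitude": {"type": "literal", "datatype": XSD + "double",
                              "value": "51.9700"}},
]

print(distinct_latitudes(sample))
```

Comparing parsed floats rather than lexical forms is a deliberate simplification; values that differ only in precision would still show up as separate entries.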

Typos and Inability to Retrieve Data

I think there are two issues here. The first is a minor typo. When I run your query with the commented lines uncommented, I get a parse error because of the line

from  <http://sws.geonames.org/ >

because there should not be a space in the IRI. That's easy to fix, though. Once it is fixed, the service at sparql.org replies that

Error 400: Failed to load URL (parse error) http://sws.geonames.org/ : Failed to determine the triples content type: (URI=http://sws.geonames.org/ : stream=null : hint=null)

Fuseki - version 1.0.0 (Build date: 2013-09-12T10:49:49+0100)

which, I believe, means that Jena was able to pull down the content of that IRI, but wasn't able to figure out how to read it as RDF. While a quick Google search shows plenty of queries where that IRI is used as a namespace prefix, I don't see any where it's used as a graph from which triples can be selected. I think this matches what geonames.org says in its documentation:

Entry Points into the GeoNames Semantic Web

There are several ways how you can enter the GeoNames Semantic Web:

  • start from mother earth and follow the Linked Data links.
  • use the geonames search webservice with the type=rdf parameter option.
  • download the database dump and construct the url for the features using the pattern "http://sws.geonames.org/geonameId/"
  • RDF dump with 8514201 features and about 125 mio rdf triples (2013 08 27). The dump has one rdf document per toponym on every line of the file. Note: The file is pretty large. Make sure the tool you use to uncompress is able to deal with the size and does not stop after 2GB, an issue that happens with some old (windows) tool versions.

I'm a bit surprised not to see a SPARQL endpoint in that list, but I expect that if there were one, it would be in this list of options.

Modifying the query to get some data

Now, the successful query (without the commented lines) returns these results:

poi                            poiname          geonames                          latitude
<http://ophileon.com/ox/poi/2> "Wageningen"@en  <http://sws.geonames.org/2745088>   
<http://ophileon.com/ox/poi/3> "Netherlands"@en <http://sws.geonames.org/2750405>   
<http://ophileon.com/ox/poi/1> "Amsterdam"@en   <http://sws.geonames.org/2759794>

Note: These were the results at the time that I started writing this answer. However, this is based on data in http://www.ophileon.com/ox/poi.rdf, which may have changed. On later runs of this query, I get values of geonames that have a final /, e.g., http://sws.geonames.org/2745088/.

The same documentation also says:

For the town Embrun in France we have these two URIs:

  1. http://sws.geonames.org/3020251/
  2. http://sws.geonames.org/3020251/about.rdf

The first URI [1] stands for the town in France. You use this URI if you want to refer to the town. The second URI [2] is the document with the information geonames has about Embrun.
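
The relationship between the two URIs can be written down as a small helper. This is a minimal sketch that assumes the pattern quoted above holds for every feature; the function name is made up for illustration.

```python
def document_url(resource_iri):
    """Return the about.rdf document URL for a GeoNames resource IRI.

    Assumes the documented pattern: <resource IRI ending in "/"> + "about.rdf".
    A missing trailing slash is tolerated.
    """
    return resource_iri.rstrip("/") + "/about.rdf"


print(document_url("http://sws.geonames.org/3020251/"))
```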

This suggests that a query which also uses the documents for those particular geonames IRIs as graph names might work. That is, a query like this might work:

prefix oxprop: <http://ophileon.com/ox/property#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl:  <http://www.w3.org/2002/07/owl#>
prefix wgs84_pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>

select ?poi ?poiname ?geonames ?latitude
from <http://www.ophileon.com/ox/poi.rdf>
from <http://sws.geonames.org/2745088/about.rdf>    
from <http://sws.geonames.org/2750405/about.rdf>    
from <http://sws.geonames.org/2759794/about.rdf>
where
{
   ?poi rdfs:label ?poiname.
   ?poi owl:sameAs ?geonames.
   ?geonames wgs84_pos:lat ?latitude.
   FILTER(langMatches(lang(?poiname), "EN")).
}

Now this still doesn't return any results, but it seems like all the data should be there. Let's try a simpler query. If you use a query like this:

select * 
from <http://sws.geonames.org/2759794/about.rdf>
where { ?s ?p ?o }

you'll get a bunch of triples about that place. This works with multiple from clauses, too. For instance, if you use that data and your data with the following query, you get the combined results.

select * 
from <http://www.ophileon.com/ox/poi.rdf>
from <http://sws.geonames.org/2745088/about.rdf>  
where { ?s ?p ?o }

In looking at the results from that dataset, we can finally see where the problem is: the IRIs for the geonames resources end with / in their actual form, but don't have the trailing / in your data. You'll need to change your data accordingly.

Note: it seems that the data in http://www.ophileon.com/ox/poi.rdf has since been corrected.
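
This kind of mismatch is easy to miss by eye. As a rough sketch, you could compare the IRIs your data refers to (the owl:sameAs objects) with the subjects that actually appear in the geonames documents, and report pairs that differ only by a trailing slash. The function name and the way the two sets are collected are assumptions made for illustration.

```python
def slash_mismatches(referenced, defined):
    """Return (referenced, defined) IRI pairs that differ only by a trailing "/"."""
    defined_by_stem = {iri.rstrip("/"): iri for iri in defined}
    pairs = []
    for iri in sorted(referenced):
        match = defined_by_stem.get(iri.rstrip("/"))
        if match is not None and match != iri:
            pairs.append((iri, match))
    return pairs


# IRIs as they appeared in the POI data versus in the geonames document.
referenced = {"http://sws.geonames.org/2745088"}
defined = {"http://sws.geonames.org/2745088/"}

for ours, theirs in slash_mismatches(referenced, defined):
    print(ours, "should be", theirs)
```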

It looks like you may end up needing to run your first query to determine which data you want to get from geonames, retrieve that information, and then run a second query on it. Alternatively, you could download the big data dump provided by Geonames and use it locally (possibly the easiest solution).
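
The two-step approach could be scripted along these lines. This is only a sketch: it assumes the first query's results are available as SPARQL JSON (like the sparql.org output in the question) and it only builds the text of the second query; actually sending it to an endpoint is left out. The function name is hypothetical.

```python
import json

PREFIXES = """prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl:  <http://www.w3.org/2002/07/owl#>
prefix wgs84_pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>
"""


def build_second_query(first_results_json, poi_graph):
    """Build a query with one FROM clause per geonames document found in step one."""
    results = json.loads(first_results_json)
    iris = sorted({row["geonames"]["value"]
                   for row in results["results"]["bindings"]
                   if "geonames" in row})
    from_lines = [f"from <{poi_graph}>"]
    # Each resource IRI maps to its about.rdf document (trailing slash optional).
    from_lines += [f"from <{iri.rstrip('/')}/about.rdf>" for iri in iris]
    return (PREFIXES
            + "select ?poi ?poiname ?geonames ?latitude\n"
            + "\n".join(from_lines)
            + "\nwhere\n{\n"
            + "   ?poi rdfs:label ?poiname.\n"
            + "   ?poi owl:sameAs ?geonames.\n"
            + "   ?geonames wgs84_pos:lat ?latitude.\n"
            + '   FILTER(langMatches(lang(?poiname), "EN")).\n'
            + "}\n")


first_results = json.dumps({
    "head": {"vars": ["poi", "geonames"]},
    "results": {"bindings": [
        {"poi": {"type": "uri", "value": "http://ophileon.com/ox/poi/2"},
         "geonames": {"type": "uri", "value": "http://sws.geonames.org/2745088"}},
    ]},
})

print(build_second_query(first_results, "http://www.ophileon.com/ox/poi.rdf"))
```

Note that the generated query only returns rows if the owl:sameAs IRIs in the POI data match the geonames resource IRIs exactly, including the trailing slash.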
