

Python JSON API for linked data, with flat files

We're creating gamma-cat, an open data collection for gamma-ray astronomy, and are looking for advice (here, or links to resources, formats, tools, packages) on how best to set it up.

The data we have consists of measurements for different sources, from different papers. It's pretty heterogeneous: sometimes there's data for multiple sources in one paper, for each source there are usually several papers, and sometimes there's no spectrum, sometimes one, sometimes many, ...

Currently we just collect the data in an input folder as YAML and CSV files, and now we'd like to expose it to users. Access will mainly be from Python, but also from JavaScript, and the data should be accessible from a static website.

The question is what format and organisation we should use for the data, and whether there are any Python packages that will help us generate the output files as a set of linked data, as well as Python and JavaScript packages that will help us access it.

We would like to get multiple "views" or simple "queries" of the data, e.g. "list of all sources", "list of all papers", "list of all spectra for source X", "spectrum A from paper B for source C".

For the format, JSON would probably be a good choice? YAML is a bit nicer to read, though, and it allows comments and ordered maps. We're storing the output files in a git repo, and have had a lot of meaningless diffs for JSON files because the key order changes all the time.
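One way to avoid the key-order diffs, assuming the JSON files are written from Python with the standard json module, is to always serialize with sorted keys and a fixed indentation. The sketch below is only an illustration; the helper name dump_stable and the example record are made up.

```python
import json


def dump_stable(obj):
    """Serialize obj to JSON text that does not depend on dict insertion order.

    sort_keys=True orders keys alphabetically at every nesting level, and a
    fixed indent keeps one key per line, so git diffs only show real changes.
    """
    return json.dumps(obj, sort_keys=True, indent=2, ensure_ascii=False) + "\n"


# Two hypothetical records with the same content but different key order.
record_a = {"source_id": 1, "common_name": "Crab Nebula"}
record_b = {"common_name": "Crab Nebula", "source_id": 1}

# Both serialize to exactly the same text.
assert dump_stable(record_a) == dump_stable(record_b)
```

The trade-off is that the key order in the file becomes alphabetical rather than hand-chosen.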

To make the datasets discoverable and linked, I don't know what to use. I found e.g. http://jsonapi.org/ but that seems to be for REST APIs, not for just a series of flat JSON files on a static webserver? Maybe it could still be used that way? I also found http://json-ld.org/ which looks relevant, but also pretty complex. Would either of those, or something else, be a good choice?

And finally, we'd like to generate the linked and discoverable files in output from just a bunch of somewhat organised YAML and CSV files in input using Python scripts. So far we have just written a bunch of Python classes and scripts based on Python dicts / lists and YAML / JSON files. Is there a Python package that would help with that task of generating the linked data files?
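To make the kind of output we have in mind more concrete, here is a rough sketch using only the standard library. It is not a proposal for a particular standard: the directory layout, the field names (source_id, paper_id, spectrum_id) and the example records are all hypothetical, and the records stand in for whatever would be parsed from the input folder. The "links" are just relative paths that a static webserver could serve.

```python
import json
from pathlib import Path

# Hypothetical records, standing in for data parsed from the input folder.
SPECTRA = [
    {"source_id": 1, "paper_id": "paper-a", "spectrum_id": 1},
    {"source_id": 1, "paper_id": "paper-b", "spectrum_id": 1},
    {"source_id": 2, "paper_id": "paper-a", "spectrum_id": 1},
]


def build_views(spectra):
    """Return a dict mapping relative output path -> JSON-serializable content."""
    files = {}
    by_source = {}
    papers = set()
    for rec in spectra:
        # One file per spectrum: "spectrum A from paper B for source C".
        path = "spectra/source-{}/{}/spectrum-{}.json".format(
            rec["source_id"], rec["paper_id"], rec["spectrum_id"]
        )
        files[path] = rec
        by_source.setdefault(rec["source_id"], []).append(path)
        papers.add(rec["paper_id"])
    # One file per source: "list of all spectra for source X".
    for source_id, paths in by_source.items():
        files["sources/{}.json".format(source_id)] = {
            "source_id": source_id,
            "spectra": sorted(paths),
        }
    # Top-level index files: "list of all sources" and "list of all papers".
    files["sources/index.json"] = {
        "sources": ["sources/{}.json".format(s) for s in sorted(by_source)]
    }
    files["papers/index.json"] = {"papers": sorted(papers)}
    return files


def write_views(files, out_dir):
    """Write every view to out_dir as JSON with sorted keys."""
    for rel_path, content in files.items():
        target = Path(out_dir) / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        text = json.dumps(content, sort_keys=True, indent=2) + "\n"
        target.write_text(text, encoding="utf-8")
```

Calling write_views(build_views(SPECTRA), "output") would produce the whole tree. A client then starts at sources/index.json and follows the relative paths.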

Apologies for the long and complex question! I hope it's still in scope for SO and someone will have some advice to share.

Judging from the breadth of your question, you are new to linked data. The least "strange" format for you might be the Data Package. In the most common case it's just a zip archive of a CSV file and JSON metadata. It has a Python package.
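To give a feel for how small that metadata can be, here is a minimal sketch that writes a CSV file plus a datapackage.json descriptor using only the standard library, not the Data Package Python package itself. The package name, file names and columns are assumptions for illustration, and every field is declared as a string for simplicity.

```python
import csv
import json
from pathlib import Path


def make_descriptor(name, fieldnames, csv_name="data.csv"):
    """Build a minimal Data Package descriptor for a single CSV resource."""
    return {
        "name": name,
        "resources": [
            {
                "name": "data",
                "path": csv_name,
                "format": "csv",
                # Simplification: all columns typed as strings.
                "schema": {
                    "fields": [{"name": f, "type": "string"} for f in fieldnames]
                },
            }
        ],
    }


def write_data_package(out_dir, name, rows, fieldnames):
    """Write data.csv and datapackage.json side by side in out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(out / "data.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    descriptor = make_descriptor(name, fieldnames)
    text = json.dumps(descriptor, sort_keys=True, indent=2) + "\n"
    (out / "datapackage.json").write_text(text, encoding="utf-8")
    return descriptor
```

For example, write_data_package("output/sources", "gamma-cat-sources", rows, ["source_id", "common_name"]) would create a folder that can be zipped and shared as is.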

If you need to run queries against the data, you should settle on a database (a triplestore) with a SPARQL endpoint. Take a look at Fuseki. You can then use Turtle or RDF/XML for file export.

If the data comes from some kind of tool, you can model the domain it represents using Eclipse Lyo (tutorial).

These tools are maintained by three different communities; you can reach out to their user mailing lists separately if you have further questions about them.
