
Opening JSON files in R

I have downloaded some data from the following site as a zip file and extracted it onto my computer. Now I'm having trouble opening the included JSON data files.

Running the following code:

install.packages("rjson")
library("rjson")
comp <- fromJSON("statsbomb/data/competitions")

gave this error:

Error in fromJSON("statsbomb/data/competitions") : unexpected character 's'

Also, is there a way to load all the files at once instead of writing an individual statement for each one?

Here is what I did (Unix system).

  1. Clone the GitHub repo (note where you clone it).
git clone https://github.com/statsbomb/open-data.git

  2. Set the working directory (the directory to which you cloned the repo or extracted the zip file).
setwd("path to directory where you cloned the repo")

  3. Read the data.
  jsonlite::fromJSON("competitions.json")

With rjson: rjson::fromJSON(file = "competitions.json")

  4. To load all the files at once, move all .json files to a single directory and use lapply/assign to assign your objects to your environment.

Result (single file):

  competition_id season_id             country_name
1             37         4                  England
2             43         3            International
3             49         3 United States of America
4             72        30            International
         competition_name season_name              match_updated
1 FA Women's Super League   2018/2019    2019-06-05T22:43:14.514
2          FIFA World Cup        2018 2019-05-14T08:23:15.306297
3                    NWSL        2018 2019-05-17T00:35:34.979298
4       Women's World Cup        2019 2019-06-21T16:45:45.211614
             match_available
1    2019-06-05T22:43:14.514
2 2019-05-14T08:23:15.306297
3 2019-05-14T08:02:00.567719
4 2019-06-21T16:45:45.211614
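
Step 4 above could be sketched roughly as follows. This is only an illustration under stated assumptions: the directory `json_demo` and the files `a.json` and `b.json` are hypothetical stand-ins written to a temporary folder so the snippet runs on its own, and rjson is assumed to be installed.

```r
library(rjson)

# Hypothetical stand-in for a directory of .json files: write two small
# files into a temporary folder so the sketch is self-contained.
json_dir <- file.path(tempdir(), "json_demo")
dir.create(json_dir, showWarnings = FALSE)
writeLines('{"competition_id": 37, "season_id": 4}', file.path(json_dir, "a.json"))
writeLines('{"competition_id": 43, "season_id": 3}', file.path(json_dir, "b.json"))

# List the files, then parse each one; the result has one element per file.
paths <- list.files(json_dir, pattern = "\\.json$", full.names = TRUE)
parsed <- lapply(paths, function(p) fromJSON(file = p))
names(parsed) <- tools::file_path_sans_ext(basename(paths))

# Copy each parsed object into the global environment under its file name.
for (nm in names(parsed)) assign(nm, parsed[[nm]], envir = .GlobalEnv)
```

Keeping the results in the named list `parsed` is often enough on its own; the final `assign` loop is only needed if you want one top-level object per file.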

The rjson function fromJSON takes a JSON string as its first argument unless you specify that you are giving it a file (fromJSON(file = "competitions.json")).

The error you mention comes from the function trying to parse 'statsbomb/data/competitions' as a JSON string and not as a file name. A JSON document, however, normally starts with a bracket or brace, and strings are enclosed in quotation marks. So the s from "statsbomb" is not a valid first character.
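
The difference between the two calling styles can be seen in a small sketch. It assumes rjson is installed; the JSON text and the temporary file `tmp` are made up for illustration and stand in for competitions.json.

```r
library(rjson)

# Passing JSON text directly: the first argument is parsed as JSON.
from_text <- fromJSON('{"competition_name": "NWSL", "season_id": 3}')

# Passing a path: write the same text to a temporary file (a stand-in
# for competitions.json) and name the `file` argument explicitly.
tmp <- tempfile(fileext = ".json")
writeLines('{"competition_name": "NWSL", "season_id": 3}', tmp)
from_file <- fromJSON(file = tmp)

# A bare path in the first position is treated as JSON text and fails to parse;
# tryCatch captures the error message instead of stopping the script.
bad <- tryCatch(fromJSON(tmp), error = function(e) conditionMessage(e))
```

Here `from_text` and `from_file` hold the same parsed list, while `bad` holds the parser's error message, analogous to the "unexpected character" error in the question.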

To read all the JSON files you could do:

# pattern is a regular expression, so "\\.json$" matches names ending in .json
lapply(dir("open-data-master/", pattern = "\\.json$", recursive = TRUE), function(x) {
  assign(gsub("/", "_", x), rjson::fromJSON(file = paste0("open-data-master/", x)), envir = .GlobalEnv)
})

However, this will take a long time to complete! You should probably refine this approach a little, e.g. split the list of files obtained with dir into chunks of 50 before running the lapply call.
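
One way the chunking idea might look is sketched below. The file names are a made-up stand-in for the output of dir(), and the per-chunk "work" is a placeholder that only counts files; in real use it would be replaced by the fromJSON(file = ...) call.

```r
# Hypothetical file list standing in for the output of dir(): 120 names.
files <- sprintf("file_%03d.json", 1:120)

# Give each file a chunk number (1 for the first 50, 2 for the next 50, ...)
# and split the vector into a list of chunks.
chunks <- split(files, ceiling(seq_along(files) / 50))

# Process one chunk at a time. The placeholder below only counts the files
# in each chunk; swap in the real parsing step as needed.
sizes <- vapply(chunks, length, integer(1))
```

Working chunk by chunk makes it possible to check progress, or to stop and resume, without having to parse every file in a single long call.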
