Reading multiple JSON files in a directory into one Data Frame

library(rjson)
# list.files() takes a regular expression, not a glob; returns a character vector with one entry per file name
filenames <- list.files(pattern = "\\.json$")

Now I want to import all the JSON files into R as one single data frame. How do I do that?

I first tried

myJSON <- lapply(filenames, function(x) fromJSON(file=x)) # should return a list in which each element is one of the JSON files

but the above code takes a long time to finish, since I have 15,000 files, and I know it won't return a single data frame. Is there a faster way to do this?

Sample JSON file:

 {"Reviews": [{"Ratings": {"Service": "4", "Cleanliness": "5"}, "AuthorLocation": "Boston", "Title": "\u201cExcellent Hotel & Location\u201d", "Author": "gowharr32", "ReviewID": "UR126946257", "Content": "We enjoyed the Best Western Pioneer Square....", "Date": "March 29, 2012"}, {"Ratings": {"Overall": "5"},"AuthorLocation": "Chicago",....},{...},....}]}

For anyone arriving here looking for a purrr / tidyverse solution:

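library(tidyverse) # attaches purrr, so a separate library(purrr) is not needed
library(jsonlite)

path <- "./your_path"
# pattern is a regular expression, so match a literal ".json" at the end of the name
files <- dir(path, pattern = "\\.json$")

# read each file and row-bind the results into one data frame
data <- files %>%
       map_df(~fromJSON(file.path(path, .x), flatten = TRUE))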
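With files shaped like the sample in the question, the records sit under a top-level "Reviews" key, so it may help to extract that element before binding. The following is a minimal sketch, not a definitive implementation: the demo directory, the two tiny files, the helper `read_reviews`, and the `source_file` column are all made up for illustration, and it assumes every file has a "Reviews" array.

```r
library(jsonlite)
library(dplyr)

# Hypothetical demo data: two small files shaped like the sample in the question.
demo_dir <- file.path(tempdir(), "json_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(
  '{"Reviews": [{"Ratings": {"Service": "4", "Cleanliness": "5"}, "Author": "a1"}, {"Ratings": {"Overall": "5"}, "Author": "a2"}]}',
  file.path(demo_dir, "hotel_1.json")
)
writeLines(
  '{"Reviews": [{"Ratings": {"Overall": "3"}, "Author": "b1"}]}',
  file.path(demo_dir, "hotel_2.json")
)

files <- list.files(demo_dir, pattern = "\\.json$", full.names = TRUE)

# Read one file and keep only its "Reviews" table; flatten = TRUE turns the
# nested "Ratings" object into columns such as Ratings.Service.
read_reviews <- function(f) {
  fromJSON(f, flatten = TRUE)$Reviews
}

# Name the list by file name so that .id records where each row came from.
# bind_rows() fills columns missing from a given file with NA.
reviews <- bind_rows(
  lapply(setNames(files, basename(files)), read_reviews),
  .id = "source_file"
)
```

Here `reviews` has one row per review across all files, and ratings that a review did not provide show up as `NA`.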

To read the files in parallel:

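library(parallel)
# leave one core free, but always start at least one worker
cl <- makeCluster(max(1, detectCores() - 1))
# pattern is a regular expression; full.names = TRUE returns paths the workers can open
json_files <- list.files(path = "your/json/path", pattern = "\\.json$", full.names = TRUE)
# returns a list with one parsed JSON document per file
json_list <- parLapply(cl, json_files, function(x) rjson::fromJSON(file = x, method = "R"))
stopCluster(cl)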
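The parallel approach returns a list rather than a data frame, so one more step is needed to get the single table the question asks for. A rough sketch, under some assumptions: it swaps in `jsonlite::fromJSON` on the workers (so each file comes back as a data frame), uses a fixed two-worker cluster, and runs on made-up demo files with a top-level "Reviews" key like the sample in the question.

```r
library(parallel)

# Hypothetical demo data with the same top-level "Reviews" key as the sample.
demo_dir <- file.path(tempdir(), "json_parallel_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(
  '{"Reviews": [{"Author": "a1", "Date": "March 29, 2012"}, {"Author": "a2", "Date": "April 2, 2012"}]}',
  file.path(demo_dir, "hotel_1.json")
)
writeLines(
  '{"Reviews": [{"Author": "b1", "Date": "May 5, 2012"}]}',
  file.path(demo_dir, "hotel_2.json")
)

json_files <- list.files(demo_dir, pattern = "\\.json$", full.names = TRUE)

# Two workers are enough for the demo; a real run might size the cluster
# from detectCores() instead.
cl <- makeCluster(2)

# Each worker parses its share of the files and returns the "Reviews" table.
# The finally clause stops the cluster even if a file fails to parse.
review_list <- tryCatch(
  parLapply(cl, json_files, function(f) {
    jsonlite::fromJSON(f, flatten = TRUE)$Reviews
  }),
  finally = stopCluster(cl)
)

# Combine the per-file tables into one data frame in the main session.
all_reviews <- dplyr::bind_rows(review_list)
```

Whether this is actually faster depends on the file sizes: for many small files, the cost of starting workers and shipping results back can outweigh the parsing time saved.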
