I have some data in JSON I am trying to use in R. My problem is I cannot get the data in the right format.
require(RJSONIO)
json <- "[{\"ID\":\"id1\",\"VALUE\":\"15\"},{\"ID\":\"id2\",\"VALUE\":\"10\"}]"
example <- fromJSON(json)
example <- do.call(rbind,example)
example <- as.data.frame(example,stringsAsFactors=FALSE)
> example
ID VALUE
1 id1 15
2 id2 10
This gets close, but I cannot get the numeric column to convert to numeric. I know I can convert columns manually, but I thought data.frame or as.data.frame scanned the data and chose the most appropriate class for each column. Clearly I misunderstood. I am reading in numerous tables, all very different, and I need numeric data to be treated as numeric.
Ultimately I am looking to get data tables with numeric columns when the data is numeric.
read.table uses type.convert to convert data to the appropriate type. You could do the same as a cleaning step after reading in the JSON data.
sapply(example,class)
# ID VALUE
# "character" "character"
example[] <- lapply(example, type.convert, as.is = TRUE)
sapply(example, class)
# ID VALUE
# "character" "integer"
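Since the question mentions many different tables, the same cleaning step can be wrapped in a small function. This is only a sketch in base R: convert_columns is a made-up helper name, and the raw data frame (including the PRICE column) is invented for illustration. It assumes every column arrives as a plain character vector, as in the example above.

```r
# Hypothetical helper: re-guess the type of every character column.
convert_columns <- function(df) {
  df[] <- lapply(df, function(col) {
    # Only character columns are re-typed; anything else is left alone
    if (is.character(col)) type.convert(col, as.is = TRUE) else col
  })
  df
}

# Illustrative input with all columns stored as character
raw <- data.frame(ID = c("id1", "id2"),
                  VALUE = c("15", "10"),
                  PRICE = c("1.5", "2"),
                  stringsAsFactors = FALSE)

clean <- convert_columns(raw)
sapply(clean, class)
#          ID       VALUE       PRICE
# "character"   "integer"   "numeric"
```

Using as.is = TRUE keeps non-numeric columns as character instead of turning them into factors.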
I would recommend that you use the jsonlite package, which converts this to a data frame by default:
jsonlite::fromJSON(json)
ID VALUE
1 id1 15
2 id2 10
NOTE: The numeric problem still remains, because the values in this JSON are quoted and are therefore parsed as strings. So you will still have to convert the numeric columns yourself, either manually or with type.convert.
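To illustrate the note above: whether a column comes back numeric depends on whether the values are quoted in the JSON text. A minimal sketch, assuming jsonlite is installed; the json_quoted and json_unquoted strings are made up for illustration and reuse the records from the question.

```r
library(jsonlite)

# Same records, once with VALUE quoted and once unquoted (illustrative strings)
json_quoted   <- '[{"ID":"id1","VALUE":"15"},{"ID":"id2","VALUE":"10"}]'
json_unquoted <- '[{"ID":"id1","VALUE":15},{"ID":"id2","VALUE":10}]'

quoted   <- fromJSON(json_quoted)
unquoted <- fromJSON(json_unquoted)

class(quoted$VALUE)         # "character": quoted values are JSON strings
is.numeric(unquoted$VALUE)  # TRUE: unquoted values are JSON numbers

# Quoted columns can still be converted afterwards with type.convert
quoted[] <- lapply(quoted, type.convert, as.is = TRUE)
is.numeric(quoted$VALUE)    # TRUE
```

If you control how the JSON is produced, emitting numbers unquoted avoids the conversion step entirely.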
Just to follow up on Ramnath's suggestion to transition to jsonlite, I did some benchmarking of the two approaches:
##RJSONIO vs. jsonlite for a simple example
require(RJSONIO)
require(jsonlite)
require(microbenchmark)
json <- "{\"ID\":\"id1\",\"VALUE\":\"15\"},{\"ID\":\"id2\",\"VALUE\":\"10\"}"
test <- rep(json,1000)
test <- paste(test,collapse=",")
test <- paste0("[",test,"]")
func1 <- function(x){
temp <- jsonlite::fromJSON(x)
}
func2 <- function(x){
temp <- RJSONIO::fromJSON(x)
temp <- do.call(rbind,temp)
temp <- as.data.frame(temp,stringsAsFactors=FALSE)
}
> microbenchmark(func1(test),func2(test))
Unit: milliseconds
expr min lq median uq max neval
func1(test) 204.05228 221.46047 233.93321 246.90815 341.95684 100
func2(test) 21.60289 22.36368 22.70935 23.75409 27.41851 100
At least for now (and I know the jsonlite package is still new and focusing on accuracy over performance), the older RJSONIO is about ten times faster for this simple example, even including the step that transforms the list into a data frame.
Update including rjson:
require(rjson)
func3 <- function(x){
temp <- rjson::fromJSON(x)
temp <- do.call(rbind,lapply(temp,unlist))
temp <- as.data.frame(temp,stringsAsFactors=FALSE)
}
> microbenchmark(func1(test),func2(test),func3(test))
Unit: milliseconds
expr min lq median uq max neval
func1(test) 205.34603 220.85428 234.79492 249.87628 323.96853 100
func2(test) 21.76972 22.67311 23.11287 23.56642 32.97469 100
func3(test) 14.16942 15.96937 17.29122 20.19562 35.63004 100
> microbenchmark(func1(test),func2(test),func3(test),times=500)
Unit: milliseconds
expr min lq median uq max neval
func1(test) 206.48986 225.70693 241.16301 253.83269 336.88535 500
func2(test) 21.75367 22.53256 23.06782 23.93026 103.70623 500
func3(test) 14.21577 15.61421 16.86046 19.27347 95.13606 500
> identical(func1(test),func2(test)) & identical(func1(test),func3(test))
[1] TRUE
At least on my machine rjson is only slightly faster, although I did not test how it scales compared to RJSONIO, which may be where it gets the big performance bump Ramnath suggested.