How to extract values from JSON blob into new columns in R data frame?

I am stuck on the following problem:

I have a data frame in which one variable (var2) contains JSON objects:

  var1                                  var2
1    1 {"property1": "val1", "property2": 5}
2    2 {"property1": "val2", "property2": 8}
3    3 {"property1": "val3", "property2": 7}
4    4 {"property1": "val4", "property2": 0}
5    5 {"property1": "val5", "property3": 9}

(Code on pastebin here )

I want to extract the JSON properties in var2 and add them to the data frame as new columns, like so:

  var1                                  var2 prop1 prop2 prop3
1    1 {"property1": "val1", "property2": 5}  val1     5    NA
2    2 {"property1": "val2", "property2": 8}  val2     8    NA
3    3 {"property1": "val3", "property2": 7}  val3     7    NA
4    4 {"property1": "val4", "property2": 0}  val4     0    NA
5    5 {"property1": "val5", "property3": 9}  val5    NA     9

Given identical properties in identical sequence, I have found this way to make it work:

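library(jsonlite)  # fromJSON()
library(magrittr)  # %>%

# Parse each JSON string; sapply() returns a matrix with one column per record
jsonProps <- sapply(as.character(df$var2), fromJSON) %>%
  t() %>%
  as.data.frame()
rownames(jsonProps) <- NULL

y <- cbind(df, jsonProps)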

(I am happy to receive any suggestions on how to make this more efficient, if there might be any.)

This does not work anymore when

  • the number of properties differs across records and/or
  • the sequence changes and/or
  • different properties are stored between records.

I am at a loss as to how to dynamically create columns from the properties I find and transfer the property values correctly, so I would welcome your suggestions on how to tackle this.

You can do:

library(plyr)
library(jsonlite)

ll = lapply(df$var2, function(x) jsonlite::fromJSON(as.character(x)))
cbind(df, ldply(ll, data.frame))

#  var1                                  var2 property1 property3 property2
#1    a {"property1": "val1", "property3": 8}      val1         8        NA
#2    a {"property1": "val1", "property2": 5}      val1        NA         5

Data:

df = structure(list(var1 = structure(c(1L, 1L), .Label = "a", class = "factor"), 
var2 = structure(1:2, .Label = c("{\"property1\": \"val1\", \"property3\": 8}", 
"{\"property1\": \"val1\", \"property2\": 5}"), class = "factor")), .Names = c("var1", 
"var2"), class = "data.frame", row.names = 1:2)
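For what it's worth, the same idea can be sketched without plyr. This is only a rough variant under a few assumptions: the sample data frame below is made up for illustration, every JSON value is a scalar, and dplyr::bind_rows() is used in place of ldply() to match columns by name and fill the gaps with NA.

```r
library(jsonlite)
library(dplyr)

# Made-up sample data: property order differs in row 2,
# and row 3 carries a property the other rows lack
df <- data.frame(
  var1 = 1:3,
  var2 = c('{"property1": "val1", "property2": 5}',
           '{"property2": 8, "property1": "val2"}',
           '{"property1": "val3", "property3": 9}'),
  stringsAsFactors = FALSE
)

# Parse every JSON string into a named list
parsed <- lapply(as.character(df$var2), fromJSON)

# Turn each record into a one-row data frame (assumes scalar values only)
rows <- lapply(parsed, as.data.frame, stringsAsFactors = FALSE)

# bind_rows() aligns columns by name and fills missing ones with NA
props <- bind_rows(rows)

result <- cbind(df, props)
print(result)
```

Because the columns are matched by name, neither the order of the properties nor their number has to be the same across records.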

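This doesn't do everything you want (values are picked by position, so property3 in row 5 ends up in the same column as property2), but perhaps it is a better starting point: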

library("dplyr")
library("jsonlite")

get_it <- function(x) {
  jsonlite::fromJSON(as.character(x))
}

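# `test` is the data frame from the question.
# Note: tbl_df() is deprecated in current dplyr; as_tibble() is its replacement.
tbl_df(test) %>%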
  rowwise() %>%
  mutate(one = get_it(var2)[[1]],
         two = get_it(var2)[[2]])

Source: local data frame [5 x 4]
Groups: <by row>

   var1                                  var2   one   two
  (dbl)                                (fctr) (chr) (int)
1     1 {"property1": "val1", "property2": 5}  val1     5
2     2 {"property1": "val2", "property2": 8}  val2     8
3     3 {"property1": "val3", "property2": 7}  val3     7
4     4 {"property1": "val4", "property2": 0}  val4     0
5     5 {"property1": "val5", "property3": 9}  val5     9
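To pick values by name instead of by position, something along these lines might work. It is a minimal sketch, not a definitive solution: the data frame is rebuilt here from the question's sample, get_prop() is a hypothetical helper introduced only for this example, and all JSON values are assumed to be scalars.

```r
library(jsonlite)

# The question's sample data, rebuilt here so the sketch is self-contained
test <- data.frame(
  var1 = 1:5,
  var2 = c('{"property1": "val1", "property2": 5}',
           '{"property1": "val2", "property2": 8}',
           '{"property1": "val3", "property2": 7}',
           '{"property1": "val4", "property2": 0}',
           '{"property1": "val5", "property3": 9}'),
  stringsAsFactors = FALSE
)

# Parse every JSON string into a named list
parsed <- lapply(as.character(test$var2), fromJSON)

# Union of all property names seen in any record
all_props <- unique(unlist(lapply(parsed, names)))

# Hypothetical helper: value of `name` in one record, NA if it is absent
get_prop <- function(rec, name) {
  if (is.null(rec[[name]])) NA else rec[[name]]
}

# One new column per property, looked up by name rather than position
for (p in all_props) {
  test[[p]] <- sapply(parsed, get_prop, name = p)
}

print(test)
```

With this approach row 5 gets NA in property2 and 9 in property3, rather than having the 9 land in the second column.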
