Flattening nested JSON in R

Hi folks: I've searched Stack Overflow and the rest of the internet for an answer to this question, but none of the answers I've found seem to work for me.

I've got thousands of rows of JSON data with information about images from a camera trap study, and I'm having a lot of trouble unpacking them. I've tried jsonlite::fromJSON without success, and the same goes for as.tbl_json from tidyjson.

My goal is to write some code that gives me a data frame with a column for each variable stored in the JSON. Can you help?

Here's a sample of the data I'm playing with, although I actually have it as a single column in a larger .csv file. The first row is the column name.

annotations
"[{""task"":""T0"",""value"":[{""choice"":""NOTHINGHERE"",""answers"":{},""filters"":{}}]}]"
"[{""task"":""T0"",""value"":[{""choice"":""NOTHINGHERE"",""answers"":{},""filters"":{}}]}]"
"[{""task"":""T0"",""value"":[{""choice"":""DEERWHITETAILED"",""answers"":{""HOWMANY"":""1"",""YOUNGPRESENT"":""NO"",""ANTLERSPRESENT"":""NO"",""WHATBEHAVIORSDOYOUSEE"":[""ALERT""],""ESTIMATEOFSNOWDEPTHSEETUTORIAL"":""NOSNOWBAREGROUND"",""ISITACTIVELYRAININGORSNOWINGINTHEPICTURE"":""NO""},""filters"":{}}]}]"
"[{""task"":""T0"",""value"":[{""choice"":""NOTHINGHERE"",""answers"":{},""filters"":{}}]}]"
"[{""task"":""T0"",""value"":[{""choice"":""NOTHINGHERE"",""answers"":{},""filters"":{}}]}]"
"[{""task"":""T0"",""value"":[{""choice"":""NOTHINGHERE"",""answers"":{},""filters"":{}}]}]"
"[{""task"":""T0"",""value"":[{""choice"":""NOTHINGHERE"",""answers"":{},""filters"":{}}]}]"
"[{""task"":""T0"",""value"":[{""choice"":""NOTHINGHERE"",""answers"":{},""filters"":{}}]}]"
"[{""task"":""T0"",""value"":[{""choice"":""FISHER"",""answers"":{""HOWMANY"":""1"",""YOUNGPRESENT"":""NO"",""WHATBEHAVIORSDOYOUSEE"":[""WALKINGRUNNING"",""ALERT""],""ESTIMATEOFSNOWDEPTHSEETUTORIAL"":""1020CM"",""ISITACTIVELYRAININGORSNOWINGINTHEPICTURE"":""NO""},""filters"":{}}]}]"
"[{""task"":""T0"",""value"":[{""choice"":""NOTHINGHERE"",""answers"":{},""filters"":{}}]}]"

Here's what I get if I run dput(annotations):

structure(list(annotations = c("[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]", 
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]", 
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"DEERWHITETAILED\",\"answers\":{\"HOWMANY\":\"1\",\"YOUNGPRESENT\":\"NO\",\"ANTLERSPRESENT\":\"NO\",\"WHATBEHAVIORSDOYOUSEE\":[\"ALERT\"],\"ESTIMATEOFSNOWDEPTHSEETUTORIAL\":\"NOSNOWBAREGROUND\",\"ISITACTIVELYRAININGORSNOWINGINTHEPICTURE\":\"NO\"},\"filters\":{}}]}]", 
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]", 
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]", 
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]", 
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]", 
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]", 
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"FISHER\",\"answers\":{\"HOWMANY\":\"1\",\"YOUNGPRESENT\":\"NO\",\"WHATBEHAVIORSDOYOUSEE\":[\"WALKINGRUNNING\",\"ALERT\"],\"ESTIMATEOFSNOWDEPTHSEETUTORIAL\":\"1020CM\",\"ISITACTIVELYRAININGORSNOWINGINTHEPICTURE\":\"NO\"},\"filters\":{}}]}]", 
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]"
)), class = "data.frame", row.names = c(NA, -10L))
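Since the real data lives in a .csv column, note that the doubled quotes ("") in the raw file are just CSV escaping. The minimal sketch below uses two rows copied from the sample and assumes R 4.0 or later for the r"(...)" raw-string syntax. It suggests that read.csv() undoes that escaping and returns plain JSON strings like the ones in the dput() output. With a real file you would pass its path instead of text =.

```r
# CSV text as it would appear in the file: a header line, then one quoted
# JSON string per row, with inner quotes doubled ("") per CSV convention.
# r"(...)" is a raw string literal (R >= 4.0), so nothing needs backslash escaping.
csv_text <- r"(annotations
"[{""task"":""T0"",""value"":[{""choice"":""NOTHINGHERE"",""answers"":{},""filters"":{}}]}]"
"[{""task"":""T0"",""value"":[{""choice"":""FISHER"",""answers"":{""HOWMANY"":""1"",""YOUNGPRESENT"":""NO"",""WHATBEHAVIORSDOYOUSEE"":[""WALKINGRUNNING"",""ALERT""],""ESTIMATEOFSNOWDEPTHSEETUTORIAL"":""1020CM"",""ISITACTIVELYRAININGORSNOWINGINTHEPICTURE"":""NO""},""filters"":{}}]}]"
)"

# read.csv() treats a doubled quote inside a quoted field as one literal quote,
# so each cell comes back as ordinary JSON text in a character column.
annotations <- read.csv(text = csv_text, stringsAsFactors = FALSE)
annotations$annotations[1]
```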

It isn't entirely clear to me exactly what output format you're looking for, and there are many ways to do this. Also, the arrays in your data (the outer array and the "value" array, each of which holds only one object in your sample) complicate matters a bit, because in general they could hold more than one object.
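To see why the arrays matter, here is a minimal sketch using a made-up record (not from your data) whose "value" array holds two classifications. gather_array() turns each array element into its own row, so this single record produces two rows, distinguished by value_id.

```r
library(tidyjson)
library(tibble)

# Hypothetical record: one outer object whose "value" array has two elements
two_values <- tibble(raw = '[{"task":"T0","value":[{"choice":"FISHER","answers":{"HOWMANY":"1"},"filters":{}},{"choice":"DEERWHITETAILED","answers":{"HOWMANY":"2"},"filters":{}}]}]')

flat <- as.tbl_json(two_values, json.column = "raw") %>%
  gather_array("object_id") %>%   # one row per object in the outer array
  spread_all() %>%                # adds the scalar "task" field
  enter_object("value") %>%       # descend into the "value" array
  gather_array("value_id") %>%    # one row per element of "value"
  spread_all() %>%                # adds choice and answers.HOWMANY
  as_tibble()

flat[, c("object_id", "value_id", "choice", "answers.HOWMANY")]
```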

In any case, tidyjson doesn't require much code thanks to spread_all(). You could also spread only particular values with spread_values(), or use enter_object("answers") to spread just the answers, etc. Hope it helps!
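A minimal sketch of those two alternatives, using two rows copied from the question's data. The output column names (howmany, snow) are arbitrary choices for illustration. Paths that don't exist in a record (e.g. HOWMANY for NOTHINGHERE) come back as NA from spread_values().

```r
library(tidyjson)
library(tibble)

# Two records copied from the question (NOTHINGHERE and FISHER)
ant <- tibble(raw = c(
  '[{"task":"T0","value":[{"choice":"NOTHINGHERE","answers":{},"filters":{}}]}]',
  '[{"task":"T0","value":[{"choice":"FISHER","answers":{"HOWMANY":"1","YOUNGPRESENT":"NO","WHATBEHAVIORSDOYOUSEE":["WALKINGRUNNING","ALERT"],"ESTIMATEOFSNOWDEPTHSEETUTORIAL":"1020CM","ISITACTIVELYRAININGORSNOWINGINTHEPICTURE":"NO"},"filters":{}}]}]'
))

# Option 1: pick specific fields by path with spread_values()
picked <- as.tbl_json(ant, json.column = "raw") %>%
  gather_array("object_id") %>%
  enter_object("value") %>%
  gather_array("value_id") %>%
  spread_values(
    choice  = jstring("choice"),
    howmany = jstring("answers", "HOWMANY"),  # stored as a string ("1") in the JSON
    snow    = jstring("answers", "ESTIMATEOFSNOWDEPTHSEETUTORIAL")
  ) %>%
  as_tibble()

# Option 2: descend into "answers" and spread everything found there
answers_only <- as.tbl_json(ant, json.column = "raw") %>%
  gather_array("object_id") %>%
  enter_object("value") %>%
  gather_array("value_id") %>%
  enter_object("answers") %>%
  spread_all() %>%
  as_tibble()
```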

library(tidyjson)
#> 
#> Attaching package: 'tidyjson'
#> The following object is masked from 'package:stats':
#> 
#>     filter
library(tibble)

annotations <- structure(list(annotations = c("[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]", 
                                              "[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]", 
                                              "[{\"task\":\"T0\",\"value\":[{\"choice\":\"DEERWHITETAILED\",\"answers\":{\"HOWMANY\":\"1\",\"YOUNGPRESENT\":\"NO\",\"ANTLERSPRESENT\":\"NO\",\"WHATBEHAVIORSDOYOUSEE\":[\"ALERT\"],\"ESTIMATEOFSNOWDEPTHSEETUTORIAL\":\"NOSNOWBAREGROUND\",\"ISITACTIVELYRAININGORSNOWINGINTHEPICTURE\":\"NO\"},\"filters\":{}}]}]", 
                                              "[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]", 
                                              "[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]", 
                                              "[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]", 
                                              "[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]", 
                                              "[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]", 
                                              "[{\"task\":\"T0\",\"value\":[{\"choice\":\"FISHER\",\"answers\":{\"HOWMANY\":\"1\",\"YOUNGPRESENT\":\"NO\",\"WHATBEHAVIORSDOYOUSEE\":[\"WALKINGRUNNING\",\"ALERT\"],\"ESTIMATEOFSNOWDEPTHSEETUTORIAL\":\"1020CM\",\"ISITACTIVELYRAININGORSNOWINGINTHEPICTURE\":\"NO\"},\"filters\":{}}]}]", 
                                              "[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]"
)), class = "data.frame", row.names = c(NA, -10L))

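# Move the JSON strings into a tibble column named "raw"
ant <- tibble(raw = annotations$annotations)

as.tbl_json(ant, json.column = "raw") %>%
  gather_array("object_id") %>%   # one row per object in the outer array
  spread_all() %>%                # spread its scalar fields (here: task)
  enter_object("value") %>%       # descend into the "value" array
  gather_array("value_id") %>%    # one row per object inside "value"
  spread_all() %>%                # spread choice and the nested answers.* scalars
  as_tibble()                     # drop the JSON attribute, return a plain tibble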
#> # A tibble: 10 x 9
#>    object_id task  value_id choice answers.HOWMANY answers.YOUNGPR…
#>        <int> <chr>    <int> <chr>  <chr>           <chr>           
#>  1         1 T0           1 NOTHI… <NA>            <NA>            
#>  2         1 T0           1 NOTHI… <NA>            <NA>            
#>  3         1 T0           1 DEERW… 1               NO              
#>  4         1 T0           1 NOTHI… <NA>            <NA>            
#>  5         1 T0           1 NOTHI… <NA>            <NA>            
#>  6         1 T0           1 NOTHI… <NA>            <NA>            
#>  7         1 T0           1 NOTHI… <NA>            <NA>            
#>  8         1 T0           1 NOTHI… <NA>            <NA>            
#>  9         1 T0           1 FISHER 1               NO              
#> 10         1 T0           1 NOTHI… <NA>            <NA>            
#> # … with 3 more variables: answers.ANTLERSPRESENT <chr>,
#> #   answers.ESTIMATEOFSNOWDEPTHSEETUTORIAL <chr>,
#> #   answers.ISITACTIVELYRAININGORSNOWINGINTHEPICTURE <chr>

Created on 2020-03-14 by the reprex package (v0.3.0)
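One caveat about the output above: spread_all() spreads scalar values and nested objects but not arrays, so WHATBEHAVIORSDOYOUSEE (present in the DEERWHITETAILED and FISHER rows) is not among the nine columns. Here is a minimal sketch of one way to recover it, assuming dplyr 1.0 or later is available: gather that array into one row per behavior, collapse each record's behaviors into a single string, and join it back. The row_id column and the ";" separator are my own choices for illustration, and only two of your rows are used to keep it short.

```r
library(tidyjson)
library(tibble)
library(dplyr)

# Two records copied from the question, plus an explicit row id for joining
ant <- tibble(
  row_id = 1:2,
  raw = c(
    '[{"task":"T0","value":[{"choice":"NOTHINGHERE","answers":{},"filters":{}}]}]',
    '[{"task":"T0","value":[{"choice":"FISHER","answers":{"HOWMANY":"1","YOUNGPRESENT":"NO","WHATBEHAVIORSDOYOUSEE":["WALKINGRUNNING","ALERT"],"ESTIMATEOFSNOWDEPTHSEETUTORIAL":"1020CM","ISITACTIVELYRAININGORSNOWINGINTHEPICTURE":"NO"},"filters":{}}]}]'
  )
)

# One row per behavior; records without the array are dropped by enter_object()
behaviors <- as.tbl_json(ant, json.column = "raw") %>%
  gather_array("object_id") %>%
  enter_object("value") %>%
  gather_array("value_id") %>%
  enter_object("answers") %>%
  enter_object("WHATBEHAVIORSDOYOUSEE") %>%
  gather_array("behavior_id") %>%
  append_values_string("behavior") %>%
  as_tibble() %>%
  group_by(row_id, object_id, value_id) %>%
  summarise(behaviors = paste(behavior, collapse = ";"), .groups = "drop")

# Same pipeline as the answer, then attach the collapsed behaviors
flat <- as.tbl_json(ant, json.column = "raw") %>%
  gather_array("object_id") %>%
  spread_all() %>%
  enter_object("value") %>%
  gather_array("value_id") %>%
  spread_all() %>%
  as_tibble() %>%
  left_join(behaviors, by = c("row_id", "object_id", "value_id"))
```

Collapsing to one string keeps one row per image; if you would rather analyze behaviors individually, the intermediate long table (before group_by()) may be more convenient.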
