
Parse hierarchical JSON into a table in R

I have a lot of JSON arrays that don't follow the "attribute":"value" format I'm used to. I want to read them in one by one, parse each into a table, and then combine the tables. I'm stuck on the parsing step.

All arrays are tagged posts from a forum and have this structure:

myjson = '
[{
    "posts": [
        [9999991, "Here is some text."],
        [9999992, "Here is some other, unrelated text."]
        ],
    "id": "123456",
    "label": "whatever"
}]
'

Each array has exactly one "posts", one "id", and one "label" and nothing else, but the number of [post id, text] pairs under "posts" is arbitrary (here, it's 2).

When I parse this into R using jsonlite, I get a jumbled mess. When I use RJSONIO or rjson, I get lists of lists of lists.

I can arrive at the desired output by piecing together the information from the lists of lists but it's horrible and error-prone:


myj = rjson::fromJSON(myjson)

post_id = c(
  myj[[1]]$posts[[1]][[1]],
  myj[[1]]$posts[[2]][[1]]
  )

post_content = c(
  myj[[1]]$posts[[1]][[2]],
  myj[[1]]$posts[[2]][[2]]
  )

dplyr::tibble(
  id = myj[[1]]$id,
  label = myj[[1]]$label,
  post_id = post_id,
  post_content = post_content
)

> # A tibble: 2 x 4
>   id      label    post_id post_content                       
>   <chr>   <chr>       <dbl> <chr>                              
> 1 123456 whatever  9999991 Here is some text.                 
> 2 123456 whatever  9999992 Here is some other, unrelated text.

This doesn't lend itself to iteration (I don't know how to refer to myj[[1]]$posts[[i]][[j]] for arbitrary i and j) and is probably very slow.

There's gotta be a better way!

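Try reading the data with jsonlite::fromJSON and unnesting the values. fromJSON returns a data frame in which posts is a list column, so converting each element to a data frame lets tidyr::unnest expand it into one row per post.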

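library(dplyr)

# Parse the JSON; `posts` comes back as a list column, one element per top-level object
tmp <- jsonlite::fromJSON(myjson)

tmp %>%
  # Turn each element of `posts` into a data frame (columns X1 and X2)
  mutate(posts = purrr::map(posts, data.frame)) %>%
  # Expand to one row per post, repeating id and label
  tidyr::unnest(posts)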

#   X1      X2                                  id     label   
#  <chr>   <chr>                               <chr>  <chr>   
#1 9999991 Here is some text.                  123456 whatever
#2 9999992 Here is some other, unrelated text. 123456 whatever
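
Note that the output above has generic column names (X1, X2) and the post ids come back as character. If you want the exact table from the question, and want to combine many arrays, here is a rough sketch of a different route: turn off jsonlite's simplification and walk the nested lists explicitly. The helper name parse_posts and the second JSON string are made up for illustration, and the sketch assumes every entry under "posts" is a [post id, text] pair as described in the question.

```r
library(jsonlite)

myjson <- '
[{
    "posts": [
        [9999991, "Here is some text."],
        [9999992, "Here is some other, unrelated text."]
        ],
    "id": "123456",
    "label": "whatever"
}]
'

# Hypothetical second array, only here to demonstrate combining tables
myjson2 <- '[{"posts": [[9999993, "A third post."]], "id": "654321", "label": "other"}]'

# Hypothetical helper: one JSON string in, one data frame out (one row per post)
parse_posts <- function(json_text) {
  # simplifyVector = FALSE keeps the raw nested-list structure
  records <- jsonlite::fromJSON(json_text, simplifyVector = FALSE)
  tables <- lapply(records, function(rec) {
    n <- length(rec$posts)
    data.frame(
      id           = rep(rec$id, n),
      label        = rep(rec$label, n),
      # First element of each pair is the post id, second is the text
      post_id      = vapply(rec$posts, function(p) as.numeric(p[[1]]), numeric(1)),
      post_content = vapply(rec$posts, function(p) as.character(p[[2]]), character(1)),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, tables)
}

# Parse each array, then stack the resulting tables
json_inputs <- list(myjson, myjson2)
combined <- do.call(rbind, lapply(json_inputs, parse_posts))
print(combined)
```

The vapply calls replace the hand-written posts[[1]][[1]], posts[[2]][[1]], ... indexing, so the number of posts per array no longer matters. jsonlite::fromJSON also accepts a file path or URL in place of a JSON string, so the same helper should work on a vector of file names.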
