I have a lot of JSON arrays that don't follow the "attribute":"value" format I'm used to. I want to read them in one by one, parse each into a table, and then combine the tables. I got stuck on the parsing step.
All arrays are tagged posts from a forum and have this structure:
myjson = '
[{
"posts": [
[9999991, "Here is some text."],
[9999992, "Here is some other, unrelated text."]
],
"id": "123456",
"label": "whatever"
}]
'
Each array has exactly one "posts", one "id", and one "label" and nothing else, but the number of [id, text] pairs under "posts" is arbitrary (here, it's 2).
When I parse this into R using jsonlite, I get a jumbled mess. When I use RJSONIO or rjson, I get lists of lists of lists.
I can arrive at the desired output by piecing together the information from the lists of lists but it's horrible and error-prone:
myj = rjson::fromJSON(myjson)
post_id = c(
myj[[1]]$posts[[1]][[1]],
myj[[1]]$posts[[2]][[1]]
)
post_content = c(
myj[[1]]$posts[[1]][[2]],
myj[[1]]$posts[[2]][[2]]
)
dplyr::tibble(
id = myj[[1]]$id,
label = myj[[1]]$label,
post_id = post_id,
post_content = post_content
)
> # A tibble: 2 x 4
> id label post_id post_content
> <chr> <chr> <dbl> <chr>
> 1 123456 whatever 9999991 Here is some text.
> 2 123456 whatever 9999992 Here is some other, unrelated text.
This doesn't lend itself to iteration (I dunno how to refer to myj[[1]]$posts[[1...i]][[1...ii]]) and is probably very slow.
There's gotta be a better way!
Try reading the data using jsonlite::fromJSON and then unnest the values.
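library(dplyr)
# fromJSON() returns a data frame; its "posts" column is a list holding one matrix per record
tmp <- jsonlite::fromJSON(myjson)
tmp %>%
  # turn each matrix into a data frame so it can be unnested
  mutate(posts = purrr::map(posts, data.frame)) %>%
  # one row per post; id and label are repeated on each row
  tidyr::unnest(posts)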
# X1 X2 id label
# <chr> <chr> <chr> <chr>
#1 9999991 Here is some text. 123456 whatever
#2 9999992 Here is some other, unrelated text. 123456 whatever
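If you need to do this for many arrays and want proper column names, the same idea could be wrapped in a small function. This is only a sketch: parse_posts, json_a, json_b and all_posts are names made up for illustration, and it assumes every record has at least one post, so that jsonlite simplifies "posts" to a character matrix as in the output above.

```r
library(dplyr)

# Two example inputs with the structure from the question
# (the second one is invented for illustration)
json_a <- '[{"posts": [[9999991, "Here is some text."],
                       [9999992, "Here is some other, unrelated text."]],
             "id": "123456", "label": "whatever"}]'
json_b <- '[{"posts": [[9999993, "A third post."]],
             "id": "654321", "label": "other"}]'

# Hypothetical helper: parse one JSON string into one row per post
parse_posts <- function(json) {
  jsonlite::fromJSON(json) %>%
    mutate(posts = purrr::map(posts, ~ tibble(
      # ids arrive as text because each pair mixes a number and a string
      post_id = as.numeric(.x[, 1]),
      post_content = .x[, 2]
    ))) %>%
    tidyr::unnest(posts) %>%
    select(id, label, post_id, post_content)
}

# Parse every input, then stack the resulting tables
all_posts <- purrr::map(list(json_a, json_b), parse_posts) %>%
  bind_rows()
```

Since jsonlite::fromJSON also accepts a file path, the list passed to purrr::map could hold file names instead of JSON strings.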