Unnest json in R
I know this is a popular topic, but I haven't been able to find a solution that fits my needs.
JSON data:
{
"team": [{
"Team A": [
{"episode":"shots","result":"accurate", "data":[{"x":"100","y":"40","dx":"97.9","dy":"36.3"},{"x":"105","y":"68","dx":"95.8","dy":"25.9"}]},
{"episode":"shots","result":"inaccurate", "data":[{"x":"95","y":"33","dx":"93.9","dy":"46.3"},{"x":"93","y":"68","dx":"95.8","dy":"25.9"}]}
],
"Team B": [
{"episode":"shots","result":"accurate", "data":[{"x":"100","y":"40","dx":"97.9","dy":"36.3"},{"x":"105","y":"68","dx":"95.8","dy":"25.9"}]},
{"episode":"shots","result":"inaccurate", "data":[{"x":"95","y":"33","dx":"97.9","dy":"36.3"},{"x":"105","y":"68","dx":"95.8","dy":"26.3"}]}
]
}]
}
Desired output (I have truncated it, but hopefully it is clear enough):
tibble(team = c("Team A", "Team A", "Team A", "Team A"),
episode = c("shots","shots","shots","shots"),
result = c("accurate", "accurate", "inaccurate", "inaccurate"),
x= c(100,105,95,93),
etc = c("...","...","...","..."))
Thanks!
You can do this with a combination of bind_rows (which combines a list of data frames into one) and tidyr's unnest (which expands the data column into multiple rows).
library(dplyr)
library(tidyr)
library(purrr)
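j$team %>%
  map(1) %>%                    # first element of each team column: one data frame per team
  bind_rows(.id = "team") %>%   # stack them, storing the list names in a "team" column
  unnest(cols = data)           # expand the nested data column into one row per point
(The part that may look surprising is map(1). It is needed because each team entry in the parsed data frame is itself a list; map(1) takes the first item of each.)
Result: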
# A tibble: 8 x 7
team episode result x y dx dy
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Team A shots accurate 100 40 97.9 36.3
2 Team A shots accurate 105 68 95.8 25.9
3 Team A shots inaccurate 95 33 93.9 46.3
4 Team A shots inaccurate 93 68 95.8 25.9
5 Team B shots accurate 100 40 97.9 36.3
6 Team B shots accurate 105 68 95.8 25.9
7 Team B shots inaccurate 95 33 97.9 36.3
8 Team B shots inaccurate 105 68 95.8 26.3
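All of the columns in the result above are character, whereas the desired output in the question has a numeric x. A minimal sketch of one way to convert them, assuming dplyr 1.0 or later for across(); the small tibble shots is a hand-built stand-in for the first two rows of the result, not output of the original code.

```r
library(dplyr)

# Hand-built stand-in for the first two rows of the result (all columns character).
shots <- tibble(
  team    = c("Team A", "Team A"),
  episode = c("shots", "shots"),
  result  = c("accurate", "accurate"),
  x  = c("100", "105"),
  y  = c("40", "68"),
  dx = c("97.9", "95.8"),
  dy = c("36.3", "25.9")
)

# Convert only the coordinate columns from character to numeric.
shots_num <- shots %>%
  mutate(across(c(x, y, dx, dy), as.numeric))
```

The same mutate() step could be appended to the end of the pipeline, after unnest().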
Setup of the j object:
library(jsonlite)
j <- jsonlite::fromJSON('{
"team": [{
"Team A": [
{"episode":"shots","result":"accurate", "data":[{"x":"100","y":"40","dx":"97.9","dy":"36.3"},{"x":"105","y":"68","dx":"95.8","dy":"25.9"}]},
{"episode":"shots","result":"inaccurate", "data":[{"x":"95","y":"33","dx":"93.9","dy":"46.3"},{"x":"93","y":"68","dx":"95.8","dy":"25.9"}]}
],
"Team B": [
{"episode":"shots","result":"accurate", "data":[{"x":"100","y":"40","dx":"97.9","dy":"36.3"},{"x":"105","y":"68","dx":"95.8","dy":"25.9"}]},
{"episode":"shots","result":"inaccurate", "data":[{"x":"95","y":"33","dx":"97.9","dy":"36.3"},{"x":"105","y":"68","dx":"95.8","dy":"26.3"}]}
]
}]
}
')
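To see why map(1) is needed, it can help to inspect what fromJSON returns. The sketch below uses a cut-down version of the JSON (one record per team, x and y only); the names txt, j_small and per_team are made up for this illustration.

```r
library(jsonlite)
library(purrr)

# Cut-down JSON with the same nesting as the question's data.
txt <- '{"team":[{
  "Team A":[{"episode":"shots","result":"accurate","data":[{"x":"100","y":"40"},{"x":"105","y":"68"}]}],
  "Team B":[{"episode":"shots","result":"inaccurate","data":[{"x":"95","y":"33"},{"x":"93","y":"68"}]}]
}]}'
j_small <- fromJSON(txt)

class(j_small$team)                   # the "team" array is parsed as a data frame
names(j_small$team)                   # one column per team
class(j_small$team[["Team A"]])       # each column is a list ...
class(j_small$team[["Team A"]][[1]])  # ... whose first element is a data frame

# map(1) pulls that first element out of every column,
# giving a named list of data frames that bind_rows() can stack.
per_team <- map(j_small$team, 1)
str(per_team, max.level = 1)
```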
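(Note that the code would be different if you had passed simplifyDataFrame = FALSE when parsing the JSON; the default is TRUE, which is what the code above relies on. In that case you may need a different approach.)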
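As a rough sketch of what such a different approach could look like, the following parses a cut-down version of the JSON with simplifyDataFrame = FALSE and flattens the resulting nested lists with tidyr's unnest_longer and unnest_wider (assuming tidyr 1.0 or later). This is not from the original answer; txt, j_list, teams and out are names invented for the example.

```r
library(jsonlite)
library(tidyr)
library(tibble)

# Cut-down JSON with the same nesting as the question's data.
txt <- '{"team":[{
  "Team A":[{"episode":"shots","result":"accurate","data":[{"x":"100","y":"40"},{"x":"105","y":"68"}]}],
  "Team B":[{"episode":"shots","result":"inaccurate","data":[{"x":"95","y":"33"},{"x":"93","y":"68"}]}]
}]}'

# Keep arrays of records as nested lists instead of data frames.
j_list <- fromJSON(txt, simplifyDataFrame = FALSE)

# The single object inside "team" maps team names to lists of records.
teams <- j_list$team[[1]]

out <- tibble(team = names(teams), record = unname(teams)) %>%
  unnest_longer(record) %>%  # one row per record
  unnest_wider(record) %>%   # columns episode, result, data
  unnest_longer(data) %>%    # one row per point
  unnest_wider(data)         # columns x, y
```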
I'm assuming you can read the json into a single string and parse it with jsonlite::fromJSON.
For this example, I selected and copied the json from your question and ran:
x <- jsonlite::fromJSON(paste(readClipboard(), collapse = "\n"))
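readClipboard() is only available on Windows. A minimal sketch of a portable alternative, assuming the JSON has been saved to a file; the temporary file and its one-record content are placeholders for wherever the real data lives.

```r
library(jsonlite)

# Hypothetical file standing in for wherever the JSON is saved.
path <- tempfile(fileext = ".json")
writeLines(
  '{"team":[{"Team A":[{"episode":"shots","result":"accurate","data":[{"x":"100","y":"40"}]}]}]}',
  path
)

# fromJSON() accepts a file path directly ...
x_file <- fromJSON(path)

# ... or the file can be read into one string first, as in the clipboard version.
x_text <- fromJSON(paste(readLines(path), collapse = "\n"))
```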
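You can then extract the data frame with tidyverse functions, like so:
library(dplyr)
library(tidyr)
library(tibble)  # provides rownames_to_column()
# Take the data frame inside each team column and stack them;
# rbind() names the rows "Team A.1", "Team A.2", ...
do.call(rbind, lapply(x$team, function(y) y[[1]])) %>%
  rownames_to_column("team") %>%           # move those row names into a "team" column
  unnest(cols = data) %>%                  # one row per point in the nested data column
  mutate(team = gsub("\\..*$", "", team))  # strip the ".1", ".2" suffixes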
#> # A tibble: 8 x 7
#> team episode result x y dx dy
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Team A shots accurate 100 40 97.9 36.3
#> 2 Team A shots accurate 105 68 95.8 25.9
#> 3 Team A shots inaccurate 95 33 93.9 46.3
#> 4 Team A shots inaccurate 93 68 95.8 25.9
#> 5 Team B shots accurate 100 40 97.9 36.3
#> 6 Team B shots accurate 105 68 95.8 25.9
#> 7 Team B shots inaccurate 95 33 97.9 36.3
#> 8 Team B shots inaccurate 105 68 95.8 26.3