
Unnest JSON in R

I know this is a popular topic, but I haven't been able to find a solution that fits my needs.

JSON data:

{
    "team": [{
        "Team A": [
            {"episode":"shots","result":"accurate", "data":[{"x":"100","y":"40","dx":"97.9","dy":"36.3"},{"x":"105","y":"68","dx":"95.8","dy":"25.9"}]},
            {"episode":"shots","result":"inaccurate", "data":[{"x":"95","y":"33","dx":"93.9","dy":"46.3"},{"x":"93","y":"68","dx":"95.8","dy":"25.9"}]}
        ],
        "Team B": [
            {"episode":"shots","result":"accurate", "data":[{"x":"100","y":"40","dx":"97.9","dy":"36.3"},{"x":"105","y":"68","dx":"95.8","dy":"25.9"}]},
            {"episode":"shots","result":"inaccurate", "data":[{"x":"95","y":"33","dx":"97.9","dy":"36.3"},{"x":"105","y":"68","dx":"95.8","dy":"26.3"}]}
        ]
    }]
}

Desired output (I have truncated it, but hopefully it is clear enough):

tibble(team = c("Team A", "Team A", "Team A", "Team A"), 
       episode = c("shots","shots","shots","shots"), 
       result = c("accurate", "accurate", "inaccurate", "inaccurate"), 
       x= c(100,105,95,93), 
       etc = c("...","...","...","..."))

Thanks!

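You can do this with a combination of dplyr's bind_rows (which combines a list of data frames into one) and tidyr's unnest (which unnests the data column into multiple rows).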

library(dplyr)
library(tidyr)
library(purrr)

j$team %>%
  map(1) %>%
  bind_rows(.id = "team") %>%
  unnest(cols = data)

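(The part that may look surprising is map(1). It is needed because fromJSON parses j$team into a data frame in which each team column is itself a list; map(1) extracts the first item of each column, which is that team's data frame of episodes.)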
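To see why map(1) is needed, it can help to inspect the parsed object. The following is a minimal sketch on a trimmed-down version of the question's JSON (one team, one shot); the variable name first is only illustrative.

```r
library(jsonlite)
library(purrr)

# Trimmed-down version of the question's JSON: one team, one episode, one shot.
j <- fromJSON('{"team": [{"Team A": [
  {"episode":"shots","result":"accurate","data":[{"x":"100","y":"40"}]}
]}]}')

class(j$team)              # the outer array of objects is parsed as a data frame
class(j$team[["Team A"]])  # each team column is a list holding one element

# map(1) pulls element 1 out of every column of j$team,
# giving a named list with one data frame of episodes per team.
first <- map(j$team, 1)
class(first[["Team A"]])
names(first)
```

Because the list returned by map(1) keeps the team names, bind_rows(.id = "team") can turn those names into the team column.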

Result:

# A tibble: 8 x 7
  team   episode result     x     y     dx    dy   
  <chr>  <chr>   <chr>      <chr> <chr> <chr> <chr>
1 Team A shots   accurate   100   40    97.9  36.3 
2 Team A shots   accurate   105   68    95.8  25.9 
3 Team A shots   inaccurate 95    33    93.9  46.3 
4 Team A shots   inaccurate 93    68    95.8  25.9 
5 Team B shots   accurate   100   40    97.9  36.3 
6 Team B shots   accurate   105   68    95.8  25.9 
7 Team B shots   inaccurate 95    33    97.9  36.3 
8 Team B shots   inaccurate 105   68    95.8  26.3 
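Note that x, y, dx and dy come back as character columns, because the JSON stores them as quoted strings, whereas the desired output in the question uses numbers for x. A possible follow-up step, sketched here on a small hand-built stand-in for the first two rows of the result (the names shots and shots_num are illustrative), is to convert them with across():

```r
library(dplyr)

# Hand-built stand-in for the first two rows of the result (all character columns).
shots <- tibble(
  team    = c("Team A", "Team A"),
  episode = c("shots", "shots"),
  result  = c("accurate", "accurate"),
  x  = c("100", "105"),
  y  = c("40", "68"),
  dx = c("97.9", "95.8"),
  dy = c("36.3", "25.9")
)

# Convert only the coordinate columns; team, episode and result stay character.
shots_num <- shots %>%
  mutate(across(c(x, y, dx, dy), as.numeric))
```

The same mutate() call could be appended to the end of the pipeline above.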

Setup of the j object:

library(jsonlite)

j <- jsonlite::fromJSON('{
    "team": [{
        "Team A": [
            {"episode":"shots","result":"accurate", "data":[{"x":"100","y":"40","dx":"97.9","dy":"36.3"},{"x":"105","y":"68","dx":"95.8","dy":"25.9"}]},
            {"episode":"shots","result":"inaccurate", "data":[{"x":"95","y":"33","dx":"93.9","dy":"46.3"},{"x":"93","y":"68","dx":"95.8","dy":"25.9"}]}
        ],
        "Team B": [
            {"episode":"shots","result":"accurate", "data":[{"x":"100","y":"40","dx":"97.9","dy":"36.3"},{"x":"105","y":"68","dx":"95.8","dy":"25.9"}]},
            {"episode":"shots","result":"inaccurate", "data":[{"x":"95","y":"33","dx":"97.9","dy":"36.3"},{"x":"105","y":"68","dx":"95.8","dy":"26.3"}]}
        ]
    }]
}
')

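(Note that the code would be different if you passed simplifyDataFrame = FALSE when parsing the JSON, since by default fromJSON simplifies arrays of objects into data frames; in that case you may need a different approach.)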
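As a rough illustration of what such a different approach might look like, here is a sketch that assumes the JSON was parsed with simplifyDataFrame = FALSE, so every array of objects stays a plain nested list. It uses a shortened version of the question's JSON, and the names txt, j_list, teams and shots are made up for the example.

```r
library(jsonlite)
library(purrr)
library(dplyr)

# Shortened version of the question's JSON (fewer shots and columns).
txt <- '{"team": [{
  "Team A": [{"episode":"shots","result":"accurate",
              "data":[{"x":"100","y":"40"},{"x":"105","y":"68"}]}],
  "Team B": [{"episode":"shots","result":"inaccurate",
              "data":[{"x":"95","y":"33"}]}]
}]}'

# With simplifyDataFrame = FALSE, arrays of objects are kept as nested lists.
j_list <- fromJSON(txt, simplifyDataFrame = FALSE)

teams <- j_list$team[[1]]  # named list with one entry per team

shots <- imap(teams, function(episodes, team_name) {
  map(episodes, function(ep) {
    bind_rows(ep$data) %>%  # one row per {x, y, ...} object
      mutate(team = team_name, episode = ep$episode, result = ep$result,
             .before = 1)
  }) %>%
    bind_rows()             # stack the episodes of one team
}) %>%
  bind_rows()               # stack the teams
```

Here the team name is carried along explicitly by imap(), rather than recovered through bind_rows(.id = "team").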

I assume you can read the JSON into a single string and parse it with jsonlite::fromJSON.

For this example, I selected and copied the JSON from your question and ran:

x <- jsonlite::fromJSON(paste(readClipboard(), collapse = "\n"))
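readClipboard() is only available on Windows. If the JSON lives in a file instead, something along these lines should work; the temporary file below stands in for a real path, and the shortened JSON and the name x2 are only for illustration.

```r
library(jsonlite)

# A temp file stands in for a real JSON file on disk (hypothetical path).
path <- tempfile(fileext = ".json")
writeLines(
  '{"team": [{"Team A": [{"episode":"shots","result":"accurate","data":[{"x":"100","y":"40"}]}]}]}',
  path
)

# fromJSON accepts a file path directly ...
x <- fromJSON(path)

# ... or a single string, built the same way as in the clipboard example.
x2 <- fromJSON(paste(readLines(path), collapse = "\n"))
```

Both calls should give the same parsed object, so either can feed the pipeline that follows.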

Then you can extract the data frame using tidyverse functions like this:

library(dplyr)
library(tidyr)
library(tibble)  # provides rownames_to_column()

do.call(rbind, lapply(x$team, function(y) y[[1]])) %>%
  rownames_to_column("team") %>%
  unnest(cols = data) %>%
  mutate(team = gsub("\\..*$", "", team))

#> # A tibble: 8 x 7
#>   team   episode result     x     y     dx    dy   
#>   <chr>  <chr>   <chr>      <chr> <chr> <chr> <chr>
#> 1 Team A shots   accurate   100   40    97.9  36.3 
#> 2 Team A shots   accurate   105   68    95.8  25.9 
#> 3 Team A shots   inaccurate 95    33    93.9  46.3 
#> 4 Team A shots   inaccurate 93    68    95.8  25.9 
#> 5 Team B shots   accurate   100   40    97.9  36.3 
#> 6 Team B shots   accurate   105   68    95.8  25.9 
#> 7 Team B shots   inaccurate 95    33    97.9  36.3 
#> 8 Team B shots   inaccurate 105   68    95.8  26.3 
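The gsub() step in this answer is there because rbind() on a named list of data frames builds row names from the list names plus a row index. A small base R sketch with made-up data frames shows the effect:

```r
# Two toy data frames standing in for the per-team episode tables.
teams <- list(
  "Team A" = data.frame(result = c("accurate", "inaccurate")),
  "Team B" = data.frame(result = c("accurate", "inaccurate"))
)

combined <- do.call(rbind, teams)
rownames(combined)
# "Team A.1" "Team A.2" "Team B.1" "Team B.2"

# Strip everything from the first dot onward to recover the team name.
team <- gsub("\\..*$", "", rownames(combined))
team
# "Team A" "Team A" "Team B" "Team B"
```

One caveat: a team name that itself contains a dot would be cut at its first dot by this pattern. A narrower pattern such as sub("\\.[0-9]+$", "", ...) would only remove the trailing index.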
