
Flatten a tibble in dplyr with NAs

I have the following data:

 h <- structure(list(label = list(list(structure(list(id = 431676528L, 
    url = "https://api.github.com/repos/emergenzeHack/terremotocentro/labels/per%20sviluppatori", 
    name = "per sviluppatori", color = "d4c5f9", default = FALSE), .Names = c("id", 
"url", "name", "color", "default")), structure(list(id = 442034204L, 
    url = "https://api.github.com/repos/emergenzeHack/terremotocentro/labels/sito%20principale", 
    name = "sito principale", color = "5319e7", default = FALSE), .Names = c("id", 
"url", "name", "color", "default"))), list(structure(list(id = 442051239L, 
    url = "https://api.github.com/repos/emergenzeHack/terremotocentro/labels/mappa", 
    name = "mappa", color = "0052cc", default = FALSE), .Names = c("id", 
"url", "name", "color", "default")), structure(list(id = 431676528L, 
    url = "https://api.github.com/repos/emergenzeHack/terremotocentro/labels/per%20sviluppatori", 
    name = "per sviluppatori", color = "d4c5f9", default = FALSE), .Names = c("id", 
"url", "name", "color", "default")), structure(list(id = 442034204L, 
    url = "https://api.github.com/repos/emergenzeHack/terremotocentro/labels/sito%20principale", 
    name = "sito principale", color = "5319e7", default = FALSE), .Names = c("id", 
"url", "name", "color", "default"))), list(NA_character_)), mainId = c("216226960", 
"215647494", "242390063")), .Names = c("label", "mainId"), row.names = c(NA, 
-3L), class = c("tbl_df", "tbl", "data.frame"))

I would like to flatten the values from label, pairing them with mainId, so that I can link each sub-element of label with its main ID. I'm trying to get a tibble with the headers: label, url, name, color, mainId.

The solutions for a similar question work fine unless there are NAs nested in the sub-elements of label:

map_df(h, flatten_dfr)

Error in bind_rows_(x, .id) : Argument 1 must have names
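
The error message suggests that the row-binding step meets an element without names. A minimal sketch of how one might check this, assuming a toy stand-in for the label column (the values and the helper all_named are made up for illustration and are not part of the original data):

```r
library(purrr)

# Toy stand-in for the `label` column: the first row holds two named
# records, the second holds only an unnamed NA, like the third row of h.
label <- list(
  list(list(id = 1L, name = "a"), list(id = 2L, name = "b")),
  list(NA_character_)
)

# Hypothetical helper: TRUE when every sub-element of a row has names.
all_named <- function(row) {
  all(map_lgl(row, function(el) !is.null(names(el))))
}

has_names <- map_lgl(label, all_named)
has_names
#> [1]  TRUE FALSE
```

Rows flagged FALSE here are the ones that would need special handling before binding.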

You could first filter out the mainIds with a missing label and then add them back in with a full_join (or simply bind_rows if your mainIds are unique).

library(tidyverse)

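# Rows whose label holds nothing but NA; keep only their mainId
h_label_missing <- h %>% 
  filter(map_lgl(label, ~all(is.na(.)))) %>% 
  select(-label)

# Flatten the remaining rows, then add the NA-only rows back
h %>% 
  filter(!map_lgl(label, ~all(is.na(.)))) %>% 
  mutate(label = map(label, bind_rows)) %>% 
  unnest(label) %>% 
  full_join(h_label_missing, by = "mainId")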

# A tibble: 6 x 6
#     mainId         id url                                                                                  name             color  default
#       <chr>     <int> <chr>                                                                                <chr>            <chr>  <lgl>  
# 1 216226960 431676528 https://api.github.com/repos/emergenzeHack/terremotocentro/labels/per%20sviluppatori per sviluppatori d4c5f9 F      
# 2 216226960 442034204 https://api.github.com/repos/emergenzeHack/terremotocentro/labels/sito%20principale  sito principale  5319e7 F      
# 3 215647494 442051239 https://api.github.com/repos/emergenzeHack/terremotocentro/labels/mappa              mappa            0052cc F      
# 4 215647494 431676528 https://api.github.com/repos/emergenzeHack/terremotocentro/labels/per%20sviluppatori per sviluppatori d4c5f9 F      
# 5 215647494 442034204 https://api.github.com/repos/emergenzeHack/terremotocentro/labels/sito%20principale  sito principale  5319e7 F      
# 6 242390063        NA NA                                                                                   NA               NA     NA     

Here's an approach that replaces the element containing only NA_character_ with a list of NAs named like the first element of the first row. After that point, bind_rows and unnest will work normally.

library(tidyverse)

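# Field names taken from the first record of the first row
nested_names <- names(pluck(h, 'label', 1, 1))

h2 <- h %>% 
    # Swap every unnamed sub-element for a named list of NAs, then bind
    mutate(label = map(label, map_if, 
                       ~is.null(names(.x)), 
                       ~setNames(rep(list(NA), length(nested_names)), 
                                 nested_names)), 
           label = map(label, bind_rows)) %>% 
    unnest(label)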

h2
#> # A tibble: 6 x 6
#>   mainId           id url                           name     color default
#>   <chr>         <int> <chr>                         <chr>    <chr> <lgl>  
#> 1 216226960 431676528 https://api.github.com/repos… per svi… d4c5… FALSE  
#> 2 216226960 442034204 https://api.github.com/repos… sito pr… 5319… FALSE  
#> 3 215647494 442051239 https://api.github.com/repos… mappa    0052… FALSE  
#> 4 215647494 431676528 https://api.github.com/repos… per svi… d4c5… FALSE  
#> 5 215647494 442034204 https://api.github.com/repos… sito pr… 5319… FALSE  
#> 6 242390063        NA <NA>                          <NA>     <NA>  NA
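
To make the placeholder step easier to follow, here is a small base R sketch of what the setNames(rep(list(NA), ...)) expression builds. The field names are written out by hand here as an assumption; in the answer above they come from pluck(h, 'label', 1, 1).

```r
# Field names, hard-coded for illustration only
nested_names <- c("id", "url", "name", "color", "default")

# One NA per field, named like a real label record
placeholder <- setNames(rep(list(NA), length(nested_names)), nested_names)

str(placeholder)
```

Because the placeholder has the same names as a real record, it can be bound into a one-row table whose columns are all NA.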
