R - Is it possible to unnest a list-column that contains missing (NA) values?

The tibble below has a list-column, property, that contains some missing values:

library(tidyverse)

tbl = tibble(type = c('scale', 'range', 'min', 'max'), 
         property = list(list(lttr = letters, mth = month.name), NA) %>% 
           rep(., 2))
# A tibble: 4 x 2
  type  property  
  <chr> <list>    
1 scale <list [2]>
2 range <lgl [1]> 
3 min   <list [2]>
4 max   <lgl [1]> 

I would like to unnest this column and then spread the result into a wide format with three columns: type, lttr and mth:

tbl = tibble(type = c('scale', 'range', 'min', 'max'), 
             property = list(list(lttr = letters, mth = month.name), NA) %>% 
               rep(., 2)) %>% 
  mutate(property = map_if(property, is_list, enframe)) %>% 
  unnest(property) %>%
  spread(name, value)

However, the unnest call throws the following error:

Error: Each column must either be a list of vectors or a list of data frames [property]

I came across a similar issue on Git that asks unnest to support NULL values, but it makes no mention of NAs. There don't appear to be any arguments in the function documentation that pertain to missing values either, but I could be wrong.

The pipeline works as expected if the NAs are filtered out:

tbl = tibble(type = c('scale', 'range', 'min', 'max'), 
             property = list(list(lttr = letters, mth = month.name), NA) %>% 
               rep(., 2)) %>% 
  mutate(property = map_if(property, is_list, enframe)) %>% 
  filter(!is.na(property)) %>% # drop_na() and na_omit did not work here; not sure why
  unnest(property) %>%
  spread(name, value)

tbl
# A tibble: 2 x 3
  type  lttr       mth       
  <chr> <list>     <list>    
1 min   <chr [26]> <chr [12]>
2 scale <chr [26]> <chr [12]>
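
As a side note on why the filter() step above behaves this way: base R's is.na() works element-wise on a list and returns TRUE only for elements that are a length-one atomic NA. A minimal sketch, with a two-element list built purely for illustration:

```r
# A list mixing a nested list and a bare NA, shaped like the question's column
property <- list(list(lttr = letters, mth = month.name), NA)

# is.na() checks each element of the list: only the length-one atomic NA
# is flagged; the nested list is not
na_flags <- is.na(property)
print(na_flags)
```

This is the logical vector that filter(!is.na(property)) uses to keep the rows holding nested lists.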

How about unnesting the tbl, grouping by type, and then creating new columns with summarise?

library(dplyr)
library(tidyr)

tbl %>%
  unnest() %>%
  group_by(type) %>%
  summarise(lttr = property[1L], 
            mth = property[2L])

#  type  lttr       mth       
#  <chr> <list>     <list>    
#1 max   <NULL>     <NULL>    
#2 min   <chr [26]> <chr [12]>
#3 range <NULL>     <NULL>    
#4 scale <chr [26]> <chr [12]>

An option would be to convert every element into a tibble so that, when unnesting, the structure is the same across rows, rather than subsetting manually:

library(tidyverse)
tbl %>%
    mutate(property = map(property, ~ if(!is.list(.x))
        enframe(list(nm1 = .x)) else enframe(.x))) %>%
    unnest %>% 
    spread(name, value) %>%
    select(type, lttr, mth)
# A tibble: 4 x 3
#  type  lttr       mth       
#  <chr> <list>     <list>    
#1 max   <NULL>     <NULL>    
#2 min   <chr [26]> <chr [12]>
#3 range <NULL>     <NULL>    
#4 scale <chr [26]> <chr [12]>

The issue in the OP's example is the difference in structure between the NA rows and the other rows. When we filter them out, the structure is the same across rows and the issue is resolved.
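
The structural mismatch can be inspected directly. The sketch below rebuilds only the list from the question and applies the same map_if() step; it uses base R's is.list as the predicate (an assumption made here so that only tibble and purrr are needed) and reports the class of each element afterwards:

```r
library(tibble)
library(purrr)

# Same list as in the question: nested lists alternating with bare NAs
property <- rep(list(list(lttr = letters, mth = month.name), NA), 2)

# map_if() only transforms elements for which the predicate is TRUE,
# so the NA elements pass through unchanged
converted <- map_if(property, is.list, enframe)

# First class of each element: tibbles for the nested lists, logical for NA
classes <- map_chr(converted, ~ class(.x)[1])
print(classes)
```

Elements 1 and 3 become tibbles while elements 2 and 4 remain logical vectors, which is the mix of types that unnest complains about.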


We can also check with another example where the number of list elements is greater than 2.

tbl1 <- tibble(type = c('scale', 'range', 'min', 'max'), 
      property = list(list(lttr = letters, mth = month.name, 
       val1 = rnorm(12), val2 = runif(12)), NA) %>% 
        rep(., 2))

tbl1 %>% 
   mutate(property = map(property, ~ if(!is.list(.x)) enframe(list(nm1 = .x)) 
          else enframe(.x))) %>% 
   unnest %>%
   spread(name, value) %>%
   select(-nm1)
# A tibble: 4 x 5
#  type  lttr       mth        val1       val2      
#  <chr> <list>     <list>     <list>     <list>    
#1 max   <NULL>     <NULL>     <NULL>     <NULL>    
#2 min   <chr [26]> <chr [12]> <dbl [12]> <dbl [12]>
#3 range <NULL>     <NULL>     <NULL>     <NULL>    
#4 scale <chr [26]> <chr [12]> <dbl [12]> <dbl [12]>

This can be extended to an arbitrary number of elements.
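
For comparison, here is a rough alternative sketch that does not go through unnest or spread at all, so it does not depend on how a particular tidyr version handles mixed list-columns. It is not taken from the answers above: widen_property is a hypothetical helper name, and the sketch assumes the list-column is called property and that every non-NA element is a named list.

```r
library(tibble)
library(purrr)

# Hypothetical helper: builds one list-column per field name found in
# `property`, leaving NULL in rows where the entry is not a list (e.g. NA)
widen_property <- function(tbl) {
  # Collect every field name that appears in any nested list
  fields <- unique(unlist(map(tbl$property, names)))
  for (f in fields) {
    tbl[[f]] <- map(tbl$property, function(p) if (is.list(p)) p[[f]] else NULL)
  }
  tbl$property <- NULL  # drop the original nested column
  tbl
}

tbl <- tibble(type = c('scale', 'range', 'min', 'max'),
              property = rep(list(list(lttr = letters, mth = month.name), NA), 2))

wide <- widen_property(tbl)
print(wide)
```

Unlike the spread-based pipelines, this keeps the rows in their original order rather than sorting them by type.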
