R - Is it possible to unnest a list-column that contains missing (NA) values?
The tibble below has a list-column, property, that contains some missing values:
library(tidyverse)
tbl = tibble(type = c('scale', 'range', 'min', 'max'),
property = list(list(lttr = letters, mth = month.name), NA) %>%
rep(., 2))
# A tibble: 4 x 2
type property
<chr> <list>
1 scale <list [2]>
2 range <lgl [1]>
3 min <list [2]>
4 max <lgl [1]>
I would like to unnest this column and then spread the result into a wide format with three columns: type, lttr and mth:
tbl = tibble(type = c('scale', 'range', 'min', 'max'),
property = list(list(lttr = letters, mth = month.name), NA) %>%
rep(., 2)) %>%
mutate(property = map_if(property, is_list, enframe)) %>%
unnest(property) %>%
spread(name, value)
However, the unnest call throws the following error:
Error: Each column must either be a list of vectors or a list of data frames [property]
I came across a similar issue on GitHub that asks unnest to support NULL values but makes no mention of NAs. There don't appear to be any arguments in the function documentation that pertain to missing values either, but I could be wrong.
The pipeline works as expected if the NAs are filtered out:
tbl = tibble(type = c('scale', 'range', 'min', 'max'),
property = list(list(lttr = letters, mth = month.name), NA) %>%
rep(., 2)) %>%
mutate(property = map_if(property, is_list, enframe)) %>%
filter(!is.na(property)) %>% # drop_na() and na_omit not working not sure why
unnest(property) %>%
spread(name, value)
tbl
# A tibble: 2 x 3
type lttr mth
<chr> <list> <list>
1 min <chr [26]> <chr [12]>
2 scale <chr [26]> <chr [12]>
How about unnesting the tbl, grouping by type, and then creating new columns with summarise?
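library(dplyr)
library(tidyr)
tbl %>%
  unnest(property) %>%  # name the list-column explicitly
  group_by(type) %>%
  # within each type, the first unnested element is lttr and the second is mth
  summarise(lttr = property[1L],
            mth = property[2L])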
# type lttr mth
# <chr> <list> <list>
#1 max <NULL> <NULL>
#2 min <chr [26]> <chr [12]>
#3 range <NULL> <NULL>
#4 scale <chr [26]> <chr [12]>
An option would be to convert every element into a tibble, so that the structure is the same across rows when unnesting and no manual subsetting is needed:
library(tidyverse)
tbl %>%
mutate(property = map(property, ~ if(!is.list(.x))
enframe(list(nm1 = .x)) else enframe(.x))) %>%
unnest(property) %>%
spread(name, value) %>%
select(type, lttr, mth)
# A tibble: 4 x 3
# type lttr mth
# <chr> <list> <list>
#1 max <NULL> <NULL>
#2 min <chr [26]> <chr [12]>
#3 range <NULL> <NULL>
#4 scale <chr [26]> <chr [12]>
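As a rough sketch of why the conversion above helps, the two branches can be pulled into a small helper and compared. The helper name to_long is hypothetical (it does not appear in the original answer); the placeholder name nm1 is taken from the code above. Both branches should return a tibble with the same name and value columns, which is what lets the rows be unnested together.

```r
library(tibble)

# Hypothetical helper (the name is an assumption for illustration):
# list elements are enframed directly, anything else (here a bare NA)
# is first wrapped in a one-element named list.
to_long <- function(x) {
  if (is.list(x)) enframe(x) else enframe(list(nm1 = x))
}

a <- to_long(list(lttr = letters, mth = month.name))  # one row per named element
b <- to_long(NA)                                      # a single placeholder row

# Both results share the same column names, and value is a list-column in each.
print(names(a))
print(names(b))
```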
The issue in the OP's example is the difference in structure between the NA rows and the other rows. When we filter them out, the structure is the same across rows and the issue is resolved.
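The structural mismatch can be inspected directly. The following is a minimal base R sketch, assuming the same property list as in the question (before the enframe step); the variable names elem_class and na_rows are introduced here for illustration only.

```r
# Rebuild the list-column from the question (base R only).
property <- rep(list(list(lttr = letters, mth = month.name), NA), 2)

# Class of each element: named lists alternate with bare logical NAs.
elem_class <- vapply(property, function(x) class(x)[1], character(1))
print(elem_class)

# is.na() on a list flags only elements that are a single atomic NA,
# so negating it keeps just the rows that hold a list.
na_rows <- is.na(property)
print(na_rows)
```

This is consistent with filter(!is.na(property)) in the question keeping only the scale and min rows, since a list (or tibble) element is never reported as NA by is.na().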
We can also check with another example where the number of list elements is greater than 2.
tbl1 <- tibble(type = c('scale', 'range', 'min', 'max'),
property = list(list(lttr = letters, mth = month.name,
val1 = rnorm(12), val2 = runif(12)), NA) %>%
rep(., 2))
tbl1 %>%
mutate(property = map(property, ~ if(!is.list(.x)) enframe(list(nm1 = .x))
else enframe(.x))) %>%
unnest(property) %>%
spread(name, value) %>%
select(-nm1)
# A tibble: 4 x 5
# type lttr mth val1 val2
# <chr> <list> <list> <list> <list>
#1 max <NULL> <NULL> <NULL> <NULL>
#2 min <chr [26]> <chr [12]> <dbl [12]> <dbl [12]>
#3 range <NULL> <NULL> <NULL> <NULL>
#4 scale <chr [26]> <chr [12]> <dbl [12]> <dbl [12]>
This can be extended to an arbitrary number of elements.