Using case_when, how to mutate a new list-column that nests vectors?

Question

I'm trying to use dplyr's case_when() to mutate a new column based on conditions in other columns. However, I want the new column to be a list-column, with a vector nested in each row.

Example

Consider the following toy data. Based on it, I want to summarize the geographical territory of the UK.

library(tibble)

set.seed(1)
my_mat <- matrix(sample(c(TRUE, FALSE), size = 40, replace = TRUE), nrow = 10, ncol = 4) 
colnames(my_mat) <- c("England", "Wales", "Scotland", "Northern_Ireland")
my_df <- as_tibble(my_mat)

> my_df

## # A tibble: 10 x 4
##    England Wales Scotland Northern_Ireland
##    <lgl>   <lgl> <lgl>    <lgl>           
##  1 TRUE    TRUE  TRUE     FALSE           
##  2 FALSE   TRUE  TRUE     FALSE           
##  3 TRUE    TRUE  TRUE     TRUE            
##  4 TRUE    TRUE  TRUE     FALSE           
##  5 FALSE   TRUE  TRUE     TRUE            
##  6 TRUE    FALSE TRUE     TRUE            
##  7 TRUE    FALSE FALSE    FALSE           
##  8 TRUE    FALSE TRUE     TRUE            
##  9 FALSE   FALSE TRUE     FALSE           
## 10 FALSE   TRUE  FALSE    FALSE

I want to mutate a new collective_geo_territory column:

1. If England, Scotland, Wales, and Northern_Ireland are all TRUE, then we say this is United_Kingdom.
2. Otherwise, if only England, Scotland, and Wales are TRUE, then we say this is Great_Britain.
3. Any other combination simply returns a vector with the names of the countries that are TRUE.

My attempt

So far, I know how to address conditions (1) and (2) detailed above, using the following code:

library(dplyr)

my_df %>%
  mutate(collective_geo_territory = case_when(England == TRUE & Wales == TRUE & Scotland == TRUE & Northern_Ireland == TRUE ~ "United_Kingdom",
                                              England == TRUE & Wales == TRUE & Scotland == TRUE ~ "Great_Britain"))

Desired Output

However, I want the output to have a collective_geo_territory column that looks like the following:

## # A tibble: 10 x 5
##      England Wales Scotland Northern_Ireland collective_geo_territory
##      <lgl>   <lgl> <lgl>    <lgl>            <list>                   
##   1  TRUE    TRUE  TRUE     FALSE            <chr [1]>   # c("Great_Britain")           
##   2  FALSE   TRUE  TRUE     FALSE            <chr [2]>   # c("Wales", "Scotland")                      
##   3  TRUE    TRUE  TRUE     TRUE             <chr [1]>   # c("United_Kingdom")        
##   4  TRUE    TRUE  TRUE     FALSE            <chr [1]>   # c("Great_Britain")
##   5  FALSE   TRUE  TRUE     TRUE             <chr [3]>   # c("Wales", "Scotland", "Northern_Ireland")
##   6  TRUE    FALSE TRUE     TRUE             <chr [3]>   # c("England", "Scotland", "Northern_Ireland")
##   7  TRUE    FALSE FALSE    FALSE            <chr [1]>   # c("England") 
##   8  TRUE    FALSE TRUE     TRUE             <chr [3]>   # c("England", "Scotland", "Northern_Ireland")
##   9  FALSE   FALSE TRUE     FALSE            <chr [1]>   # c("Scotland") 
##   10 FALSE   TRUE  FALSE    FALSE            <chr [1]>   # c("Wales")

Answer 1

Here's one approach:

library(purrr) # used for pmap

my_df %>%
  mutate(collective_geo_territory = case_when(
    England & Wales & Scotland & Northern_Ireland ~ list("United_Kingdom"),
    England & Wales & Scotland ~ list("Great_Britain"),
    TRUE ~ pmap(my_df, ~names(my_df)[c(...)]))
    )

Essentially, the last line works as follows:

The left-hand side can simply be TRUE because case_when() stops at the first condition that is TRUE for a given row. So, we only reach this line if conditions 1 and 2 have both failed.
The right-hand side essentially says: iterate over the rows of my dataset (pmap) and apply the following function: get the names of the columns in my dataset (names) and subset them ([]) to only those whose values are TRUE (collected by c(...)).
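
To see the row-wise step in isolation, here is a minimal sketch. The two-row tibble toy is a hypothetical stand-in for my_df, made up only for illustration; the pmap() call itself has the same shape as the one in the answer.

library(tibble)
library(purrr)

# Hypothetical two-row tibble, used only to illustrate the row-wise step
toy <- tibble(
  England  = c(TRUE, FALSE),
  Wales    = c(FALSE, TRUE),
  Scotland = c(TRUE, TRUE)
)

# pmap() calls the function once per row, passing that row's values as arguments;
# c(...) gathers them into a logical vector, which then subsets the column names
true_names <- pmap(toy, ~ names(toy)[c(...)])

str(true_names)

Each element of true_names is a character vector holding the names of the columns that are TRUE in that row, so the result is already a list with one entry per row.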

A few additional notes:

Note that I also had to wrap the right-hand side of the first two conditions (e.g. "United_Kingdom") in a list() because case_when() requires consistent types for the resulting vector.
I changed the redundant England == TRUE (and the same for the other countries) simply to England. Since these columns already contain logical values, there's no need to recheck their values, and this makes the code a bit more readable.
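
As a quick check, the approach can be run on a small hand-typed tibble. This is only a sketch: df3 is a name introduced here, and it holds the first three rows of the printed data from the question, typed in by hand so the check does not depend on the random seed.

library(tibble)
library(dplyr)
library(purrr)

# First three rows of the question's printed data, entered manually
df3 <- tibble(
  England          = c(TRUE, FALSE, TRUE),
  Wales            = c(TRUE, TRUE, TRUE),
  Scotland         = c(TRUE, TRUE, TRUE),
  Northern_Ireland = c(FALSE, FALSE, TRUE)
)

res <- df3 %>%
  mutate(collective_geo_territory = case_when(
    # every right-hand side is a list, so the types stay consistent
    England & Wales & Scotland & Northern_Ireland ~ list("United_Kingdom"),
    England & Wales & Scotland ~ list("Great_Britain"),
    TRUE ~ pmap(df3, ~ names(df3)[c(...)])
  ))

res$collective_geo_territory

The new column is a list with one character vector per row, which matches the shape of the desired output for rows 1 to 3.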

Answer 1
3, accepted, 2021-01-17 12:30:58