
dplyr mutate case when

I have the following data:

library(reshape2)
library(dplyr)

d <- tibble(
  Region = c("R1", "R2", "R3", "R4", "R5", "R1","R2","R3", "R4", "R5"),
  Area = c("R123","R234", "R345", "R456", "R567", "R123","R234", "R345", 
"R456", "R567"),
  var1= c(22, 34, 34, 23, 23, 45, 56, 45, 56, 45),
  var2= c(76, 34, 56, 76,23, 34, 23, 43, 23, 44))

I would like to use mutate to create a new column that is the sum of var1 and var2 divided by 2.

This is the code I have written to try to do that, but it's not quite doing what I want.

d %>% 
  mutate (Total = case_when (Region == "R1" & Area == "R123" ~
                              sum(var1 & var2)/2),
      case_when (Region == "R2" & Area == "R234" ~
                              sum(var1 & var2)/2)) -> data

I want just one Total column. Also, the value of Total for the first row should be 49, so I'm not sure where the 5 is coming from.

Thanks

You can check both conditions together in a single case_when and return 0 where neither condition matches.

library(dplyr)

d %>% 
  mutate(Total = case_when((Region == "R1" & Area == "R123") |
                            (Region == "R2" & Area == "R234") ~ (var1 + var2) / 2, 
                            TRUE ~ 0))  

# A tibble: 10 x 5
#  Region Area   var1  var2 Total
#   <chr>  <chr> <dbl> <dbl> <dbl>
# 1 R1     R123     22    76  49  
# 2 R2     R234     34    34  34  
# 3 R3     R345     34    56   0  
# 4 R4     R456     23    76   0  
# 5 R5     R567     23    23   0  
# 6 R1     R123     45    34  39.5
# 7 R2     R234     56    23  39.5
# 8 R3     R345     45    43   0  
# 9 R4     R456     56    23   0  
#10 R5     R567     45    44   0  

The same can be achieved with ifelse in this case:

d %>% 
  mutate(Total = ifelse((Region == "R1" & Area == "R123") | 
         (Region == "R2" & Area == "R234"), (var1 + var2) / 2,  0))  
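Both versions above supply an explicit fallback of 0. As a minimal sketch of why that matters, the snippet below drops the fallback from case_when; rows that match no condition then come back as NA. The three-row tibble d_small and the name no_default are made up for illustration and are not part of the original question.

```r
library(dplyr)

# Hypothetical three-row subset of the question's data
d_small <- tibble(
  Region = c("R1", "R2", "R3"),
  Area   = c("R123", "R234", "R345"),
  var1   = c(22, 34, 34),
  var2   = c(76, 34, 56)
)

# Only one condition and no TRUE ~ 0 fallback
no_default <- d_small %>%
  mutate(Total = case_when(Region == "R1" & Area == "R123" ~ (var1 + var2) / 2))

no_default$Total   # 49 for the matching row, NA for the other two
```

If NA is the more honest value for rows that were never meant to be computed, leaving the fallback out may be preferable to forcing a 0.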

Assuming you just want to apply the arithmetic to all the rows...

If you want to keep all the columns:

d %>% 
  mutate(Total=(var1+var2)/2) -> new_d

If you just want to keep the new Total column:

d %>% 
  transmute(Total=(var1+var2)/2) -> new_d


On the other hand, if you want to keep the condition used in the example and apply the calculation only to certain regions...

default <- 0       # define the default value for other cases

d %>% 
  mutate(Total = ifelse(Region == "R1" | Region == "R2", (var1 + var2) / 2, default)) -> new_d

or:

default <- 0       # define the default value for other cases

d %>% 
  transmute(Total = ifelse(Region == "R1" | Region == "R2", (var1 + var2) / 2, default)) -> new_d

Without using ifelse or case_when, we can multiply the logical vector directly by the row means of var1 and var2:

library(tidyverse)
d %>%
    mutate(Total = (str_c(Region, Area) %in% c("R1R123", "R2R234")) * 
             (var1 + var2)/2)
# A tibble: 10 x 5
#   Region Area   var1  var2 Total
#   <chr>  <chr> <dbl> <dbl> <dbl>
# 1 R1     R123     22    76  49  
# 2 R2     R234     34    34  34  
# 3 R3     R345     34    56   0  
# 4 R4     R456     23    76   0  
# 5 R5     R567     23    23   0  
# 6 R1     R123     45    34  39.5
# 7 R2     R234     56    23  39.5
# 8 R3     R345     45    43   0  
# 9 R4     R456     56    23   0  
#10 R5     R567     45    44   0  

Or in base R:

d$Total <- rowMeans(d[3:4]) * (do.call(paste0, d[1:2]) %in% c("R1R123", "R2R234"))
d$Total
#[1] 49.0 34.0  0.0  0.0  0.0 39.5 39.5  0.0  0.0  0.0
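The multiplication trick works because R coerces TRUE to 1 and FALSE to 0 in arithmetic. A small base R sketch, using hypothetical vectors keep, v1 and v2 rather than the question's data, compares it with ifelse and shows one case where the two differ: a missing value in a row that is not selected.

```r
# Hypothetical inputs; the last row is not selected and has a missing value
keep <- c(TRUE, TRUE, FALSE)
v1   <- c(22, 34, NA)
v2   <- c(76, 34, 56)

by_multiply <- keep * (v1 + v2) / 2          # FALSE * NA is NA, not 0
by_ifelse   <- ifelse(keep, (v1 + v2) / 2, 0)

by_multiply   # 49 34 NA
by_ifelse     # 49 34  0
```

With complete data, as in the question, both approaches give the same result; the difference only shows up when unselected rows contain NA.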

Others have already answered how to do what you want, but to answer the question of where the 5 comes from: sum() here is a column sum, not a row sum, and combining the variables with the & operator yields TRUE or FALSE values (in this case all TRUE). The sum of that column is 10, because TRUE has a numeric value of 1 and there are 10 rows. The 10 is then divided by 2 to get the 5.
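The explanation can be checked step by step in base R. The sketch below copies var1 and var2 from the question into plain vectors; the name both_nonzero is introduced here only for illustration.

```r
var1 <- c(22, 34, 34, 23, 23, 45, 56, 45, 56, 45)
var2 <- c(76, 34, 56, 76, 23, 34, 23, 43, 23, 44)

both_nonzero <- var1 & var2   # logical AND: every non-zero number counts as TRUE
both_nonzero                  # ten TRUE values
sum(both_nonzero)             # each TRUE counts as 1, so the sum is 10
sum(var1 & var2) / 2          # 5, the value seen in the question

(var1 + var2) / 2             # element-wise calculation; the first element is 49
```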

