dplyr mutate case when
I have the following data:
library(reshape2)
library(dplyr)
d <- tibble(
  Region = c("R1", "R2", "R3", "R4", "R5", "R1", "R2", "R3", "R4", "R5"),
  Area = c("R123", "R234", "R345", "R456", "R567", "R123", "R234", "R345",
           "R456", "R567"),
  var1 = c(22, 34, 34, 23, 23, 45, 56, 45, 56, 45),
  var2 = c(76, 34, 56, 76, 23, 34, 23, 43, 23, 44))
I would like to use mutate to create a new column which is the sum of var1 and var2 divided by 2.
This is the code I have tried, but it's not quite doing what I want.
d %>%
mutate (Total = case_when (Region == "R1" & Area == "R123" ~
sum(var1 & var2)/2),
case_when (Region == "R2" & Area == "R234" ~
sum(var1 & var2)/2)) -> data
I want just one Total column. Also, the value of Total for the first row should be 49, so I'm not sure where the 5 is coming from.
Thanks
You can check both conditions together in case_when, and return 0 where neither condition matches.
library(dplyr)
d %>%
mutate(Total = case_when((Region == "R1" & Area == "R123") |
(Region == "R2" & Area == "R234") ~ (var1 + var2) / 2,
TRUE ~ 0))
# A tibble: 10 x 5
# Region Area var1 var2 Total
# <chr> <chr> <dbl> <dbl> <dbl>
# 1 R1 R123 22 76 49
# 2 R2 R234 34 34 34
# 3 R3 R345 34 56 0
# 4 R4 R456 23 76 0
# 5 R5 R567 23 23 0
# 6 R1 R123 45 34 39.5
# 7 R2 R234 56 23 39.5
# 8 R3 R345 45 43 0
# 9 R4 R456 56 23 0
#10 R5 R567 45 44 0
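As a side note on the TRUE ~ 0 fallback above, here is a minimal sketch (using a shortened three-row copy of the data, which is an assumption made only to keep the example small) of what happens when the fallback is left out: rows that match no condition get NA rather than 0.

```r
library(dplyr)

# Shortened, illustrative copy of the question's data (first three rows only)
d_small <- tibble(
  Region = c("R1", "R2", "R3"),
  Area   = c("R123", "R234", "R345"),
  var1   = c(22, 34, 34),
  var2   = c(76, 34, 56))

# Two separate branches inside ONE case_when, with no TRUE ~ ... fallback
res <- d_small %>%
  mutate(Total = case_when(
    Region == "R1" & Area == "R123" ~ (var1 + var2) / 2,
    Region == "R2" & Area == "R234" ~ (var1 + var2) / 2))

res$Total
# 49 34 NA
```

Whether NA or 0 is the better default depends on what the unmatched rows should mean downstream; 0 will silently enter later sums and means, while NA will not.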
The same can be achieved with ifelse in this case:
d %>%
mutate(Total = ifelse((Region == "R1" & Area == "R123") |
(Region == "R2" & Area == "R234"), (var1 + var2) / 2, 0))
Assuming you just want to apply the arithmetic to all the rows...
If you want to keep all the columns:
d %>%
mutate(Total=(var1+var2)/2) -> new_d
If you just want to keep the new Total column:
d %>%
transmute(Total=(var1+var2)/2) -> new_d
On the other hand, if you want to keep the condition used in the example and apply the calculation only to certain regions...
default = 0 # define the default value for other cases
d %>%
mutate(Total=ifelse(Region=="R1" | Region=="R2", (var1+var2)/2, default)) -> new_d
or:
default = 0 # define the default value for other cases
d %>%
transmute(Total=ifelse(Region=="R1" | Region=="R2", (var1+var2)/2, default)) -> new_d
Without using ifelse/case_when, we can directly multiply the logical vector by the row means of 'var1' and 'var2':
library(tidyverse)
d %>%
mutate(Total = (str_c(Region, Area) %in% c("R1R123", "R2R234")) *
(var1 + var2)/2)
# A tibble: 10 x 5
# Region Area var1 var2 Total
# <chr> <chr> <dbl> <dbl> <dbl>
# 1 R1 R123 22 76 49
# 2 R2 R234 34 34 34
# 3 R3 R345 34 56 0
# 4 R4 R456 23 76 0
# 5 R5 R567 23 23 0
# 6 R1 R123 45 34 39.5
# 7 R2 R234 56 23 39.5
# 8 R3 R345 45 43 0
# 9 R4 R456 56 23 0
#10 R5 R567 45 44 0
Or in base R:
d$Total <- rowMeans(d[3:4]) * (do.call(paste0, d[1:2]) %in% c("R1R123", "R2R234"))
d$Total
#[1] 49.0 34.0 0.0 0.0 0.0 39.5 39.5 0.0 0.0 0.0
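The multiplication approach in this answer works because R coerces TRUE to 1 and FALSE to 0 in arithmetic. A small base R sketch, with made-up vector names keep and means used purely for illustration:

```r
# Hypothetical flag vector: which rows satisfy the Region/Area condition
keep  <- c(TRUE, FALSE, TRUE)
# Hypothetical row means of var1 and var2 for the same rows
means <- c(49, 45, 39.5)

# TRUE acts as 1 and FALSE as 0, so unmatched rows become 0
total <- keep * means
total
# 49 0 39.5
```

One caveat: if a row mean is NA, the product is NA even where the flag is FALSE, whereas ifelse(keep, means, 0) would give 0 for that row.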
Others have already answered the question of how to do what you want, but to answer the question of where the 5 is coming from: sum() computes a column sum, not a row sum, and when you combine the variables with the & operator you get values of TRUE or FALSE (in this case all TRUE, because every value is non-zero). The sum of that column is 10, because TRUE has a numeric value of 1 and there are 10 rows. The 10 is then divided by 2 to get the 5.
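The arithmetic can be reproduced in base R with the question's own var1 and var2 values; the names flags, wrong and right are introduced here only for illustration:

```r
var1 <- c(22, 34, 34, 23, 23, 45, 56, 45, 56, 45)
var2 <- c(76, 34, 56, 76, 23, 34, 23, 43, 23, 44)

# & is a logical operator: every non-zero number is treated as TRUE
flags <- var1 & var2

# sum() collapses the whole column: ten TRUE values add up to 10
wrong <- sum(flags) / 2
wrong
# 5

# + is element-wise, so this gives one value per row
right <- (var1 + var2) / 2
right[1]
# 49
```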