dplyr::mutate (assign na.rm = TRUE)
I have a data.frame with 100 variables. I want to get the sum of three of those variables using only mutate (not summarise). If any of the three variables is NA, I still want to get the sum. To do this with mutate, I replaced all NA values with 0 using ifelse and then computed the sum.
library(dplyr)
df %>% mutate(mod_var1 = ifelse(is.na(var1), 0, var1),
              mod_var2 = ifelse(is.na(var2), 0, var2),
              mod_var3 = ifelse(is.na(var3), 0, var3),
              sum = mod_var1 + mod_var2 + mod_var3)
Is there a better (shorter) way to do this?
DATA
df <- read.table(text = "
var1 var2 var3
4 5 NA
2 NA 3
1 2 4
NA 3 5
3 NA 2
1 1 5", header = TRUE)
rowwise() is my go-to function. It's like group_by(), but it treats each row as an individual group.
df %>% rowwise() %>% mutate(Sum = sum(c(var1, var2, var3), na.rm = TRUE))
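A variant of the same idea, as a sketch that assumes dplyr 1.0.0 or later: c_across() selects columns by name or range inside a rowwise() data frame, which avoids typing out every variable when more than three are involved. The df below is the sample data from the question; res is just an illustrative name.

```r
library(dplyr)

# Sample data from the question
df <- read.table(text = "
var1 var2 var3
4 5 NA
2 NA 3
1 2 4
NA 3 5
3 NA 2
1 1 5", header = TRUE)

res <- df %>%
  rowwise() %>%                                            # one group per row
  mutate(Sum = sum(c_across(var1:var3), na.rm = TRUE)) %>% # NA-tolerant row sum
  ungroup()                                                # drop the row-wise grouping
res
```

Calling ungroup() at the end keeps later steps from silently running row by row.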
We can use Reduce with +:
df %>%
mutate_each(funs(replace(., is.na(.), 0)), var1:var3) %>%
mutate(Sum = Reduce(`+`, .))
#  var1 var2 var3 Sum
#1    4    5    0   9
#2    2    0    3   5
#3    1    2    4   7
#4    0    3    5   8
#5    3    0    2   5
#6    1    1    5   7
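mutate_each() and funs() were deprecated in later dplyr releases. A rough equivalent, assuming dplyr 1.0.0 or later, uses across(). This sketch also passes the three columns to Reduce explicitly, because Reduce(`+`, .) adds up every column of the piped data frame, which is only right when the data frame holds nothing but var1:var3.

```r
library(dplyr)

# Sample data from the question
df <- read.table(text = "
var1 var2 var3
4 5 NA
2 NA 3
1 2 4
NA 3 5
3 NA 2
1 1 5", header = TRUE)

res <- df %>%
  # Replace NA with 0 in the three columns of interest only
  mutate(across(var1:var3, ~ replace(.x, is.na(.x), 0))) %>%
  # Add the three columns element-wise; other columns are left out of the sum
  mutate(Sum = Reduce(`+`, list(var1, var2, var3)))
res
```

As in the original, the NA values in var1:var3 are overwritten with 0 in the result.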
Or with rowSums:
df %>%
mutate(Sum = rowSums(.[names(.)[1:3]], na.rm = TRUE))
#  var1 var2 var3 Sum
#1    4    5   NA   9
#2    2   NA    3   5
#3    1    2    4   7
#4   NA    3    5   8
#5    3   NA    2   5
#6    1    1    5   7
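With 100 variables, picking the columns by position can be fragile. As a sketch, assuming dplyr 1.0.0 or later, across() can hand rowSums the columns by name instead, while leaving the NA values in place:

```r
library(dplyr)

# Sample data from the question
df <- read.table(text = "
var1 var2 var3
4 5 NA
2 NA 3
1 2 4
NA 3 5
3 NA 2
1 1 5", header = TRUE)

res <- df %>%
  # across() returns the selected columns as a data frame, which rowSums accepts
  mutate(Sum = rowSums(across(var1:var3), na.rm = TRUE))
res
```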
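Benchmarks on a data frame with 1e6 rows:
library(dplyr)
library(tidyr)  # gather() and spread() are used below
set.seed(24)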
df1 <- as.data.frame(matrix(sample(c(NA, 1:5), 1e6 *3, replace=TRUE),
dimnames = list(NULL, paste0("var", 1:3)), ncol=3))
system.time({
df1 %>% rowwise() %>% mutate(Sum = sum(c(var1, var2, var3), na.rm = TRUE))
})
# user system elapsed
# 21.50 0.03 21.66
system.time({
df1 %>%
mutate(rn = row_number()) %>%
gather(var, varNum, var1:var3) %>%
group_by(rn) %>%
mutate(sum = sum(varNum, na.rm = TRUE)) %>%
spread(var, varNum)})
# user system elapsed
# 5.96 0.39 6.37
system.time({
replace(df1, is.na(df1), 0) %>% mutate(sum = var1 + var2 + var3)
})
# user system elapsed
# 0.17 0.01 0.19
system.time({
df1 %>%
mutate_each(funs(replace(., is.na(.), 0)), var1:var3) %>%
mutate(Sum = Reduce(`+`, .))
})
# user system elapsed
# 0.10 0.02 0.11
system.time({
df1 %>%
mutate(Sum = rowSums(.[names(.)[1:3]], na.rm = TRUE))
})
# user system elapsed
# 0.04 0.00 0.03
Where better = tidyr:
library(tidyr)
df %>%
  mutate(rn = row_number()) %>%
  gather(var, varNum, var1:var3) %>%
  group_by(rn) %>%
  mutate(sum = sum(varNum, na.rm = TRUE)) %>%
  spread(var, varNum)
In case your dataset is poised to grow...
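gather() and spread() have since been superseded in tidyr. The same reshape, sum, reshape pipeline could be sketched with pivot_longer() and pivot_wider(), assuming tidyr 1.0.0 or later; the column names rn, var and varNum are carried over from the code above.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- read.table(text = "
var1 var2 var3
4 5 NA
2 NA 3
1 2 4
NA 3 5
3 NA 2
1 1 5", header = TRUE)

res <- df %>%
  mutate(rn = row_number()) %>%                       # row id to group on
  pivot_longer(var1:var3, names_to = "var", values_to = "varNum") %>%
  group_by(rn) %>%
  mutate(sum = sum(varNum, na.rm = TRUE)) %>%         # NA-tolerant sum per original row
  ungroup() %>%
  pivot_wider(names_from = var, values_from = varNum) # back to one row per rn
res
```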