dplyr if_else() vs base R ifelse()
I am fairly proficient with the Tidyverse, but have always used ifelse() instead of dplyr if_else(). I want to switch this behavior, default to always using dplyr::if_else(), and phase ifelse() out of my code.
Is there any reason not to do this? Would this likely get me into trouble? I'll spare you the details, but recently not using if_else() bit me when I unknowingly created a column of character matrices in my data analysis. If I switch to always using if_else(), I hope to avoid this issue in the future.
if_else is stricter. It checks that both alternatives are of the same type and otherwise throws an error, while ifelse will promote types as necessary. This may be a benefit in some circumstances, but it may otherwise break scripts if you don't check for errors or explicitly force type conversion. For example:
ifelse(c(TRUE,TRUE,FALSE),"a",3)
[1] "a" "a" "3"
if_else(c(TRUE,TRUE,FALSE),"a",3)
Error: `false` must be type character, not double
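A small sketch of the same comparison that captures the error instead of stopping, and shows the usual fix of making the conversion explicit. The variable name cond is just for illustration, and the exact error text differs between dplyr releases (the transcript above shows an older wording), so the sketch only checks that an error occurs.

```r
library(dplyr)

cond <- c(TRUE, TRUE, FALSE)

# ifelse() quietly coerces 3 to "3" so that it fits the character result
ifelse(cond, "a", 3)

# if_else() refuses to combine character and double; capture the message
msg <- tryCatch(if_else(cond, "a", 3), error = function(e) conditionMessage(e))
print(msg)

# Converting explicitly keeps if_else() strict everywhere else
if_else(cond, "a", as.character(3))
```

The point of the explicit as.character() is that the coercion becomes a visible decision in the code rather than something ifelse() does behind your back.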
Another reason to choose if_else over ifelse is that ifelse turns Date objects into numeric ones, because its result takes its attributes (including the class) from the condition, not from the yes/no values:
Dates <- as.Date(c('2018-10-01', '2018-10-02', '2018-10-03'))
new_Dates <- ifelse(Dates == '2018-10-02', Dates + 1, Dates)
str(new_Dates)
#> num [1:3] 17805 17807 17807
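For comparison, a sketch of the same operation with if_else(), plus the manual repair that ifelse() would need. The explicit origin argument is included because as.Date() on a plain number requires it on R versions before 4.3.0.

```r
library(dplyr)

Dates <- as.Date(c('2018-10-01', '2018-10-02', '2018-10-03'))

# if_else() keeps the Date class because both branches are Date vectors
new_Dates <- if_else(Dates == '2018-10-02', Dates + 1, Dates)
print(new_Dates)
#> [1] "2018-10-01" "2018-10-03" "2018-10-03"

# With ifelse() the class is lost, so it has to be restored by hand
restored <- as.Date(ifelse(Dates == '2018-10-02', Dates + 1, Dates),
                    origin = "1970-01-01")
```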
if_else is also faster than ifelse.
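If you want to check the speed claim on your own machine, a rough sketch using base system.time() is below. The input size and labels are arbitrary choices, timings vary with hardware and package versions, and a dedicated benchmarking package would give more careful numbers; the sketch only confirms that both calls agree on the result.

```r
library(dplyr)

set.seed(1)
x <- runif(1e6)

# Time each call on the same one-million-element input
t_base  <- system.time(r_base  <- ifelse(x > 0.5, "high", "low"))
t_dplyr <- system.time(r_dplyr <- if_else(x > 0.5, "high", "low"))

# Both calls return the same values; only the elapsed times differ
elapsed <- c(ifelse = t_base[["elapsed"]], if_else = t_dplyr[["elapsed"]])
print(elapsed)
```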
Note that when testing multiple conditions, the code is more readable and less error-prone if we use case_when.
library(dplyr)
case_when(
Dates == '2018-10-01' ~ Dates - 1,
Dates == '2018-10-02' ~ Dates + 1,
Dates == '2018-10-03' ~ Dates + 2,
TRUE ~ Dates
)
#> [1] "2018-09-30" "2018-10-03" "2018-10-05"
Created on 2018-06-01 by the reprex package (v0.2.0).
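To make the readability point concrete, here is a sketch of the same four-way logic written as nested if_else() calls next to the case_when() version. Dates is redefined so the snippet runs on its own.

```r
library(dplyr)

Dates <- as.Date(c('2018-10-01', '2018-10-02', '2018-10-03'))

# Nested if_else(): every extra condition adds another level of nesting
nested <- if_else(Dates == '2018-10-01', Dates - 1,
            if_else(Dates == '2018-10-02', Dates + 1,
              if_else(Dates == '2018-10-03', Dates + 2, Dates)))

# case_when(): one flat condition ~ value pair per line
flat <- case_when(
  Dates == '2018-10-01' ~ Dates - 1,
  Dates == '2018-10-02' ~ Dates + 1,
  Dates == '2018-10-03' ~ Dates + 2,
  TRUE ~ Dates
)
```

Both give the same dates; the difference is only in how easy it is to read and extend the conditions.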
I'd also add that if_else() can assign a value when the condition is NA (through its missing argument), which is a handy way of adding an extra condition:
library(dplyr)

# tibble() replaces the deprecated data_frame()
df <- tibble(val = c(80, 90, NA, 110))
df %>% mutate(category = if_else(val < 100, 1, 2, missing = 9))
#     val category
#   <dbl>    <dbl>
# 1    80        1
# 2    90        1
# 3    NA        9
# 4   110        2
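For contrast, a sketch of what the same rule takes in base R: ifelse() simply propagates NA, so the missing case needs its own is.na() test wrapped around the main one.

```r
library(dplyr)

val <- c(80, 90, NA, 110)

# if_else(): the missing argument handles NA conditions in a single call
with_missing <- if_else(val < 100, 1, 2, missing = 9)

# Base R: ifelse() returns NA for an NA condition, so test for NA first
base_version <- ifelse(is.na(val), 9, ifelse(val < 100, 1, 2))

print(with_missing)
#> [1] 1 1 9 2
```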
Another important reason for preferring if_else() to ifelse() is that it checks that the lengths are consistent. See this dangerous gotcha:
> tibble(x = 1:3, y = ifelse(TRUE, x, 4:6))
# A tibble: 3 x 2
x y
<int> <int>
1 1 1
2 2 1
3 3 1
Compare with:
> tibble(x = 1:3, y = if_else(TRUE, x, 4:6))
Error: `true` must be length 1 (length of `condition`), not 3.
The intention in both cases is clearly for column y to equal x or to equal 4:6 according to the value of a single (scalar) logical variable. ifelse() silently truncates its output to length 1 (the length of the condition), which is then silently recycled. if_else() catches what is almost certainly an error at the source.
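When the condition really is a single TRUE/FALSE that should pick a whole vector, a plain if/else expresses that intent directly. A minimal sketch, where use_x is a hypothetical scalar flag standing in for whatever decides the choice:

```r
library(tibble)

x <- 1:3
use_x <- TRUE  # hypothetical scalar flag, not from the original example

# ifelse() with a length-1 condition returns only one element
truncated <- ifelse(use_x, x, 4:6)

# A scalar if/else returns the whole chosen vector
y <- if (use_x) x else 4:6

tbl <- tibble(x = x, y = y)
print(tbl)
```

Vectorised helpers such as ifelse() and if_else() are meant for element-wise choices; for a single yes/no decision, if/else avoids the truncation problem entirely.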
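Sometimes I prefer ifelse because it does not evaluate a branch that is never selected: its yes argument is evaluated only if some element of the condition is TRUE, and no only if some element is FALSE. if_else(), by contrast, always evaluates both true and false. So when one branch would raise an error whenever the condition is not TRUE, you have to use a simple if, or ifelse.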
Example:
df <- data.frame(a = c(1, 2))
df %>% mutate(shp = ifelse(
length(a) >= 3,
round(shapiro.test(a)[["p.value"]], 3L),
NA_real_
))
a shp
1 1 NA
2 2 NA
df %>% mutate(shp = if_else(
length(a) >= 3,
round(shapiro.test(a)[["p.value"]], 3L),
NA_real_
))
Error in `mutate()`:
! Problem while computing `shp = if_else(...)`.
Caused by error in `shapiro.test()`:
! sample size must be between 3 and 5000
Run `rlang::last_error()` to see where the error occurred.
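A sketch of the plain-if alternative inside mutate(), together with a one-line check of ifelse()'s lazy behaviour. Because length(a) >= 3 is a single value, the if/else result has length 1 and mutate() recycles it to every row.

```r
library(dplyr)

df <- data.frame(a = c(1, 2))

# A plain `if` evaluates only the branch it takes, so shapiro.test()
# is never called on fewer than 3 values
out <- df %>% mutate(shp = if (length(a) >= 3) {
  round(shapiro.test(a)[["p.value"]], 3L)
} else {
  NA_real_
})
print(out)

# base ifelse() skips `yes` entirely when no element of the test is TRUE
lazy <- ifelse(FALSE, stop("never evaluated"), 0)
```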