R ifelse and NAs within dataframes

I have a problem with an ifelse evaluation.

The following function evaluates based on 3 conditions:

mk <- function(a, b, c, d, e_1, e_2, f, k)
  # condition 1
  ifelse(!is.na(e_1) & !(k %in% 1),
    mk <- d - e_1 * c,
    # condition 2
    ifelse(!is.na(e_2) & !(k %in% 1),
      mk <- e_2 - d * c,
      # condition 3
      ifelse((a - b) <= 11,
        mk <- c * a - b * f,
        mk <- c * f
      )
    )
  )

If I pass a single element, the function evaluates correctly, but if I pass rows of a data frame as input values, the function only ever uses the computation from the last condition, even when the earlier conditions are met. The columns containing the values for e_1, e_2 and k have some NAs in them, and I suspect that is the problem. What I don't get is why the NAs force the whole vector to be evaluated as condition 3, even though they are never actually used in the computation, because the conditions should rule out their use. If I replace the calculations with character strings, i.e. write "uses condition 1/2/3" instead of the formulas, the conditions are evaluated correctly.

How can I avoid this problem?
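As a quick check of the NA suspicion, here is a minimal sketch on hypothetical toy vectors (not the data from the question). It relies on the fact that is.na() and %in% always return TRUE or FALSE, so the guard conditions used above cannot become NA, and NA values in a branch are only picked up where that branch is actually selected.

```r
# Hypothetical toy vectors, chosen to cover NA in e_1 and in k
e_1 <- c(1, NA, 1, NA)
k   <- c(0, 0, 1, NA)

# is.na() and %in% never return NA, so the condition is always TRUE or FALSE
cond1 <- !is.na(e_1) & !(k %in% 1)
print(cond1)

# NA values in the "yes" vector are ignored wherever the condition is FALSE;
# -99 is an arbitrary placeholder for the "no" branch
res <- ifelse(cond1, 1 - e_1, -99)
print(res)
```

If the result contains no NA here, the NAs in the inputs are unlikely to be what pushes every row into the last branch.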

Turns out the NAs weren't the cause of the problem at all, but rather a rounding operation that is done after the initial evaluation. The round function was not in my first question since I didn't suspect it of being the problem, but it is actually the cause.

A simpler form of my problem is:

mktest <- function(a, b, e_1, e_2, k) {
  # condition 1
  ifelse(!is.na(e_1) & !(k %in% 1),
    mk <- 1 - e_1,
    # condition 2
    ifelse(!is.na(e_2) & !(k %in% 1),
      mk <- 2 - e_2,
      # condition 3
      ifelse((a - b) <= 1,
        mk <- -a * b,
        mk <- a * 2
      )
    )
  )
  round(mk, 0)
}

# some testdata with all possible combinations of values in my data frame
test <- expand.grid(a = 2:3, b = 1, e_1 = c(1, NA), e_2 = c(1, NA), k = c(0, 1, NA))

# visualize conditions
test$cond1 <- !is.na(test$e_1) & !(test$k %in% 1)
test$cond2 <- !is.na(test$e_2) & !(test$k %in% 1)
test$cond3 <- ((test$a - test$b) <= 1)

# results
test$result <- mktest(test$a, test$b, test$e_1, test$e_2, test$k)

If I evaluate the function without the round(mk,0) at the end, it evaluates the conditions correctly. If the rounding is done, only the last condition is used. The reason for this behaviour is still beyond me, since the rounding is done AFTER the conditions are evaluated, but at least the problem at hand is solved.
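A likely explanation, offered as a sketch rather than taken from the original post: ifelse() evaluates its yes and no arguments as whole vectors, so each `mk <- ...` inside the call runs whenever its branch is needed for at least one row, and each assignment overwrites mk for every row. The value returned by ifelse() itself is still correct, which is why the function works without the final round(). With round(mk, 0) as the last expression, the function instead returns whatever was assigned to mk last, which for this test data is a * 2. Assigning the result of the outer ifelse() once avoids this. The name mktest_fixed is hypothetical.

```r
# Hypothetical fixed variant: no assignments inside the ifelse() arguments
mktest_fixed <- function(a, b, e_1, e_2, k) {
  mk <- ifelse(!is.na(e_1) & !(k %in% 1),
    1 - e_1,                              # condition 1
    ifelse(!is.na(e_2) & !(k %in% 1),
      2 - e_2,                            # condition 2
      ifelse((a - b) <= 1, -a * b, a * 2) # condition 3
    )
  )
  # round the combined result, not the last branch that happened to run
  round(mk, 0)
}

# same combinations as the test data above, rebuilt so the sketch runs on its own
test <- expand.grid(a = 2:3, b = 1, e_1 = c(1, NA), e_2 = c(1, NA), k = c(0, 1, NA))
test$result <- mktest_fixed(test$a, test$b, test$e_1, test$e_2, test$k)
print(test)
```

With this version, rows that meet condition 1 or 2 should keep their own values after rounding, and only the remaining rows fall through to condition 3.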
