A more efficient way to compare numbers than ifelse?

Question

Consider simple data:

    > cbind(x,y)
           x  y
    [1,]  -1 99
    [2,]   5  4
    [3,]  10 -2
    [4,] 600  0
    [5,] -16  1
    [6,]   0 55

Now consider this simple nested ifelse statement:

ifelse(y>=0, ifelse(x<0,y,ifelse(x>y,y,x)), x)

Which gives me a result of:

[1] 99  4 10  0  1  0

It should be easy to see what the code does: it replaces values in x with either:

1) y, if both x and y are non-negative and y is the smaller of the two

2) y, if x is negative and y is non-negative

or leaves x alone.

My question is: this code is not very computationally efficient. Can you think of a more efficient way to write it? Thanks!
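For anyone who wants to run the snippets in this thread, here is a minimal setup sketch. The vectors are transcribed by hand from the cbind(x, y) printout above (the question does not show how x and y were created, so the c(...) calls are an assumption), and `res` is just an illustrative name.

```r
# Data transcribed from the cbind(x, y) printout in the question
x <- c(-1, 5, 10, 600, -16, 0)
y <- c(99, 4, -2, 0, 1, 55)

# The nested ifelse from the question, unchanged apart from spacing
res <- ifelse(y >= 0, ifelse(x < 0, y, ifelse(x > y, y, x)), x)
print(res)
# [1] 99  4 10  0  1  0
```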

Answer 1

You can use the fact that x is the sum and y is the difference of (x+y)/2 and (x-y)/2. Then calculate with logical expressions (TRUE equals 1 and FALSE equals 0):

(x+y + (1+2*((x<=y)*(x>=0)-(y>=0)))*(x-y))/2

This gives the same result as the nested ifelse expression.

Speed comparison, using vectors of length 500:

> set.seed(1)
> x <- sample(-100:100,500,replace=TRUE)
> y <- sample(-100:100,500,replace=TRUE)

> system.time(
+   for ( i in 1:100000 )
+   {
+     A <- (x+y + (1+2*((x<=y)*(x>=0)-(y>=0)))*(x-y))/2
+   }
+ )
   user  system elapsed 
   8.46    0.00    8.51 

> system.time(
+   for ( i in 1:100000 )
+   {
+     B <- ifelse(y>=0, ifelse(x<0,y,ifelse(x>y,y,x)), x)
+   }
+ )
   user  system elapsed 
  74.58    0.03   75.05 

> system.time(
+   for ( i in 1:100000 )
+   {
+     z <- y
+     z[(x < y & x >= 0)| y < 0] <- x[(x < y & x >= 0)| y < 0];z
+   }
+ )
   user  system elapsed 
  23.32    0.00   23.44

Check if the results are the same:

> all(A==B)
[1] TRUE
> all(A==z)
[1] TRUE
>
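The arithmetic one-liner in this answer is easier to follow when split into steps. The sketch below is only an unpacking of the same formula on the question's six data points; the names `t`, `s` and `res` are illustrative and not part of the original answer. It relies on (x+y)/2 + (x-y)/2 being x and (x+y)/2 - (x-y)/2 being y, so a sign of +1 or -1 picks between the two.

```r
# Question's data, transcribed from the printout
x <- c(-1, 5, 10, 600, -16, 0)
y <- c(99, 4, -2, 0, 1, 55)

# Logical values are coerced to 0/1 in arithmetic.
# t is 0 where x should be kept and -1 where y should be taken.
t <- (x <= y) * (x >= 0) - (y >= 0)

# Turn t into a sign: +1 selects x, -1 selects y
s <- 1 + 2 * t

# half-sum plus or minus half-difference
res <- ((x + y) + s * (x - y)) / 2
print(res)
# [1] 99  4 10  0  1  0
```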

Answer 2

Another option without indexing:

x * ((x < y & x >= 0) | y < 0) + y * ((x > y & y >= 0) | x < 0)

Output:

[1] 99  4 10  0  1  0

Time comparison. It seems mra68's answer is the fastest:

library(microbenchmark)
microbenchmark(
  TylerRinker = z[(x < y & x >= 0)| y < 0] <- x[(x < y & x >= 0)| y < 0],
  mra68 =(x+y + (1+2*((x<=y)*(x>=0)-(y>=0)))*(x-y))/2,
  mpalanco = x *((x < y & x >= 0)| y < 0)+ y * ((x > y & y >= 0)| x < 0),
  if_else = ifelse(y>=0, ifelse(x<0,y,ifelse(x>y,y,x)), x)
  ) 

   Unit: microseconds
        expr    min      lq     mean median     uq     max neval cld
 TylerRinker  8.800  9.7780 11.47480 10.267 10.268  75.778   100  a 
       mra68  5.867  6.3560  9.40188  6.845  7.334 214.623   100  a 
    mpalanco  7.334  7.8230  8.67836  8.311  8.800  30.312   100  a 
     if_else 44.489 45.9565 54.61929 53.289 53.290 245.911   100   b
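One caveat worth checking before relying on the mask-sum expression: its two masks are not complements of each other. On the question's data that does not matter, but for a tie (x equal to y, both non-negative) neither mask is TRUE, and when both values are negative both masks are TRUE. The sketch below uses two made-up data points to show the difference, then tries a variant with complementary masks; `keep_x` and `fixed` are hypothetical names, not something from the original answers.

```r
# Two hypothetical edge cases that are not in the question's data:
# a tie (x == y, both non-negative) and a pair where both are negative
x <- c(5, -3)
y <- c(5, -4)

nested <- ifelse(y >= 0, ifelse(x < 0, y, ifelse(x > y, y, x)), x)
masked <- x * ((x < y & x >= 0) | y < 0) + y * ((x > y & y >= 0) | x < 0)
print(nested)
# [1]  5 -3
print(masked)
# [1]  0 -7

# Variant with complementary masks, so every element receives
# exactly one of x or y
keep_x <- (x <= y & x >= 0) | y < 0
fixed <- x * keep_x + y * (!keep_x)
print(fixed)
# [1]  5 -3
```

Whether this matters depends on the real data; if ties and negative pairs cannot occur, the expression as posted behaves like the nested ifelse.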

Answer 3

Maybe just indexing. I don't know if it's any more efficient:

dat <- read.table(text="       x  y
    [1,]  -1 99
    [2,]   5  4
    [3,]  10 -2
    [4,] 600  0
    [5,] -16  1
    [6,]   0 55", header=TRUE)



x <- dat[, 1]
y <- dat[, 2]

z <- y
z[(x < y & x >= 0)| y < 0] <- x[(x < y & x >= 0)| y < 0];z

## 99  4 10  0  1  0
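The indexing version evaluates the same condition twice, once on each side of the assignment. A small variation, sketched here on the same data, stores the mask first so it is computed only once; `keep_x` is an illustrative name. No timing claim is made for it, it mainly reads a little more easily.

```r
# Same data as above, entered directly
x <- c(-1, 5, 10, 600, -16, 0)
y <- c(99, 4, -2, 0, 1, 55)

# TRUE where the result should come from x
keep_x <- (x < y & x >= 0) | y < 0

# Start from y and overwrite the positions where x wins
z <- y
z[keep_x] <- x[keep_x]
print(z)
# [1] 99  4 10  0  1  0
```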

Answer 4

This is more of a summary of the above answers than a unique answer, but I do provide time comparisons.

b is a small speedup from combining operations. c through e are all previously provided answers. @mra68's answer appears to be the fastest.

library(microbenchmark)
microbenchmark(a= ifelse(y>=0, ifelse(x<0,y,ifelse(x>y,y,x)), x),
               b= {ifelse(y>= 0, ifelse(x>y | x<0, y,x), x)},
               c= {z <- y; z[(x < y & x >= 0)| y < 0] <- x[(x < y & x >= 0)| y < 0];z},
               d= x * ((x < y & x >= 0) | y < 0) + y * ((x > y & y >= 0) | x < 0),
               e= (x+y + (1+2*((x<=y)*(x>=0)-(y>=0)))*(x-y))/2)

Unit: microseconds
 expr    min      lq     mean median     uq    max neval cld
    a 16.346 18.6270 21.88066 19.387 20.528 77.548   100   c
    b 10.644 11.4040 13.05781 11.785 12.545 39.154   100  b 
    c  3.801  4.1820  5.10146  4.562  4.942 18.247   100 a  
    d  3.041  3.4210  4.37168  3.801  3.802 33.452   100 a  
    e  2.281  2.8515  3.36810  3.041  3.421 18.246   100 a

Though, IMO, the lack of readability in the fastest solution isn't worth the speedup.

Depending on the actual use case, you could achieve a speedup by ordering your if-else operations so that as few operations as possible pass further down the call stack.

Answer scores and timestamps

Answer 1: score 3, 2015-10-21 21:21:54

Answer 2: score 3, accepted, 2015-10-21 21:42:06

Answer 3: score 2, 2015-10-21 18:20:58

Answer 4: score 1, 2015-10-21 22:46:42
