Using the ifelse statement in R

Question

I would like to use the ifelse statement to create a new variable, say, z. However, one of the return values depends on the i-th column of a matrix. Here is a simple example:

set.seed(1)
data <- data.frame(x = rnorm(10), y = rnorm(10), ind = rep(c(0, 1), 5))
m <- data.frame(matrix(rnorm(100), 10, 10))

z <- ifelse(data$ind == 1, data$x, sum(m[, i]))

I know the line with z won't run, but it illustrates what I would like to do. If subject i has the ind variable equal to 0, then z should be the sum of the 10 entries in column i of m; otherwise z should be that subject's x.

Could I do this with ifelse, or would I need a for loop? I'm trying to stay away from for loops, which is why I am trying ifelse in the first place.

Here is what z should look like:

z
 [1] -1.3367324  0.1836433  1.3413668  1.5952808  4.5120996 -0.8204684  1.2736029
 [8]  0.7383247  3.4748021 -0.3053884

Thanks!

Answer 1

Yes, you can do it with ifelse in a one-liner, very close to what you wrote:

z <- ifelse(data$ind == 0, colSums(m), data$x)

Here is what R does when it executes this statement:

It computes the logical vector data$ind == 0 and evaluates the two numeric vectors colSums(m) and data$x in full. Both have length 10 here, the same as the test, because m has one column per row of data.
For each position where data$ind == 0 is TRUE, it returns the corresponding element of colSums(m); where it is FALSE, it returns the corresponding element of data$x.
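The selection step can be seen on a small hand-made example. The vectors test, yes and no below are invented for illustration and are not part of the question's data; this is only a sketch of how ifelse picks elements position by position.

```r
# Toy vectors, invented for illustration only
test <- c(TRUE, FALSE, TRUE, FALSE)
yes  <- c(10, 20, 30, 40)
no   <- c(-1, -2, -3, -4)

# Position k of the result is yes[k] where test[k] is TRUE, otherwise no[k]
picked <- ifelse(test, yes, no)
picked
# [1] 10 -2 30 -4
```

In the answer's one-liner, data$ind == 0 plays the role of test, colSums(m) the role of yes, and data$x the role of no.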

Answer 2

Or we can use arithmetic:

colSums(m)*(data$ind==0) + (data$ind==1)*data$x
#     X1         X2         X3         X4         X5         X6         X7 
#-1.3367324  0.1836433  1.3413668  1.5952808  4.5120996 -0.8204684  1.2736029 
#        X8         X9        X10 
# 0.7383247  3.4748021 -0.3053884
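One difference from ifelse is worth keeping in mind when using the arithmetic form. In R, NA * 0 is NA, so a missing value in the branch that is not selected still ends up in the result, and the result keeps the names of colSums(m). The inputs ind, x and sums below are invented for illustration (sums stands in for colSums(m)); this is a sketch of the behaviour, not part of the original answer.

```r
# Toy inputs, invented for illustration; x is NA in a position where ind == 0
ind  <- c(0, 1, 0)
x    <- c(NA, 2, 3)
sums <- c(a = 10, b = 20, c = 30)   # hypothetical stand-in for colSums(m)

# Arithmetic form: the unselected NA is multiplied by 0 and stays NA
arith <- sums * (ind == 0) + (ind == 1) * x

# ifelse form: the unselected NA is never copied into the result
viaif <- ifelse(ind == 0, sums, x)

arith          # first element is NA, names a, b, c are kept
viaif          # 10 2 30, no names
unname(arith)  # drops the names if they are not wanted
```

The question's data contain no missing values, so both forms give the same numbers there.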

Answer 3

You can do it in a two-liner instead:

z <- data$x
z[data$ind == 0] <- colSums(m[,data$ind == 0])

 [1] -1.3367324  0.1836433  1.3413668  1.5952808  4.5120996 -0.8204684  1.2736029  0.7383247  3.4748021
[10] -0.3053884

More generally, you could use an apply function. This will in general be slower than a straight vectorised solution like the one above. Here's sapply:

sapply(1:nrow(data), function(x){ifelse(data$ind[x] == 1, data$x[x], sum(m[, x]))})

 [1] -1.3367324  0.1836433  1.3413668  1.5952808  4.5120996 -0.8204684  1.2736029  0.7383247  3.4748021
[10] -0.3053884
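A possible variation on the sapply call, assuming the data and m objects from the question: vapply with a declared numeric(1) return type, and a plain if/else inside the function since each call handles a single element. This is a sketch rather than part of the original answer; the name z_vapply is made up here.

```r
# Data from the question
set.seed(1)
data <- data.frame(x = rnorm(10), y = rnorm(10), ind = rep(c(0, 1), 5))
m <- data.frame(matrix(rnorm(100), 10, 10))

# One call per row index i; each call must return exactly one number
z_vapply <- vapply(seq_len(nrow(data)), function(i) {
  if (data$ind[i] == 1) data$x[i] else sum(m[, i])
}, numeric(1))

# Compare with the vectorised ifelse one-liner
z_ifelse <- ifelse(data$ind == 0, colSums(m), data$x)
all.equal(z_vapply, z_ifelse)
```

vapply stops with an error if any call returns something other than a single numeric value, which sapply would silently accept.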

A benchmark:

microbenchmark::microbenchmark(
     sapply = sapply(1:nrow(data), function(x){ifelse(data$ind[x] == 1, data$x[x], sum(m[, x]))}), 
     vectorised = {z <- data$x;
                   z[data$ind == 0] <- colSums(m[,data$ind == 0])})
Unit: microseconds
       expr     min      lq     mean   median       uq     max neval cld
     sapply 391.297 408.193 423.6525 412.4170 423.7450 853.249   100   b
 vectorised 197.377 199.873 208.7701 202.5605 214.4645 284.545   100  a


3 answers

Answer 1: 4, 2015-10-14 02:06:33
Answer 2: 3, 2015-10-14 02:09:58
Answer 3: 2, accepted, 2015-10-14 01:35:38
