Vectorized Calculation in R
I was doing some calculations in R and was confused by the logic R uses.
For example,
table <- data.frame(a = c(1,NA,2,1), b= c(1,1,3,2))
Here, I want to create a third column, "c".
Column c should be 0 where column a is NA. Otherwise it should be the sum of column a and column b.
So column c should be
c(2,0,5,3)
I wrote:
table$c <- 0
table$c[!is.na(table$a)] <- table$a + table$b
And I get column c as
c(2,0,NA,5)
I see that
table$c[3] = table$a[2]+table$b[2]
when I wanted it to be table$c[3] = table$a[3] + table$b[3].
I thought R would skip index 2 on both the left and right sides and jump to index 3 in the calculation, but in fact R skipped index 2 on the left side and did not skip it on the right side...
Why does this happen? How can I prevent it?
Thank you.
Use
table$c <- apply(table[, c("a", "b")], 1, sum)  # row-wise sum of a and b; NA wherever a is NA
table$c[is.na(table$c)] <- 0                    # replace those NA sums with 0
Or, even simpler if you are just starting to learn R:
table$c <- table$a + table$b
table$c[is.na(table$c)] <- 0
To avoid a result like yours, keep both sides of an assignment aligned to the same rows, which this line does not do:
table$c[!is.na(table$a)] <- table$a + table$b
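The left side selects only the three rows where a is not NA, but the right side, table$a + table$b, is still computed over all four rows (2, NA, 5, 3). R does not apply the left-hand filter to the right-hand side 'on the fly'; it fills the three selected slots with the first three values in order and issues a warning about the length mismatch.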
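The mismatch can be made visible, and fixed, by applying the same logical index to both sides. This is a minimal sketch, not part of the original answer; the names `df`, `keep` and `rhs` are made up for illustration (`df` is used instead of `table` only to avoid masking the base function of that name).

```r
# Same data as in the question.
df <- data.frame(a = c(1, NA, 2, 1), b = c(1, 1, 3, 2))

keep <- !is.na(df$a)   # TRUE FALSE TRUE TRUE
rhs  <- df$a + df$b    # computed on all four rows: 2 NA 5 3

sum(keep)              # number of slots selected on the left: 3
length(rhs)            # number of values offered on the right: 4

# Fix: subset the right-hand side with the same index as the left-hand side,
# so both sides refer to rows 1, 3 and 4.
df$c <- 0
df$c[keep] <- df$a[keep] + df$b[keep]
df$c                   # 2 0 5 3
```

With both sides filtered by `keep`, the lengths agree and no warning is raised.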
Alternatively, you could make use of the data.table package:
library(data.table)
table <- data.table(a = c(1, NA, 2, 1), b = c(1, 1, 3, 2))  # creates the data.table
table[, c := ifelse(is.na(a), 0, a + b)]                    # adds column c: 0 where a is NA, otherwise a + b
> table
    a b c
1:  1 1 2
2: NA 1 0
3:  2 3 5
4:  1 2 3
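The same `ifelse()` condition also works on a plain data frame if you would rather not load a package. A small sketch under that assumption (again using the illustrative name `df` rather than `table`):

```r
# Same data as in the question, as a base R data frame.
df <- data.frame(a = c(1, NA, 2, 1), b = c(1, 1, 3, 2))

# ifelse() is evaluated element by element over all four rows,
# so the test and both branches stay aligned.
df$c <- ifelse(is.na(df$a), 0, df$a + df$b)
df$c   # 2 0 5 3
```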