How to change values in a column while checking them against another column?
Here's an example of my data:
essay ns0_nns1 A_pred B_pred A_pred01 B_pred01
1 1 1 0.558 0.370 NA NA
2 2 0 0.293 0.654 NA NA
3 3 0 0.545 0.849 NA NA
4 4 0 0.432 0.698 NA NA
5 5 1 0.651 0.404 NA NA
6 6 0 0.657 0.502 NA NA
7 7 1 0.884 0.658 NA NA
8 8 1 0.736 0.348 NA NA
9 9 0 0.532 0.791 NA NA
10 10 0 0.180 0.789 NA NA
I need to go through the rows: if A_pred is <= 0.5, the corresponding row in A_pred01 should be assigned 0; otherwise it should be assigned 1.
I thought I could do this with a for loop, so I came up with:
for(i in dat$A_pred){
if(i<=0.5){
dat$A_pred01[i]=0
} else {
dat$A_pred01[i]=1}
}
This didn't work, though. I guess what I need to know is: can I somehow have a placeholder for A_pred01 that corresponds to i, so that each A_pred01 value is changed as the for loop goes along? I hope what I'm asking makes sense, thanks.
If you would like to fix the loop, try changing the i counter into a numeric vector of row positions (1 2 3 4 5 ...) instead of the values of the column. Your original code didn't work because i was a value like 0.558, so dat$A_pred01[i] ran as dat$A_pred01[0.558], which wasn't what you intended. R truncates a fractional index toward zero, so each of those assignments targeted index 0 and changed nothing.
# i is now a row position, so it can index both columns
for (i in 1:nrow(dat)) {
  if (dat$A_pred[i] <= 0.5) {
    dat$A_pred01[i] <- 0
  } else {
    dat$A_pred01[i] <- 1
  }
}
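To see the indexing problem in isolation, here is a small sketch on a throwaway vector (the name x is only for illustration and is not from the question). R truncates a non-integer index toward zero, so an index below 1 selects nothing:

```r
x <- rep(NA_real_, 3)   # stand-in for an empty A_pred01 column

x[0.558] <- 1           # index truncates to 0, so nothing is assigned
print(all(is.na(x)))    # x is still all NA at this point

x[1.9] <- 1             # index truncates to 1, so the first element is set
print(x)
```

This is why looping over positions (or not looping at all) is needed: the probabilities in A_pred are all below 1, so none of them can address a row.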
Vectorized
You can also avoid the loop altogether with:
dat$A_pred01 <- as.integer(dat$A_pred > 0.5)
The expression dat$A_pred > 0.5 is a logical vector indicating whether each element satisfies the condition (TRUE FALSE TRUE ...). We then coerce it to 1s and 0s with as.integer.
# essay ns0_nns1 A_pred B_pred A_pred01 B_pred01
# 1 1 1 0.558 0.370 1 NA
# 2 2 0 0.293 0.654 0 NA
# 3 3 0 0.545 0.849 1 NA
# 4 4 0 0.432 0.698 0 NA
# 5 5 1 0.651 0.404 1 NA
# 6 6 0 0.657 0.502 1 NA
# 7 7 1 0.884 0.658 1 NA
# 8 8 1 0.736 0.348 1 NA
# 9 9 0 0.532 0.791 1 NA
# 10 10 0 0.180 0.789 0 NA
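The two steps of the vectorized version can be looked at separately. The sketch below uses only the first three A_pred values from the question; the names above and via_ifelse are illustrative. It also compares the result with an ifelse() call that mirrors the question's wording (<= 0.5 gives 0, otherwise 1), which is one possible alternative rather than the approach used above:

```r
dat <- data.frame(A_pred = c(0.558, 0.293, 0.545))

# Step 1: the comparison is evaluated for every row at once
above <- dat$A_pred > 0.5
print(above)

# Step 2: TRUE becomes 1 and FALSE becomes 0
dat$A_pred01 <- as.integer(above)
print(dat)

# Same rule written as in the question: <= 0.5 -> 0, else 1
via_ifelse <- ifelse(dat$A_pred <= 0.5, 0L, 1L)
print(all(via_ifelse == dat$A_pred01))
```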
data.table
As your data sets get larger, you may want to bring data.table into your workflow. Here is the same operation in that syntax:
library(data.table)
# inside [ ], columns can be referenced by name, so dat$ is not needed
setDT(dat)[, A_pred01 := as.integer(A_pred > 0.5)]
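The sample data also has an empty B_pred01 column. Assuming the same 0.5 cutoff is wanted for B_pred (the question only states the rule for A_pred), a sketch that fills both columns in one data.table call might look like this. The names dt, cols and new_cols are made up for the example:

```r
library(data.table)

# First three rows of the two prediction columns from the question
dt <- data.table(
  A_pred = c(0.558, 0.293, 0.545),
  B_pred = c(0.370, 0.654, 0.849)
)

cols <- c("A_pred", "B_pred")
new_cols <- paste0(cols, "01")   # "A_pred01", "B_pred01"

# .SD holds the columns named in .SDcols; := adds the results by reference
dt[, (new_cols) := lapply(.SD, function(p) as.integer(p > 0.5)), .SDcols = cols]
print(dt)
```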
Bonus
Instead of as.integer(dat$A_pred > 0.5), try the shorter +(dat$A_pred > 0.5).
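The shortcut works because unary plus is an arithmetic operator, and arithmetic on a logical vector coerces it to integer. A quick check on a few of the A_pred values (the names p, short and long are illustrative):

```r
p <- c(0.558, 0.293, 0.545)

short <- +(p > 0.5)            # unary plus coerces logical to integer
long  <- as.integer(p > 0.5)   # explicit coercion

print(identical(short, long))  # TRUE
```

The explicit as.integer() form is arguably easier to read, so the shorter spelling is mostly a matter of taste.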