I need to generate a new variable called Result in R, such that:
for each Variable.ID, if all Classification values are equal to "yes", Result = "yes"; if all Classification values are equal to "no", Result = "no"; otherwise Result = "undetermined".
Can anyone advise me how to do this? (There are hundreds of Variable.IDs, so no manual vector assignments, please.)
This can be done with ave(), any(), all(), etc., though the question is not a good fit for Cross Validated. The following is a starter for you. You will have to change NA to "undetermined", but I tried to keep the code as easy to grasp as possible:
d <- data.frame(v.id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                clas = c("yes", "yes", "yes", "yes", "yes",
                         "no", "no", "no", "no"))
# ave() applies FUN within each v.id group and recycles the result over that group's rows
d$result <- ave(d$clas, d$v.id,
                FUN = function(x) {
                  if (all(x == "yes")) return("yes")
                  if (all(x == "no")) return("no")
                  NA  # mixed group
                })
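As a follow-up to the starter above, here is a minimal sketch that writes "undetermined" directly instead of NA. It assumes clas is a character column (hence stringsAsFactors = FALSE) and treats any group containing a missing value as undetermined; the helper name classify is made up for illustration.

```r
# Same toy data as the starter; keep clas as character so new labels can be assigned
d <- data.frame(v.id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                clas = c("yes", "yes", "yes", "yes", "yes",
                         "no", "no", "no", "no"),
                stringsAsFactors = FALSE)

# Hypothetical helper: one label per group, NA-safe
classify <- function(x) {
  if (!anyNA(x) && all(x == "yes")) return("yes")
  if (!anyNA(x) && all(x == "no")) return("no")
  "undetermined"
}

# ave() recycles the single label over every row of the group
d$result <- ave(d$clas, d$v.id, FUN = classify)
print(d)
```

With this toy data, v.id 2 contains both "yes" and "no", so it is the only group labelled "undetermined".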
You can split Classification by Variable.ID and check for all values being either yes or no:
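library(plyr)
# One label per Variable.ID; llply() returns a list named by ID
results <- llply(split(d, d$Variable.ID), function(d2) {
  if (all(d2$Classification == 'yes')) {
    'yes'
  } else if (all(d2$Classification == 'no')) {
    'no'
  } else {
    'undetermined'
  }
})
# Look up by ID name rather than position, so non-consecutive IDs also work
d$Results <- factor(unlist(results[as.character(d$Variable.ID)]))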
...which should give you what you asked for:
> print(d)
   Variable.ID Classification      Results
1            1            yes          yes
2            1            yes          yes
3            1            yes          yes
4            1            yes          yes
5            1            yes          yes
6            2             no           no
7            2             no           no
8            2             no           no
9            2             no           no
10           3            yes undetermined
11           3             no undetermined
12           4           both undetermined
13           4           <NA> undetermined
14           4            yes undetermined
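If you would rather not depend on plyr, the same split-and-check idea can be sketched in base R. The data frame below is rebuilt from the printed output above; the helper name label_group is an assumption for illustration, and %in% is used instead of == so that a missing Classification simply counts as a non-match.

```r
# Data reconstructed from the printed example
d <- data.frame(
  Variable.ID = c(rep(1, 5), rep(2, 4), 3, 3, 4, 4, 4),
  Classification = c(rep("yes", 5), rep("no", 4), "yes", "no", "both", NA, "yes"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: %in% never returns NA, so all() is always TRUE or FALSE
label_group <- function(x) {
  if (all(x %in% "yes")) {
    "yes"
  } else if (all(x %in% "no")) {
    "no"
  } else {
    "undetermined"
  }
}

# One label per ID, as a character vector named by ID
per_id <- vapply(split(d$Classification, d$Variable.ID), label_group, character(1))

# Map the labels back onto the rows by ID name
d$Results <- unname(per_id[as.character(d$Variable.ID)])
print(d)
```

Computing one label per ID and then looking it up keeps the work proportional to the number of groups rather than the number of rows.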
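# Return "yes" or "no" when every value in x matches, otherwise "undetermined"
foo <- function(x) {
  if (sum(x == "yes") == length(x)) {
    return("yes")
  } else if (sum(x == "no") == length(x)) {
    return("no")
  } else {
    return("undetermined")
  }
}
data$Result <- NA_character_
# Loop over rows (seq_along(data) would loop over columns instead)
for (i in seq_len(nrow(data))) {
  data$Result[i] <- foo(data$Classification[data$Variable.ID == data$Variable.ID[i]])
}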
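One caveat with the counting approach: if Classification contains missing values, sum(x == "yes") is NA and the if() condition fails with an error. A hedged sketch of an NA-tolerant variant follows; the name foo_na is made up, and treating groups with missing values as "undetermined" is an assumption about what you want.

```r
# Hypothetical NA-tolerant variant of the counting check
foo_na <- function(x) {
  n_yes <- sum(x == "yes", na.rm = TRUE)  # NAs are not counted as matches
  n_no <- sum(x == "no", na.rm = TRUE)
  if (n_yes == length(x)) {
    "yes"
  } else if (n_no == length(x)) {
    "no"
  } else {
    "undetermined"
  }
}

foo_na(c("yes", "yes"))  # all yes
foo_na(c("yes", NA))     # a missing value keeps the count below length(x)
```

Because the NA is dropped from the count but still included in length(x), a group with any missing value can never reach the "all yes" or "all no" branch.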