
Generating new variable in R based on group properties

I need to generate a new variable called Result in R, such that:

For each Variable.ID, if all Classification values are equal to "yes", then Result = "yes"; if all Classification values are equal to "no", then Result = "no"; otherwise Result = "undetermined".


Can anyone advise me how to do this? (There are hundreds of Variable.IDs, so no manual vector assignments, please.)

This can be done with ave(), any(), all(), etc., although the question is not a good fit for Cross Validated. The following is a starter for you. You will have to change NA to "undetermined", but I tried to keep the code as easy to grasp as possible:

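d <- data.frame(v.id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                clas = c("yes", "yes", "yes", "yes", "yes",
                         "no", "no", "no", "no"),
                stringsAsFactors = FALSE)  # keep clas as character in R < 4.0 too

# ave() applies FUN to each v.id group and recycles the result over its rows
d$result <- ave(d$clas, d$v.id,
                FUN = function(x) {
                  if (all(x == "yes")) return("yes")
                  if (all(x == "no")) return("no")
                  NA  # mixed group
                })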
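
As a rough sketch of that remaining step, the same ave() call can return "undetermined" directly. It reuses the `d`, `v.id` and `clas` names from the starter above; the helper name `classify_group` is made up for illustration, and labelling a group that contains NA as "undetermined" is an assumption rather than something the question specifies:

```r
d <- data.frame(v.id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                clas = c("yes", "yes", "yes", "yes", "yes",
                         "no", "no", "no", "no"),
                stringsAsFactors = FALSE)

# Hypothetical helper: returns one label for a whole group.
# isTRUE() turns an NA comparison result into FALSE, so a group
# containing NA falls through to "undetermined" (an assumption).
classify_group <- function(x) {
  if (isTRUE(all(x == "yes"))) return("yes")
  if (isTRUE(all(x == "no"))) return("no")
  "undetermined"
}

# ave() recycles the single label over every row of the group
d$result <- ave(d$clas, d$v.id, FUN = classify_group)
print(d)
```

With this sample data, group 2 mixes "yes" and "no", so it is the only group labelled "undetermined".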

You can split Classification by Variable.ID and check whether all values are "yes" or all are "no":

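library(plyr)
# One label per Variable.ID group
results <- llply(split(d, d$Variable.ID), function(d2) {
  if (all(d2$Classification == 'yes')) {
    'yes'
  } else if (all(d2$Classification == 'no')) {
    'no'
  } else {
    'undetermined'
  }
})
# Look up by group name, so this also works when the IDs are not 1, 2, 3, ...
d$Results <- factor(unlist(results[as.character(d$Variable.ID)]))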

...which should give you what you asked for:

> print(d)

   Variable.ID Classification      Results
1            1            yes          yes
2            1            yes          yes
3            1            yes          yes
4            1            yes          yes
5            1            yes          yes
6            2             no           no
7            2             no           no
8            2             no           no
9            2             no           no
10           3            yes undetermined
11           3             no undetermined
12           4           both undetermined
13           4           <NA> undetermined
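14           4            yes undetermined

Another option is a small helper function applied row by row:

foo <- function(x) {
  # na.rm = TRUE: an NA entry counts as "not matching" instead of breaking if()
  if (sum(x == "yes", na.rm = TRUE) == length(x)) {
    return("yes")
  } else if (sum(x == "no", na.rm = TRUE) == length(x)) {
    return("no")
  } else {
    return("undetermined")
  }
}

data$Result <- NA_character_
# Loop over rows (seq_along(data) would loop over columns)
for (i in seq_len(nrow(data))) {
  data$Result[i] <- foo(data$Classification[data$Variable.ID == data$Variable.ID[i]])
}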
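
The row-by-row loop recomputes the group label once per row. A possible alternative, sketched here in base R, computes one label per Variable.ID with tapply() and then maps it back to the rows. The sample data is copied from the printed table above; the names `label_group` and `per_group` are made up for illustration, and treating a group that contains NA as "undetermined" is an assumption:

```r
data <- data.frame(
  Variable.ID = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4),
  Classification = c("yes", "yes", "yes", "yes", "yes",
                     "no", "no", "no", "no",
                     "yes", "no", "both", NA, "yes"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one label for a whole group
label_group <- function(x) {
  if (!anyNA(x) && all(x == "yes")) return("yes")
  if (!anyNA(x) && all(x == "no")) return("no")
  "undetermined"
}

# One label per Variable.ID; tapply() names each element by its group
per_group <- tapply(data$Classification, data$Variable.ID, label_group)

# Map the labels back to the rows by group name
data$Result <- as.vector(per_group[as.character(data$Variable.ID)])
print(data)
```

Because the lookup is by name rather than by position, this should also work when the IDs are not consecutive integers.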
