I need to assign a new column with multiple possible values based on multiple conditions.
Example data
a1 a2 a3 a4 a5 a6 a7 a8 a9
NA 1 NA 2 7 8 9 1 1
7 7 7 7 7 7 7 7 7
6 6 6 6 6 6 5 5 5
So I might have rules, for example: if a1 to a9 contain 1 or 2 then return 1, otherwise return 7. Or if a1 to a9 contain 5 or 6, return 6, otherwise 3. I have a number of these rules, so I need something that could accommodate them.
Required outcome
a1 a2 a3 a4 a5 a6 a7 a8 a9 NEW
NA 1 NA 2 7 8 9 1 1 1
7 7 7 7 7 7 7 7 7 7
6 6 6 6 6 6 5 5 5 6
I have tried assigning with subsetting, i.e.
df$NEW <- 7
df$NEW[df$a1==1 | df$a2==1 | df$a3==1] <- 1
df$NEW[df$a4==1 | df$a5==1 | df$a6==1] <- 1
df$NEW[df$a7==1 | df$a8==1 | df$a9==1] <- 1
df$NEW[df$a1==7 | df$a2==7 | df$a3==7] <- 7
df$NEW[df$a1==5 | df$a2==5 | df$a3==5] <- 6
df$NEW[df$a1==6 | df$a2==6 | df$a3==6] <- 6
Which I'm aware is clunky, but it works to a point. Once there are multiple values / conditions, however, not all values are filled correctly (it returns maybe 2 out of 3+ desired / assigned values). For the 'otherwise' rule I have used != as well as > or <. I've also attempted using ifelse, but with the same effect.
I'm also aware the solution is going to be relatively simple and staring me in the face but I'd be grateful for you to signpost me to a reasonable solution.
If there's anything you want me to clarify, just let me know.
Thanks in advance.
dplyr has a vectorised if statement called case_when that can help you:
library(dplyr)
df <- read.table(text = 'a1 a2 a3 a4 a5 a6 a7 a8 a9
NA 1 NA 2 7 8 9 1 1
7 7 7 7 7 7 7 7 7
6 6 6 6 6 6 5 5 5', header = T)
df %>%
  mutate(
    NEW = case_when(
      a1 == 1 | a2 == 1 | a3 == 1 ~ 1,
      a4 == 1 | a5 == 1 | a6 == 1 ~ 1,
      a7 == 1 | a8 == 1 | a9 == 1 ~ 1,
      a1 == 7 | a2 == 7 | a3 == 7 ~ 7,
      a1 == 5 | a2 == 5 | a3 == 5 ~ 6,
      a1 == 6 | a2 == 6 | a3 == 6 ~ 6
    )
  )
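Each condition goes on the left-hand side of ~ and the value you want on the right-hand side. The conditions are evaluated in order and the first one that is TRUE wins; a row that matches none of them gets NA.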
Returns:
a1 a2 a3 a4 a5 a6 a7 a8 a9 NEW
1 NA 1 NA 2 7 8 9 1 1 1
2 7 7 7 7 7 7 7 7 7 7
3 6 6 6 6 6 6 5 5 5 6
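The rules in the question are phrased as "any of a1 to a9 contains one of these values", which can be written more compactly than one `==` per column. The following is only a sketch, assuming dplyr 1.0.4 or later (for `if_any`) and assuming the intended rules are: 1 or 2 -> 1; 5 or 6 -> 6; otherwise 7. The name `out` is made up for illustration.

```r
library(dplyr)

df <- read.table(text = "a1 a2 a3 a4 a5 a6 a7 a8 a9
NA 1 NA 2 7 8 9 1 1
7 7 7 7 7 7 7 7 7
6 6 6 6 6 6 5 5 5", header = TRUE)

out <- df %>%
  mutate(
    NEW = case_when(
      # TRUE when any of a1..a9 is 1 or 2; %in% treats NA as "no match"
      if_any(a1:a9, ~ .x %in% c(1, 2)) ~ 1,
      # only reached when the rule above did not match
      if_any(a1:a9, ~ .x %in% c(5, 6)) ~ 6,
      # assumed fallback for rows matching no rule
      TRUE ~ 7
    )
  )

out$NEW  # 1, 7, 6 for the three example rows
```

Because `%in%` never returns NA, missing values cannot leak into the conditions, and the order of the lines decides which rule wins when a row matches more than one.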
Here's an idea which works with multiple rules. Your example is not clear, though: what should happen in a row without 1, 2, 5 or 6? 7 or 3?
Anyway, here is an adaptable idea based on: 1 or 2 -> 1; 5 or 6 -> 6 (assuming 1 or 2 and 5 or 6 cannot be mixed in the same row); otherwise -> 7
df$new <- 7  # default when no rule matches
for (i in seq_len(nrow(df))) {
  # values of row i, leaving out the result column itself
  vals <- as.numeric(df[i, names(df) != "new"])
  if (any(c(1, 2) %in% vals)) {
    df$new[i] <- 1
  } else if (any(c(5, 6) %in% vals)) {
    df$new[i] <- 6
  }
}
df
You could use the apply function instead of the loop.
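As a rough sketch of that apply variant, using the same assumed rules (1 or 2 -> 1; 5 or 6 -> 6; otherwise 7): `classify_row` is a hypothetical helper name, not something from the answer above.

```r
df <- data.frame(
  a1 = c(NA, 7, 6), a2 = c(1, 7, 6), a3 = c(NA, 7, 6),
  a4 = c(2, 7, 6),  a5 = c(7, 7, 6), a6 = c(8, 7, 6),
  a7 = c(9, 7, 5),  a8 = c(1, 7, 5), a9 = c(1, 7, 5)
)

# Hypothetical helper: takes the values of one row, returns the code for that row
classify_row <- function(vals) {
  if (any(vals %in% c(1, 2))) {
    1
  } else if (any(vals %in% c(5, 6))) {
    6
  } else {
    7  # assumed fallback when no rule matches
  }
}

# MARGIN = 1 applies the function to each row
df$new <- apply(df, 1, classify_row)
df$new  # 1, 7, 6 for the three example rows
```

The function is applied before `new` is added, so the result column never takes part in the checks.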
Here you go... everything should be well explained in this (base R) loop. To generalize it to other data, you would only need to spend some time creating a coefficients file. You would also have to tweak it a bit when your conditions change (& instead of |, < instead of ==, etc.).
df <- data.frame(matrix(c(NA, 1, NA, 2, 7, 8, 9, 1, 1,
                          7, 7, 7, 7, 7, 7, 7, 7, 7,
                          6, 6, 6, 6, 6, 6, 5, 5, 5),
                        nrow = 3, ncol = 9, byrow = TRUE))
colnames(df) <- c("a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8", "a9")
nbconditions <- 6
# you could read.xlsx an already prepared coefficient matrix here
# column j = rule j; a non-NA entry in row k means "column k of df must equal this value"
coefficients <- matrix(NA, nrow = ncol(df), ncol = nbconditions)
coefficients[c(1, 2, 3), 1] <- 1
coefficients[c(4, 5, 6), 2] <- 1
coefficients[c(7, 8, 9), 3] <- 1
coefficients[c(1, 2, 3), 4] <- 7
coefficients[c(1, 2, 3), 5] <- 5
coefficients[c(1, 2, 3), 6] <- 6
results <- c(1, 1, 1, 7, 6, 6)  # value assigned when rule j matches
NEW <- rep(NA, nrow(df))
for (i in seq_len(nrow(df))) {
  found <- FALSE
  # check the last rule first and stop at the first match, so later rules
  # take priority, as with the sequential assignments in the question
  for (j in nbconditions:1) {
    if (!found) {
      indicestocheck <- which(!is.na(coefficients[, j]))
      # compare only the columns rule j looks at
      checks <- coefficients[indicestocheck, j] == unlist(df[i, indicestocheck])
      # a rule matches when at least one non-NA comparison is TRUE
      if (any(checks, na.rm = TRUE)) {
        NEW[i] <- results[j]
        found <- TRUE
        print(paste(j, "found", results[j]))
      }
    }
  }
}
df$NEW <- NEW
df
> df
a1 a2 a3 a4 a5 a6 a7 a8 a9 NEW
1 NA 1 NA 2 7 8 9 1 1 1
2 7 7 7 7 7 7 7 7 7 7
3 6 6 6 6 6 6 5 5 5 6
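The same rule-table idea can be sketched without the nested loop by keeping each rule as a small list and letting later rules overwrite earlier ones. This is only an illustration under the same six rules; the `rules` structure and its field names (`cols`, `value`, `result`) are made up here, not part of the answer above.

```r
df <- data.frame(
  a1 = c(NA, 7, 6), a2 = c(1, 7, 6), a3 = c(NA, 7, 6),
  a4 = c(2, 7, 6),  a5 = c(7, 7, 6), a6 = c(8, 7, 6),
  a7 = c(9, 7, 5),  a8 = c(1, 7, 5), a9 = c(1, 7, 5)
)

# Hypothetical rule table: which columns to inspect, the value to look for,
# and the value to assign when any inspected column matches
rules <- list(
  list(cols = c("a1", "a2", "a3"), value = 1, result = 1),
  list(cols = c("a4", "a5", "a6"), value = 1, result = 1),
  list(cols = c("a7", "a8", "a9"), value = 1, result = 1),
  list(cols = c("a1", "a2", "a3"), value = 7, result = 7),
  list(cols = c("a1", "a2", "a3"), value = 5, result = 6),
  list(cols = c("a1", "a2", "a3"), value = 6, result = 6)
)

NEW <- rep(NA_real_, nrow(df))
for (rule in rules) {
  # logical vector, one entry per row: TRUE if any inspected column equals the value
  hit <- rowSums(df[rule$cols] == rule$value, na.rm = TRUE) > 0
  NEW[hit] <- rule$result  # later rules overwrite earlier ones
}
df$NEW <- NEW
df$NEW  # 1, 7, 6 for the three example rows
```

Rows that match no rule keep NA, so a missing rule is easy to spot.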