
Conditional replacement of column values in a dataframe using R

Let's make a dummy dataset

ll = data.frame(rbind(c(2,3,5), c(3,4,6), c(9,4,9)))
colnames(ll)<-c("b", "c", "a")
> ll
  b c a
1 2 3 5
2 3 4 6
3 9 4 9

P = data.frame(cbind(c(3,5), c(4,6), c(8,7)))
colnames(P)<-c("a", "b", "c")
> P
  a b c
1 3 4 8
2 5 6 7

I want to create a new data frame in which each value of ll is replaced by 0 when it is less than the value for the same column (a, b, or c) in the first row of P. In other words, I'd like to get:

> new_ll
  b c a
1 0 0 5
2 0 0 6
3 9 0 9

so I tried it this way

nn=c("a", "b", "c")
new_ll = sapply(nn, function(i) 
  ll[,paste0(i)][ll[,paste0(i)] < P[,paste0(i)][1]] <- 0)

But it doesn't work for some reason. I must be making a silly mistake in my script. Any ideas?

> new_ll
a b c 
0 0 0 

You can find the values in ll that are smaller than the corresponding values in the first row of P with apply:

t(apply(ll, 1, function(x) x<P[1,][colnames(ll)]))
      [,1] [,2]  [,3]
[1,]  TRUE TRUE FALSE
[2,]  TRUE TRUE FALSE
[3,] FALSE TRUE FALSE

Here, the first row of P is reordered to match the column order of ll, and then the elements are compared.
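
The reordering step matters because ll and P store their columns in different orders. A minimal sketch of the effect, rebuilding the question's data; the names thr_unordered and thr_ordered are only illustrative:

```r
# Rebuild the example data from the question
ll <- data.frame(b = c(2, 3, 9), c = c(3, 4, 4), a = c(5, 6, 9))
P  <- data.frame(a = c(3, 5), b = c(4, 6), c = c(8, 7))

thr_unordered <- P[1, ]            # columns in P's own order: a, b, c
thr_ordered   <- P[1, names(ll)]   # same row, reordered to ll's order: b, c, a

# Without reordering, column b of ll would be lined up with the threshold for a
names(thr_unordered)
names(thr_ordered)
```

Indexing by name rather than by position also means the comparison does not silently depend on how either data frame happened to be built.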

Credit to Ananda Mahto for recognizing that apply is not required:

ll < c(P[1, names(ll)])
         b    c     a
[1,]  TRUE TRUE FALSE
[2,]  TRUE TRUE FALSE
[3,] FALSE TRUE FALSE

The TRUE values show where you want to substitute with 0:

ll[ ll < c(P[1, names(ll)]) ] <- 0
ll
  b c a
1 0 0 5
2 0 0 6
3 9 0 9
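
Note that the assignment above overwrites ll. If the original data frame should stay intact, as the new_ll in the question suggests, one option is to wrap the same comparison in a small function. This is only a sketch: zero_below is a hypothetical helper name, not something from base R, and it assumes every column of the data frame has a matching column in the thresholds.

```r
# Rebuild the example data from the question
ll <- data.frame(b = c(2, 3, 9), c = c(3, 4, 4), a = c(5, 6, 9))
P  <- data.frame(a = c(3, 5), b = c(4, 6), c = c(8, 7))

# zero_below() is a hypothetical helper, not part of base R
zero_below <- function(df, thresholds) {
  # Take the first row of thresholds, reordered to the column order of df
  limits <- thresholds[1, names(df), drop = FALSE]
  out <- df                    # work on a copy so the caller's df is untouched
  out[out < c(limits)] <- 0    # c() turns the one-row data frame into a list
  out
}

new_ll <- zero_below(ll, P)
new_ll
```

Because R copies on modification, ll keeps its original values after the call.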

To fix your code, you want something like this:

do.call(cbind, lapply(names(ll), function(i) {
    ll[,i][ll[,i] < P[,i][1]] <- 0
    return(ll[i])}))
  b c a
1 0 0 5
2 0 0 6
3 9 0 9

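What's changed? First, sapply is replaced by lapply, and the function now returns the modified column (as a one-column data frame) on each iteration. In the original, the last expression of the function was the assignment itself, whose value is the right-hand side, 0, so sapply collected a 0 for each name. Second, the names are taken in the order of ll's columns, which matches the expected result. Third, the pieces are put together with cbind to get the final data frame. As a bonus, the redundant calls to paste0 have been removed.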
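
To see why the original sapply attempt returned only zeros, it may help to compare two tiny functions on a plain vector. The function names here are made up for illustration:

```r
v <- c(2, 3, 9)

# The assignment is the last expression, so its value (the right-hand
# side, 0) becomes the function's return value
f_assign_only <- function(x) {
  x[x < 4] <- 0
}

# Naming the modified object as the last expression returns the whole vector
f_return_vector <- function(x) {
  x[x < 4] <- 0
  x
}

res_bad  <- f_assign_only(v)     # returned invisibly, but it is just 0
res_good <- f_return_vector(v)   # the modified vector
```

In both cases v itself is unchanged, because each function modifies only its local copy. That is also why the original loop never touched ll.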

You could also try mapply, which applies a function to the corresponding elements of its arguments. Here, ll and P are both data frames, so the function is applied column by column. I matched the column names of P to those of ll (as in Matthew Lundberg's answer) and compared each column of ll with the corresponding value from the first row of P using <; that single value is recycled along the column. The result is a logical index matrix, and the elements where it is TRUE are set to 0.

indx <- mapply(`<`, ll, P[1,][names(ll)])
new_ll <- ll
new_ll[indx] <- 0
new_ll
 #  b c a
 #1 0 0 5
 #2 0 0 6
 #3 9 0 9

If you know that ll and P are entirely numeric, you can also do it with a matrix:

llm <- as.matrix(ll)
pv <- as.numeric(P[1, colnames(llm)])
llm[sweep(llm, 2, pv, `<`)] <- 0   # strict comparison, matching "less than" in the question
data.frame(llm)
#   b c a
# 1 0 0 5
# 2 0 0 6
# 3 9 0 9
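
Whether the comparison is strict or not makes no difference on the example data, because no value of ll equals its threshold. A small sketch with made-up data (not from the question) shows where `<` and `<=` diverge:

```r
# Made-up matrix with values equal to their column threshold
m <- matrix(c(1, 4, 7,
              4, 4, 9), nrow = 3, dimnames = list(NULL, c("x", "y")))
limits <- c(x = 4, y = 4)

strict <- sweep(m, 2, limits, FUN = `<`)    # TRUE only where value <  threshold
loose  <- sweep(m, 2, limits, FUN = `<=`)   # TRUE also where value == threshold

sum(strict)   # number of cells that would be zeroed with <
sum(loose)    # number of cells that would be zeroed with <=
```

Since the question asks for values "less than" the threshold, the strict form is the one that follows the stated requirement.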
