
Using apply functions instead of for loops in R

I have been trying to replace a for loop in my code with an apply function. I have tried every variant I could think of (sapply, lapply, apply and mapply), but none of them works. The original loop looks like this:

ds1 <- data.frame(col1 = c(NA, 2), col2 = c("A", "B"))
ds2 <- data.frame(colA = c("A", "B"), colB = c(90, 110))

for(i in 1:nrow(ds1)){
  if(is.na(ds1$col1[i])){
    ds1$col1[i] <- ds2[ds2[,"colA"] == ds1$col2[i], "colB"]
  }
}

My latest attempt with the apply family looks like this:

ds1 <- data.frame(col1 = c(NA, 2), col2 = c("A", "B"))
ds2 <- data.frame(colA = c("A", "B"), colB = c(90, 110))

sFunc <- function(x, y, z){
  if(is.na(x)){
    return(z[z[,"colA"] == y, "colB"])
  } else {
    return(x)
  }
}

ds1$col1 <- sapply(ds1$col1, sFunc, ds1$col2, ds2)

This returns the whole of ds2$colB for each row. Can someone explain what I got wrong?

sapply only iterates over the first vector you pass. Any further arguments are passed unchanged, as whole vectors, to every call. To iterate over several vectors in parallel you need the multivariate apply, which is mapply.
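To see this with the question's own call, here is a minimal reproduction that reuses the question's `ds1`, `ds2` and three-argument `sFunc` unchanged and simply inspects what sapply hands back. The variable name `res` is only for illustration.

```r
ds1 <- data.frame(col1 = c(NA, 2), col2 = c("A", "B"))
ds2 <- data.frame(colA = c("A", "B"), colB = c(90, 110))

# The question's three-argument function, unchanged
sFunc <- function(x, y, z){
  if(is.na(x)){
    return(z[z[,"colA"] == y, "colB"])
  } else {
    return(x)
  }
}

# sapply walks over ds1$col1 only. ds1$col2 and ds2 arrive whole in every
# call, so for the NA element the comparison ds2$colA == c("A", "B") is
# TRUE for both rows and both colB values come back.
res <- sapply(ds1$col1, sFunc, ds1$col2, ds2)

# res[[1]] is c(90, 110) and res[[2]] is 2; the pieces have different
# lengths, so sapply cannot simplify them and returns a list.
str(res)
```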

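# Two-argument version: ds2 is looked up from the global environment
sFunc <- function(x, y){
  if(is.na(x)){
    return(ds2[ds2[,"colA"] == y, "colB"])
  } else {
    return(x)
  }
}

# mapply pairs ds1$col1[i] with ds1$col2[i] on each call
mapply(sFunc, ds1$col1, ds1$col2)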
#> [1] 90  2
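If you would rather keep the question's original three-argument `sFunc`, so the lookup table is an argument instead of a global, mapply's `MoreArgs` parameter passes extra arguments whole to every call, which is what sapply was (unintentionally) doing with `ds2`. A minimal sketch that also writes the result back:

```r
ds1 <- data.frame(col1 = c(NA, 2), col2 = c("A", "B"))
ds2 <- data.frame(colA = c("A", "B"), colB = c(90, 110))

# The question's three-argument version: the lookup table is the argument z
sFunc <- function(x, y, z){
  if(is.na(x)){
    return(z[z[,"colA"] == y, "colB"])
  } else {
    return(x)
  }
}

# x and y are paired element by element; MoreArgs hands ds2 to every call
ds1$col1 <- mapply(sFunc, ds1$col1, ds1$col2, MoreArgs = list(z = ds2))
ds1
```

This assumes every `col2` value appears exactly once in `ds2$colA`. If a key were missing or duplicated, the lookup would return zero or several values and mapply would fall back to returning a list.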

A join is a more natural fit here. You can do it in base R:

# merge() is an inner join on col2 == colA; ifelse() fills only the NAs
transform(merge(ds1, ds2, by.x = "col2", by.y = "colA"),
          col1 = ifelse(is.na(col1), colB, col1))[names(ds1)]

#  col1 col2
#1   90    A
#2    2    B
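A lighter-weight base R alternative (not from the answer above, just another option) is a vectorised lookup with `match()`, which returns the position of each key in `ds2$colA`. The helper name `to_fill` is hypothetical. Unlike `merge()`, this keeps `ds1` in its original row order and never drops rows.

```r
ds1 <- data.frame(col1 = c(NA, 2), col2 = c("A", "B"))
ds2 <- data.frame(colA = c("A", "B"), colB = c(90, 110))

# Logical index of the rows whose col1 still needs a value
to_fill <- is.na(ds1$col1)

# match() gives the position of each key in ds2$colA (NA if absent),
# so this fills the missing values without any explicit loop
ds1$col1[to_fill] <- ds2$colB[match(ds1$col2[to_fill], ds2$colA)]
ds1
```

If a key has no match, `match()` yields `NA` and that row simply stays `NA`, whereas the original loop would stop with a "replacement has length zero" error.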

Or with dplyr:

library(dplyr)

# Join on col2 == colA, fill NAs in col1 from colB, keep only ds1's columns
inner_join(ds1, ds2, by = c("col2" = "colA")) %>%
    mutate(col1 = coalesce(col1, colB)) %>%
    select(names(ds1))
