
Fast and efficient way to run the loop below in R

I want to run the loop below efficiently, because I need to perform this on millions of rows. Sample data:

a <- data.frame(x1=rep(c('a','b','c','d'),5),
                x2=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5),
                value1=c(rep(201,4),rep(202,4),rep(203,4),rep(204,4),rep(205,4)),
                y1=c(rep('a',4),rep('b',4),rep('c',4),rep('d',4),rep('e',4)),
                y2=c(1,2,3,4,2,3,4,5,3,4,5,6,4,5,6,7,5,6,7,8),
                value2=seq(101,120), stringsAsFactors = FALSE)

I wrote the loop below to find, for each row, the row whose (x1, x2) pair equals this row's (y1, y2) pair, and then compute the difference between that row's value1 and this row's value2.

for (i in 1:length(a$x1)){
  for (j in 1:length(a$x1)){
    if(a$y1[i] == a$x1[j] & a$y2[i] == a$x2[j]){
      a$diff[i] <- a$value1[j] - a$value2[i]
      break
    }
  }
}
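One thing to watch in the loop as written: a$diff does not exist before the first assignment, so the first a$diff[i] <- ... creates a length-1 column that R recycles over every row. Rows that never find a match then keep that first value (100 here) instead of NA. A minimal sketch of the same double loop, assuming unmatched rows should be NA (the helper name loop_diff is hypothetical), creates the column first:

```r
# Sample data from the question
a <- data.frame(x1 = rep(c('a', 'b', 'c', 'd'), 5),
                x2 = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5),
                value1 = c(rep(201,4), rep(202,4), rep(203,4), rep(204,4), rep(205,4)),
                y1 = c(rep('a',4), rep('b',4), rep('c',4), rep('d',4), rep('e',4)),
                y2 = c(1,2,3,4,2,3,4,5,3,4,5,6,4,5,6,7,5,6,7,8),
                value2 = seq(101, 120), stringsAsFactors = FALSE)

# Hypothetical helper: the question's double loop, with the result column
# created up front so rows without a match keep NA.
loop_diff <- function(a) {
  a$diff <- NA_real_
  n <- nrow(a)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # && short-circuits and expects single TRUE/FALSE values
      if (a$y1[i] == a$x1[j] && a$y2[i] == a$x2[j]) {
        a$diff[i] <- a$value1[j] - a$value2[i]
        break  # keep only the first matching j
      }
    }
  }
  a
}

res <- loop_diff(a)
res$diff[12]  # NA: the pair ("c", 6) never occurs as an (x1, x2) pair
```

This is still quadratic in the number of rows, so it is only useful as a reference to check faster versions against, not as the solution for millions of rows.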

For each i, you want the first j such that a$y1[i] == a$x1[j] && a$y2[i] == a$x2[j]. (Your code uses & instead; inside if() on single TRUE/FALSE values, && is the idiomatic choice because it short-circuits.)

If a$x1, a$x2, a$y1 and a$y2 are numbers or character strings without spaces (as in your example), you can combine each pair into a single key:

x12 = paste(a$x1, a$x2)
y12 = paste(a$y1, a$y2) 
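The "without spaces" condition matters because paste() joins with a single space by default, so two different pairs can produce the same key. A small sketch of the collision, and of one workaround that picks a separator assumed never to appear in the data (the choice of "\u001f", the ASCII unit separator, is an assumption, not a requirement):

```r
# With the default separator " ", different pairs give the same key
x_keys_space <- c(paste("a b", "c"), paste("a", "b c"))
x_keys_space  # both are "a b c"

# Hypothetical separator, assumed absent from the key columns
sep <- "\u001f"
x_keys_safe <- c(paste("a b", "c", sep = sep), paste("a", "b c", sep = sep))
identical(x_keys_safe[1], x_keys_safe[2])  # FALSE: the keys stay distinct
```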

Then for each i, you want the first j such that y12[i] == x12[j].

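That is exactly what match(y12, x12) computes: for each element of y12 it returns the position of the first equal element of x12, or NA if there is none. Note the argument order; match(x12, y12) would search in the opposite direction.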
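A tiny illustration of how match() behaves on duplicates and misses, and how its result can be used directly as an index (the vectors here are made up for the example):

```r
# Hypothetical keys, chosen only to show match() semantics
table_keys <- c("a 1", "b 2", "b 2")
lookup <- c("b 2", "z 9")

m <- match(lookup, table_keys)
m  # 2 NA: first position of "b 2"; "z 9" is not found

# Indexing with the match result pulls the value from the matched row
values <- c(10, 20, 30)
values[m]  # 20 NA
```

Because match() builds a hash table of its second argument, it scales roughly linearly with the number of rows, unlike the nested loop.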

So you can do something like this:

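x12 = paste(a$x1, a$x2)
y12 = paste(a$y1, a$y2)
m = match(y12, x12)    # m[i] = first j with y12[i] == x12[j], or NA
a$diff <- NA_real_     # create the column first so unmatched rows stay NA
for (i in seq_along(m))
    if (!is.na(m[i]))
        a$diff[i] <- a$value1[m[i]] - a$value2[i]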

You can also eliminate the remaining loop by indexing with the matched positions directly:

x12 = paste(a$x1, a$x2)
y12 = paste(a$y1, a$y2)
m = match(y12, x12)
a$diff <- NA_real_            # unmatched rows stay NA
good.i = which(!is.na(m))     # rows that found a match
a$diff[good.i] <- a$value1[m[good.i]] - a$value2[good.i]
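A self-contained run of the vectorised version on the sample data, to compare against the loop. It assumes the lookup direction of the question's loop (row i's y1/y2 against row j's x1/x2), hence match(y12, x12), and that unmatched rows should be NA:

```r
# Sample data from the question
a <- data.frame(x1 = rep(c('a', 'b', 'c', 'd'), 5),
                x2 = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5),
                value1 = c(rep(201,4), rep(202,4), rep(203,4), rep(204,4), rep(205,4)),
                y1 = c(rep('a',4), rep('b',4), rep('c',4), rep('d',4), rep('e',4)),
                y2 = c(1,2,3,4,2,3,4,5,3,4,5,6,4,5,6,7,5,6,7,8),
                value2 = seq(101, 120), stringsAsFactors = FALSE)

x12 <- paste(a$x1, a$x2)   # key of each row's (x1, x2) pair
y12 <- paste(a$y1, a$y2)   # key of each row's (y1, y2) pair
m <- match(y12, x12)       # first row j whose (x1, x2) equals row i's (y1, y2)

a$diff <- NA_real_
good.i <- which(!is.na(m))
a$diff[good.i] <- a$value1[m[good.i]] - a$value2[good.i]

a$diff
# 100 100 100 100 97 97 97 97 94 94 94 NA 91 91 NA NA NA NA NA NA
```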
