
Optimizing processing time in the nested for loops - R

I have two datasets with 24k and 15k rows. I used nested for loops to overwrite some of the data, but the operation takes forever to compute.

Does anyone have a suggestion for how to optimize the code to speed up the process?

My code:

for(i in 1:length(data$kolicina)){
  for(j in 1:length(df$kolicina)){
    if(data$LIXcode[i] == df$LIXcode[j]){
      data$kolicina[i] <- df$kolicina[j]
    }
  }
}

The full code with the input looks like this:

df <- data[grepl("Trennscheiben", data$a_naziv) & data$SestavKolicina > 1,]
for(i in 1:length(df$kolicina)){
  df$kolicina[i] <- df$kolicina[i] / 10
}

for(i in 1:length(data$kolicina)){
  for(j in 1:length(df$kolicina)){
    if(data$LIXcode[i] == df$LIXcode[j]){
      data$kolicina[i] <- df$kolicina[j]
    }
  }
}

The data:

LIXcode         a_naziv                 RacunCenaNaEM   kolicina
LIX2017396957   MINI HVLP Spritzpistole   20,16           1
LIX2017396957   MINI HVLP Spritzpistole   20,16           1
LIX2017396963   Trennscheiben Ø115 Ø12    12,53           30
LIX2017396963   Trennscheiben Ø115 Ø12    12,53           1

I haven't tried this on my own machine, but this should work

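fun <- function(x, y) {
  # position in y of the first row matching each LIXcode in x (NA if none)
  idx <- match(x$LIXcode, y$LIXcode)
  hit <- !is.na(idx)
  # overwrite kolicina only where a match exists
  x$kolicina[hit] <- y$kolicina[idx[hit]]
  x  # return the modified copy; R does not modify the argument in place
}

data <- fun(data, df)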

R can do the comparison and the assignment on whole vectors at once (vectorization), so no explicit loops are needed.
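
To make the vectorized replacement concrete, here is a minimal base R sketch built on the sample rows from the question. The contents of `df` are an assumption (the question does not show the filtered table), and `lookup`, `idx`, `hit` and `updated` are illustrative names. One detail worth noting: the nested loop ends up keeping the last matching row of `df`, while `match()` returns the first, so the sketch drops earlier duplicates before matching.

```r
# Sample rows from the question (only the columns needed here)
data <- data.frame(
  LIXcode  = c("LIX2017396957", "LIX2017396957", "LIX2017396963", "LIX2017396963"),
  kolicina = c(1, 1, 30, 1),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table standing in for the question's df
df <- data.frame(
  LIXcode  = c("LIX2017396963", "LIX2017396963"),
  kolicina = c(3, 0.1),
  stringsAsFactors = FALSE
)

# Keep only the last row per LIXcode, mirroring what the nested loop ends up using
lookup <- df[!duplicated(df$LIXcode, fromLast = TRUE), ]

idx <- match(data$LIXcode, lookup$LIXcode)  # NA where data has no counterpart in lookup
hit <- !is.na(idx)

updated <- data
updated$kolicina[hit] <- lookup$kolicina[idx[hit]]  # rows without a match stay unchanged

print(updated)
```

If every `LIXcode` in `df` is unique, the `duplicated()` step is unnecessary and `match()` can be applied to `df` directly.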

As far as I understand, the question concerns a table "dt1" with a key column "a", any number of value columns, and any number of observations. A second table, "dt2", acts as a mapping: its column "a" holds unique values, and its column "b" holds the values that need to be written into "dt1" wherever the "a" columns match.

I would suggest joining tables:

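library(data.table)

dt1 <- data.table(a = sample(1:10, 1000, replace = TRUE),
                  b = sample(letters, 1000, replace = TRUE))

dt2 <- data.table(a = 1:10,
                  b = letters[1:10])

# left join on "a"; both tables have a column "b", so the result holds b.x (from dt1) and b.y (from dt2)
output <- merge(dt1, dt2, by = "a", all.x = TRUE)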

Also you can try:

dt1[, new_value := dt2$b[match(a, dt2$a)]]

Both of these solutions are vectorized, so on tables of this size they should run almost instantly.
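
As a rough check of that claim, the sketch below runs the nested loop from the question and a `match()`-based replacement on the same synthetic data, confirms that both give the same result, and records the elapsed time of each. The table names (`big`, `small`), the sizes and the generated keys are all made up for illustration, and the timings will vary by machine.

```r
set.seed(1)

# Synthetic stand-ins for the two tables in the question (illustrative sizes)
keys  <- sprintf("LIX%05d", 1:500)
big   <- data.frame(LIXcode = sample(keys, 2000, replace = TRUE),
                    kolicina = 1,
                    stringsAsFactors = FALSE)
small <- data.frame(LIXcode = keys[1:200],      # unique keys, so first match == last match
                    kolicina = runif(200),
                    stringsAsFactors = FALSE)

# Nested-loop version, structured like the code in the question
loop_res <- big
t_loop <- system.time(
  for (i in seq_len(nrow(loop_res))) {
    for (j in seq_len(nrow(small))) {
      if (loop_res$LIXcode[i] == small$LIXcode[j]) {
        loop_res$kolicina[i] <- small$kolicina[j]
      }
    }
  }
)

# Vectorized version: one lookup for all rows, then one assignment
vec_res <- big
t_vec <- system.time({
  idx <- match(vec_res$LIXcode, small$LIXcode)
  hit <- !is.na(idx)
  vec_res$kolicina[hit] <- small$kolicina[idx[hit]]
})

same <- identical(loop_res, vec_res)  # both approaches should produce the same table
print(same)
print(c(loop = t_loop[["elapsed"]], vectorized = t_vec[["elapsed"]]))
```

The loop performs one comparison per pair of rows, so its cost grows with the product of the two table sizes, whereas `match()` does a single hashed lookup over the whole key column.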

Base R solution (no data.table syntax, although I'd highly recommend learning it):

dt1$new_value <- dt2$b[match(dt1$a, dt2$a)]

And that's if I understood the question correctly...


Here's a working solution that produces the expected output by overwriting "b" in place, only for rows whose "a" has a match in dt2:

dt1[a %in% dt2$a, b:=dt2$b[match(a, dt2$a)]]


 