
Optimizing simple for loop in R

I have some lines of code with a for loop that look like this:

somevector2 <- c(length = somevector2_length)

for(string in somevector1){

  df2 <- df1[df1$col1 == string, ]
  ff <- somefunction(df2$col2)
  somevector2 <- c(somevector2, ff)

}

From what I understand, initializing the vector with the correct length should make the loop faster, but it still takes quite some time, even though somefunction(df2$col2) only does some simple operations. somevector1 is just a vector of strings.

Is there a way to make this loop faster in R? Thank you very much.

Sorry, but that's not how you are supposed to post a question on SO. :( You should provide a working example. Also, that's not the way to create a vector of a fixed length.


Let's see a reproducible example of what you posted:

##### this makes your example reproducible

somevector1 <- unique(iris$Species)
df1 <- iris
names(df1) <- paste0("col", 5:1)
somefunction <- sum
somevector2_length <- 3



##### this is your code

# somevector2 <- c(length = somevector2_length) # <- this was wrong
somevector2 <- c()


for(string in somevector1){
 
 df2 <- df1[df1$col1 == string, ]
 ff <- somefunction(df2$col2)
 somevector2 <- c(somevector2, ff)
 
}

So this is the final result:

somevector2
#> [1]  12.3  66.3 101.3
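
As a side note on preallocation: c(length = n) creates a length-one vector whose single element is named "length", not a vector with n elements. A minimal sketch of the same loop with real preallocation might look like the following. It reuses the reproducible setup above, relies only on base R (numeric() and seq_along()), and assumes somefunction returns a single number per group.

```r
# Reproducible setup, same as above
somevector1 <- unique(iris$Species)
df1 <- iris
names(df1) <- paste0("col", 5:1)
somefunction <- sum

# Preallocate a numeric vector with one slot per group
somevector2 <- numeric(length(somevector1))

for (i in seq_along(somevector1)) {
  # Keep only the rows belonging to the i-th group
  df2 <- df1[df1$col1 == somevector1[i], ]
  # Assign by index instead of growing the vector with c()
  somevector2[i] <- somefunction(df2$col2)
}

somevector2
#> [1]  12.3  66.3 101.3
```

With only a handful of groups the gain from preallocating is small. The repeated subsetting of df1 inside the loop is the more likely cost on a large data frame.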

I suggest using the line of code below instead of your loop. You will get the same values, with names attached (it's a NAMED numeric vector).

tapply(df1$col2, df1$col1, somefunction)
#>    setosa versicolor  virginica 
#>      12.3       66.3      101.3 

You can get rid of the names with unname().
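
For example, the sketch below drops the names from the tapply() result and, as one possible alternative, builds the same values with split() and vapply(). It reuses the setup above and assumes somefunction returns one number per group. The names res_named, res_plain and res_vapply are made up for illustration.

```r
# Reproducible setup, same as above
somevector1 <- unique(iris$Species)
df1 <- iris
names(df1) <- paste0("col", 5:1)
somefunction <- sum

# tapply() returns a named 1-d array; unname() strips the group names
res_named <- tapply(df1$col2, df1$col1, somefunction)
res_plain <- unname(res_named)

# Alternative: split col2 by group, then apply the function to each piece.
# numeric(1) declares that each call must return exactly one number.
res_vapply <- vapply(split(df1$col2, df1$col1), somefunction, numeric(1))
```

Note that tapply() and split() order the groups by factor level, while the loop follows the order of somevector1. The two orders happen to coincide in this example.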
