More efficient than for loop in R

I wonder whether there are more efficient ways to assign values to a new variable in a data frame than using for loops. I have two recent examples:

[1] Getting the normalized Levenshtein distance using the vwr package:

rst34$Levenshtein = rep(0, nrow(rst34))
for (i in 1:nrow(rst34)) {
    rst34$Levenshtein[i] =
        levenshtein.distance(as.character(rst34$target[i]),
                             as.character(rst34$prime[i]))[[1]] /
        max(nchar(as.character(rst34$target[i])),
            nchar(as.character(rst34$prime[i])))
}

[2] Extracting a substring from another variable:

rst34$Experiment = 'rst4'
for (i in 1:nrow(rst34)) {
    rst34$Experiment[i] = unlist(strsplit(as.character(rst34$subject[i]), '[.]'))[1]
}

Also, I think there should be no difference between the initializations in the two examples:

rst34$Levenshtein = rep(0, nrow(rst34))
rst34$Experiment = 'rst4'

Many thanks!

Something like...

rst34$Experiment = sapply(rst34$subject, function(element){
    unlist(strsplit(as.character(element), '[.]'))[1]
})

Should hopefully do the trick. I don't have your data, so I couldn't actually test it out.
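
A fully vectorized alternative is base R's sub(), which can strip the first dot and everything after it in a single call. This is only a minimal sketch: the sample subject values below are made up for illustration, since the real rst34 data is not shown.

```r
# Hypothetical sample data; the real rst34 is not shown in the question.
rst34 <- data.frame(subject = c("rst3.s01", "rst4.s02", "rst4.s10"),
                    stringsAsFactors = TRUE)

# Remove the first dot and everything after it. sub() is vectorized,
# so no loop or sapply() is needed; as.character() handles factor columns.
rst34$Experiment <- sub("[.].*$", "", as.character(rst34$subject))

print(rst34$Experiment)
```

For values that contain no dot, sub() returns the string unchanged, which matches what taking the first strsplit piece would give.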

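It only makes sense to apply nchar to a character variable, so the as.character calls are probably not needed (if the columns are factors, convert them once up front instead of inside a loop). Note that pmax, unlike max, returns the element-wise maximum of its arguments: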

rst34$Levenshtein <-
    levenshtein.distance(rst34$target, rst34$prime) /
    pmax(nchar(rst34$target), nchar(rst34$prime))
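
If levenshtein.distance turns out not to compare the two columns element by element (I have not verified this against the vwr documentation), one possible fallback is adist() from base R's utils package, called once per pair with mapply(). Note that this swaps in adist() for the vwr function, and the sample data is made up for illustration:

```r
# Hypothetical sample data; the real rst34 is not shown in the question.
rst34 <- data.frame(target = c("kitten", "flaw", "same"),
                    prime  = c("sitting", "lawn", "same"),
                    stringsAsFactors = FALSE)

# Convert once, outside any loop, in case the columns are factors.
target <- as.character(rst34$target)
prime  <- as.character(rst34$prime)

# adist() returns a distance matrix, so call it on one pair at a time
# and flatten each 1x1 result to a plain number.
lev <- mapply(function(a, b) as.vector(utils::adist(a, b)),
              target, prime, USE.NAMES = FALSE)

# Normalize by the length of the longer string in each pair.
rst34$Levenshtein <- lev / pmax(nchar(target), nchar(prime))

print(rst34$Levenshtein)
```

mapply() still iterates internally, so this mainly tidies the code; the division and pmax() are the parts that are truly vectorized.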
