More efficient than for loop in R
I wonder whether there are more efficient ways to assign values to a new variable in a data frame than using for loops. I have two recent examples:
[1] Getting the normalized Levenshtein distance using the vwr package:
rst34$Levenshtein = rep(0, nrow(rst34))
for (i in 1:nrow(rst34)) {
  rst34$Levenshtein[i] = levenshtein.distance(
    as.character(rst34$target[i]), as.character(rst34$prime[i]))[[1]] /
    max(nchar(as.character(rst34$target[i])), nchar(as.character(rst34$prime[i])))
}
[2] Extracting a substring from another variable:
rst34$Experiment = 'rst4'
for (i in 1:nrow(rst34)) {
  rst34$Experiment[i] = unlist(strsplit(as.character(rst34$subject[i]), '[.]'))[1]
}
Also, I think there should be no difference between the initializations in the two examples:
rst34$Levenshtein = rep(0, nrow(rst34))
rst34$Experiment = 'rst4'
Many thanks!
Something like this:
rst34$Experiment = sapply(rst34$subject, function(element) {
  unlist(strsplit(as.character(element), '[.]'))[1]
})
That should hopefully do the trick. I don't have your data, so I couldn't actually test it.
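For what it's worth, here is a small sketch of the same idea that can be run without the real data. The toy rst34 data frame and its subject values are made up for illustration, and Experiment2 is just a second column added to compare results. It uses vapply, which behaves like sapply but checks that every result is a single string, and then sub, which works on the whole column at once and needs no apply call at all.

```r
# Toy data standing in for rst34; the subject values are invented.
rst34 <- data.frame(subject = c("rst3.s01", "rst4.s02", "rst4.s10"),
                    stringsAsFactors = FALSE)

# Split every subject on a literal dot, then keep the first piece of each.
# vapply() fails loudly if any result is not exactly one string.
rst34$Experiment <- vapply(
  strsplit(as.character(rst34$subject), ".", fixed = TRUE),
  function(parts) parts[1],
  character(1)
)

# Fully vectorized alternative: delete everything from the first dot onward.
rst34$Experiment2 <- sub("[.].*$", "", as.character(rst34$subject))

print(rst34)
```

Both columns should come out the same. The sub version is usually the simpler choice when only the part before the first dot is needed.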
It only makes sense to apply nchar to a character variable, so the as.character calls are probably not needed:
rst34$Levenshtein <-
  levenshtein.distance(rst34$target, rst34$prime) /
  pmax(nchar(rst34$target), nchar(rst34$prime))
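One caveat: this only works if levenshtein.distance pairs up its two arguments element by element, which I have not verified for vwr (the [[1]] in the question suggests it may return one value per target for a single source string). As a hedged sketch that avoids that question, the version below swaps in base R's utils::adist, which also computes Levenshtein distance, and calls it once per row with mapply. The toy rst34 data and word pairs are made up. The as.character calls are kept because nchar fails on factor columns, and they cost nothing if the columns are already character.

```r
# Toy data standing in for rst34; the word pairs are invented.
rst34 <- data.frame(target = c("kitten", "flaw", "same"),
                    prime  = c("sitting", "lawn", "same"),
                    stringsAsFactors = FALSE)

# No-op for character columns, required if the columns are factors.
target <- as.character(rst34$target)
prime  <- as.character(rst34$prime)

# adist() returns a full distance matrix, so call it once per (target, prime)
# pair and take the single cell it produces.
dist <- mapply(function(a, b) utils::adist(a, b)[1, 1],
               target, prime, USE.NAMES = FALSE)

# pmax() is the element-wise maximum; max() would collapse everything
# to a single number and normalize every row by the same length.
rst34$Levenshtein <- dist / pmax(nchar(target), nchar(prime))

print(rst34)
```

mapply is still a loop under the hood, so the gain here is mostly in readability. The pmax and nchar parts are genuinely vectorized.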