Is there a way to vectorize this foreach loop in R to make text replacement more efficient?
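Happy to award the answer to anyone who can help me vectorize this process. I want to check whether each string is missing the city name and, if it is, append the missing city name to it.
Suppose I have data like this: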
df <- data.frame(X=c(1:5), Houston.Addresses=c("548 w 19th st", "6611 Portwest Dr. #190, houston, tx", "3555 Timmons Ln Ste 300, Houston, TX, 77027-6466", "3321 Westpark Dr", "16221 north freeway"))
I want data like this:
df.desired <- data.frame(X=c(1:5), Houston.Addresses=c("548 w 19th st, houston, tx", "6611 Portwest Dr. #190, houston, tx", "3555 Timmons Ln Ste 300, Houston, TX, 77027-6466", "3321 Westpark Dr, houston, tx", "16221 north freeway, houston, tx"))
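My current approach is very inefficient on large data sets, and I'm sure there is a vectorized way to do it. Can anyone help vectorize this loop?
library(foreach)
foreach(i=1:nrow(df))%do%{
  # Lower-case the address so the match is case-insensitive
  t <- tolower(df[i,"Houston.Addresses"])
  x <- grepl("houston", t)
  # Append the city and state only when "houston" is absent
  if(!isTRUE(x)){
    df[i, "Houston.Addresses" ] <-
      paste0(df[i, "Houston.Addresses" ], ", houston, tx")
  }
}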
Thanks in advance!
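Instead of looping over each row, we use grepl (which is vectorized) to create a logical index, then, after converting the column to character class, assign to the elements of "Houston.Addresses" selected by the index "i1" by pasting the suffix onto them.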
i1 <- !grepl("houston", tolower(df$Houston.Addresses))
df$Houston.Addresses <- as.character(df$Houston.Addresses)
df$Houston.Addresses[i1] <- paste0(df$Houston.Addresses[i1], ", houston, tx")
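As a minimal sketch of the same idea, the index-and-assign step can be wrapped in a small function and checked against the desired output from the question. The helper name `append_city`, its arguments, and the `NA` guard are assumptions added here for illustration; they are not part of the original answer.

```r
# Hypothetical helper (not from the original answer): append a suffix to
# every non-missing address that does not already mention the city.
append_city <- function(addresses, city = "houston", suffix = ", houston, tx") {
  addresses <- as.character(addresses)  # convert factors to character first
  # TRUE where the city is absent; NA entries are left untouched (assumption)
  missing_city <- !grepl(city, addresses, ignore.case = TRUE) & !is.na(addresses)
  addresses[missing_city] <- paste0(addresses[missing_city], suffix)
  addresses
}

df <- data.frame(
  X = 1:5,
  Houston.Addresses = c("548 w 19th st",
                        "6611 Portwest Dr. #190, houston, tx",
                        "3555 Timmons Ln Ste 300, Houston, TX, 77027-6466",
                        "3321 Westpark Dr",
                        "16221 north freeway"),
  stringsAsFactors = FALSE
)

df$Houston.Addresses <- append_city(df$Houston.Addresses)
print(df$Houston.Addresses)
```

Using `ignore.case = TRUE` in place of `tolower()` is a stylistic choice; either gives a case-insensitive match here.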
If we want to make it more efficient, we can use data.table and do the assignment by reference ( := ).
library(data.table)
setDT(df)[, Houston.Addresses := as.character(Houston.Addresses)
][!grepl("houston", tolower(Houston.Addresses)),
Houston.Addresses := paste0(Houston.Addresses, ", houston, tx")]
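Another suggestion uses ifelse, appending the suffix only where "houston" is not found (the column is converted to character first so that a factor is not returned as integer codes):
df$Houston.Addresses <- as.character(df$Houston.Addresses)
df$Houston.Addresses <- ifelse(!grepl("houston", df$Houston.Addresses, ignore.case=TRUE),
                               paste0(df$Houston.Addresses, ", houston, tx"),
                               df$Houston.Addresses)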
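To get a rough feel for the difference between a row-by-row loop and a vectorized index, a sketch like the following can be run. It uses a plain `for` loop as a stand-in for the `foreach` loop in the question, and the synthetic data, the size `n`, and the function names are all assumptions made for illustration. Timings vary by machine, so none are quoted here.

```r
set.seed(1)
n <- 20000  # arbitrary size chosen for illustration
sample_addresses <- c("548 w 19th st",
                      "6611 Portwest Dr. #190, houston, tx",
                      "3321 Westpark Dr")
addr <- sample(sample_addresses, n, replace = TRUE)

# Row-by-row version (plain for loop standing in for the foreach loop)
loop_version <- function(x) {
  for (i in seq_along(x)) {
    if (!grepl("houston", tolower(x[i]))) {
      x[i] <- paste0(x[i], ", houston, tx")
    }
  }
  x
}

# Vectorized version: one grepl call, one indexed assignment
vectorized_version <- function(x) {
  i1 <- !grepl("houston", tolower(x))
  x[i1] <- paste0(x[i1], ", houston, tx")
  x
}

t_loop <- system.time(res_loop <- loop_version(addr))
t_vec  <- system.time(res_vec  <- vectorized_version(addr))

# Both versions should produce the same result
stopifnot(identical(res_loop, res_vec))

# Elapsed seconds for each version
print(c(loop = t_loop[["elapsed"]], vectorized = t_vec[["elapsed"]]))
```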