Is there a way to vectorize this foreach loop in R to make the text replacement more efficient?

Question

Happy to award the answer points to someone who can help me vectorize this process. I'd like to check whether each address string is missing the city name and, if it is, append the missing city name.

Suppose I have data like this:

df <- data.frame(X=c(1:5), Houston.Addresses=c("548 w 19th st", "6611 Portwest Dr. #190, houston, tx", "3555 Timmons Ln Ste 300, Houston, TX, 77027-6466", "3321 Westpark Dr", "16221 north freeway"))

I'd like the result to look like this:

df.desired <- data.frame(X=c(1:5), Houston.Addresses=c("548 w 19th st, houston, tx", "6611 Portwest Dr. #190, houston, tx", "3555 Timmons Ln Ste 300, Houston, TX, 77027-6466", "3321 Westpark Dr, houston, tx", "16221 north freeway, houston, tx"))

My current method is very inefficient on large datasets, and I'm sure it can be vectorized. Can someone help me vectorize this loop?

library(foreach)

# Row-by-row version: one grepl() call and one assignment per row
foreach(i = 1:nrow(df)) %do% {
  t <- tolower(df[i, "Houston.Addresses"])
  x <- grepl("houston", t)
  if (!isTRUE(x)) {
    df[i, "Houston.Addresses"] <-
      paste0(df[i, "Houston.Addresses"], ", houston, tx")
  }
}

Thanks in advance!

Answer 1

Instead of looping over each row, we create a logical index with grepl (which is vectorized), convert 'Houston.Addresses' to character class, and then update only the elements selected by the index 'i1' by pasting the suffix onto them.

i1 <- !grepl("houston", tolower(df$Houston.Addresses))
df$Houston.Addresses <- as.character(df$Houston.Addresses)
df$Houston.Addresses[i1] <- paste0(df$Houston.Addresses[i1], ", houston, tx")
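A self-contained sketch of the approach above, run on the sample data from the question and checked against the desired output. It assumes R 4.0 or later, where data.frame() no longer converts strings to factors by default, so that the column of df.desired is a character vector; the as.character() call keeps the update itself safe on older versions.

```r
# Sample data from the question
df <- data.frame(X = 1:5,
                 Houston.Addresses = c("548 w 19th st",
                                       "6611 Portwest Dr. #190, houston, tx",
                                       "3555 Timmons Ln Ste 300, Houston, TX, 77027-6466",
                                       "3321 Westpark Dr",
                                       "16221 north freeway"))

df.desired <- data.frame(X = 1:5,
                         Houston.Addresses = c("548 w 19th st, houston, tx",
                                               "6611 Portwest Dr. #190, houston, tx",
                                               "3555 Timmons Ln Ste 300, Houston, TX, 77027-6466",
                                               "3321 Westpark Dr, houston, tx",
                                               "16221 north freeway, houston, tx"))

# Make sure the column is character, not factor
df$Houston.Addresses <- as.character(df$Houston.Addresses)

# One vectorized grepl() call builds the logical index for every row at once
i1 <- !grepl("houston", tolower(df$Houston.Addresses))

# Append the suffix only to the rows that lack the city name
df$Houston.Addresses[i1] <- paste0(df$Houston.Addresses[i1], ", houston, tx")

print(i1)
stopifnot(identical(df$Houston.Addresses, df.desired$Houston.Addresses))
```

Here i1 is TRUE for rows 1, 4 and 5, the only rows that get the suffix.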

If we want it to be more efficient still, we can use data.table to do the assignment by reference (:=).

library(data.table)
setDT(df)[, Houston.Addresses := as.character(Houston.Addresses)
            ][!grepl("houston", tolower(Houston.Addresses)),
                 Houston.Addresses := paste0(Houston.Addresses, ", houston, tx")]

Answer 2

Another suggestion, using ifelse:

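# Convert first so ifelse() does not return factor codes
df$Houston.Addresses <- as.character(df$Houston.Addresses)
# Keep addresses that already mention Houston; append the city otherwise
df$Houston.Addresses <- ifelse(grepl("houston", df$Houston.Addresses, ignore.case = TRUE),
    df$Houston.Addresses,
    paste0(df$Houston.Addresses, ", Houston, TX"))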
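The same logic can be wrapped in a small reusable function. add_city_if_missing() and its city/suffix arguments are hypothetical names chosen for illustration, not part of either answer. The sketch also runs a plain for loop (used instead of foreach so the example needs no packages) over the same input to confirm that the vectorized version gives identical results, and prints a rough timing of each.

```r
# Hypothetical helper: append `suffix` to every address that does not
# already mention `city` (case-insensitive match)
add_city_if_missing <- function(x, city = "houston", suffix = ", houston, tx") {
  x <- as.character(x)                          # guard against factor input
  missing_city <- !grepl(city, x, ignore.case = TRUE)
  x[missing_city] <- paste0(x[missing_city], suffix)
  x
}

# Row-by-row reference version with the same structure as the question's loop
add_city_loop <- function(x, city = "houston", suffix = ", houston, tx") {
  x <- as.character(x)
  for (i in seq_along(x)) {
    if (!grepl(city, x[i], ignore.case = TRUE)) {
      x[i] <- paste0(x[i], suffix)
    }
  }
  x
}

addresses <- c("548 w 19th st",
               "6611 Portwest Dr. #190, houston, tx",
               "3555 Timmons Ln Ste 300, Houston, TX, 77027-6466",
               "3321 Westpark Dr",
               "16221 north freeway")

# Repeat the sample addresses to get a larger input for the comparison
big <- rep(addresses, 2000)

vec_result  <- add_city_if_missing(big)
loop_result <- add_city_loop(big)
stopifnot(identical(vec_result, loop_result))

# Rough timing comparison; absolute numbers depend on the machine
print(system.time(add_city_if_missing(big)))
print(system.time(add_city_loop(big)))
```

Passing ignore.case = TRUE to grepl() gives the same match as lowercasing with tolower() first, without building a lowercased copy of the whole column.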

Answer 1: score 3, accepted, 2018-04-25 06:55:25

Answer 2: score 2, 2018-04-25 06:56:38