Apply regex and update a column between data frames

Question

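I have two data frames: table A is a pattern table with reference names, and table B is a table of old names. I want to subset table B to the rows whose names match the patterns in table A and, when a name matches, fill a new column in B with the matching reference name from A.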

I have looked at the question "apply regexp in one data frame based on the column in another data frame", but it does not solve this case.

A <- data.frame(pattern = c("ab", "be|eb", "cc", "dd"),
                ref = c("first", "second", "third", "forth"),
                stringsAsFactors = FALSE)
B <- data.frame(name = c("aa1", "bb1", "cab", "ccaa", "abed", "ddd", "ebba"),
                stringsAsFactors = FALSE)
B$new <- ""

I would like my result table to be:

name       new
cab        first
abed       second
ccaa       third
ddd        forth
ebba       second

I was trying:

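for (i in seq_len(nrow(B))) {
  # Which patterns in A match the i-th name?
  hits <- sapply(A$pattern, grepl, x = B$name[i])
  if (any(hits)) {
    B$new[i] <- A$ref[which(hits)[1]]  # A has no "update" column; use ref (first match wins)
  }
}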

Does anyone know a better solution? I would prefer to use the apply family, but I don't know how to add the column. Any help is appreciated!
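A minimal apply-family sketch of the fill-a-new-column approach the question describes (not taken from any of the answers below). It assumes that a name matching several patterns should get all of the matching `ref` values joined with `"; "`, and that names with no match get `NA` and are then dropped; `match_refs` is a hypothetical helper name.

```r
A <- data.frame(pattern = c("ab", "be|eb", "cc", "dd"),
                ref = c("first", "second", "third", "forth"),
                stringsAsFactors = FALSE)
B <- data.frame(name = c("aa1", "bb1", "cab", "ccaa", "abed", "ddd", "ebba"),
                stringsAsFactors = FALSE)

# Hypothetical helper: all refs whose pattern matches one name,
# joined with "; ", or NA when no pattern matches.
match_refs <- function(nm) {
  hits <- vapply(A$pattern, grepl, logical(1), x = nm)
  if (any(hits)) paste(A$ref[hits], collapse = "; ") else NA_character_
}

# vapply() returns exactly one character value per name,
# so the result can be assigned straight into a new column of B.
B$new <- vapply(B$name, match_refs, character(1), USE.NAMES = FALSE)

# Keep only the rows that matched at least one pattern.
result <- B[!is.na(B$new), ]
result
```

Using `vapply()` rather than `sapply()` pins the return type to one character value per name, so the output always fits into a column of `B`.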

Answer 1

I edited the answer because I forgot to include the line that first converts B to a matrix:

B <- as.matrix(B["name"])  # one-column matrix of the names only; as.matrix() has no ncol argument

It should work now:

library(reshape2)
L <- apply(A, 1, function(x) B[grepl(x[1],B),])
names(L) <- A$ref
result <- melt(L)
colnames(result) <- c('Name','New')

result
#  Name    New
#1  cab  first
#2 abed  first
#3 abed second
#4 ebba second
#5 ccaa  third
#6  ddd  forth
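The `melt(L)` call only flattens a named list into two columns, so the same table can also be built in base R without reshape2. A sketch, assuming the question's data with the missing comma fixed; it swaps `apply()` for `lapply()`, which always returns a list (one element per pattern).

```r
A <- data.frame(pattern = c("ab", "be|eb", "cc", "dd"),
                ref = c("first", "second", "third", "forth"),
                stringsAsFactors = FALSE)
B <- data.frame(name = c("aa1", "bb1", "cab", "ccaa", "abed", "ddd", "ebba"),
                stringsAsFactors = FALSE)

# One list element per pattern: the names that match it.
L <- lapply(A$pattern, grep, x = B$name, value = TRUE)
names(L) <- A$ref

# Flatten: unlist() gives the matched names in order, and rep() repeats
# each ref once per match (lengths() counts the matches per pattern).
result <- data.frame(Name = unlist(L, use.names = FALSE),
                     New  = rep(names(L), lengths(L)),
                     stringsAsFactors = FALSE)
result
```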

Answer 2

You can use stack together with sapply:

stack(setNames(sapply(A$pattern,grep,B$name,value=T),A$ref))

  values    ind
1    cab  first
2   abed  first
3   abed second
4   ebba second
5   ccaa  third
6    ddd  forth

You can also use stack(setNames(Vectorize(grep)(A$pattern,B[1],value=T),A$ref))
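A caveat for the `sapply()` version: it returns a list here only because the patterns match different numbers of names. With the hypothetical toy data below, where every pattern matches exactly two names, `sapply()` simplifies its results into a matrix, so the `setNames()`/`stack()` step would be working on a matrix rather than a named list. `lapply()` always returns a list and avoids this.

```r
# Hypothetical toy data in which every pattern matches exactly two names.
pats <- c(x = "a", y = "b")
nms  <- c("ab", "ba", "cc")

# sapply() simplifies equal-length results into a 2 x 2 matrix.
s <- sapply(pats, grep, x = nms, value = TRUE)
is.matrix(s)

# lapply() keeps one named list element per pattern, so stack()
# pairs each matched name with its pattern label.
l <- lapply(pats, grep, x = nms, value = TRUE)
stack(l)
```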

Answer 3

library(reshape2)  # provides melt()

# Your data
A <- data.frame(pattern = c("ab", "be|eb", "cc", "dd"),
            ref = c("first", "second", "third", "fourth"), stringsAsFactors = FALSE)
B <- data.frame(name = c("aa1", "bb1", "cab", "ccaa", "abed", "ddd", "ebba"), stringsAsFactors = FALSE)

# grepl-based function for sapply: the ref where pattern i matches a name, NA otherwise
patternfind <- function(i) {
  ifelse(grepl(A$pattern[[i]], B$name), A$ref[[i]], NA)
}

m <- sapply(seq_along(A$pattern), patternfind)  # 7 x 4 matrix, one column per pattern

test <- cbind(B, m)  # bind your pattern matrix to B
melt(test, id = c("name"), value.name = "new", na.rm = TRUE)  # melt to long format, dropping non-matches

   name variable    new
3   cab        1  first
5  abed        1  first
12 abed        2 second
14 ebba        2 second
18 ccaa        3  third
27  ddd        4 fourth
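The `melt()` step can likewise be done in base R by asking `which()` for the positions of the non-`NA` cells of `m`. A sketch reusing this answer's data and matrix construction; with `arr.ind = TRUE`, `which()` returns row/column indices in column-major order, which is the same order as the `melt()` output above.

```r
A <- data.frame(pattern = c("ab", "be|eb", "cc", "dd"),
                ref = c("first", "second", "third", "fourth"),
                stringsAsFactors = FALSE)
B <- data.frame(name = c("aa1", "bb1", "cab", "ccaa", "abed", "ddd", "ebba"),
                stringsAsFactors = FALSE)

# Same matrix as above: rows are names, columns are patterns,
# cells hold the ref on a match and NA otherwise.
m <- sapply(seq_along(A$pattern), function(i)
  ifelse(grepl(A$pattern[[i]], B$name), A$ref[[i]], NA))

# Row/column positions of every non-NA cell, walked column by column.
hits <- which(!is.na(m), arr.ind = TRUE)
result <- data.frame(name = B$name[hits[, "row"]],
                     new  = m[hits],
                     stringsAsFactors = FALSE)
result
```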

If you want to go the data.table route:

library(data.table)

DT.A <- as.data.table(A) # set as data tables
DT.B <- as.data.table(B)

ab <- DT.A[, DT.B[grep(pattern, name)], by=.(pattern, new = ref)] # grep within each pattern group; leave out pattern if you don't need to see what matched
ab[,c(3,2,1)] # reorder columns to your desired order
ab[,3:2] # subset to drop the pattern column if you don't want to display it

   name    new
1:  cab  first
2: abed  first
3: abed second
4: ebba second
5: ccaa  third
6:  ddd fourth
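Since the question's title asks about updating a column, here is a sketch (not part of this answer) that fills a `new` column of `DT.B` in place with data.table's `:=` operator, looping over the patterns. Assumption: when a name matches several patterns, the later pattern in `DT.A` overwrites the earlier one, so `abed` ends up with `second`, as in the question's expected output.

```r
library(data.table)

DT.A <- data.table(pattern = c("ab", "be|eb", "cc", "dd"),
                   ref = c("first", "second", "third", "fourth"))
DT.B <- data.table(name = c("aa1", "bb1", "cab", "ccaa", "abed", "ddd", "ebba"))

# Start with an empty character column, then fill it pattern by pattern.
DT.B[, new := NA_character_]
for (k in seq_len(nrow(DT.A))) {
  # := updates DT.B by reference, only in the rows whose name matches.
  DT.B[grepl(DT.A$pattern[k], name), new := DT.A$ref[k]]
}

DT.B[!is.na(new)]  # drop names that matched no pattern
```

Unlike the grouped `grep` above, this keeps one row per name, at the cost of recording only one `ref` per name.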

3 answers

Answer 1: score 1, 2018-09-17 21:06:40

Answer 2: score 1, 2018-09-18 18:11:07

Answer 3: score 0, 2018-09-18 17:14:39