Apply regexp and update column across data frames

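I have two data frames: table A is a pattern table with reference names, and table B is a table of old names. I want to subset table B to the rows that match a pattern in table A, and when a cell matches, fill the new column in B with the corresponding ref value from A.

I have looked at "apply regexp in one data frame based on the column in another data frame", but it doesn't solve this case.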

A <- data.frame(pattern = c("ab", "be|eb", "cc", "dd"), 
                ref = c("first", "second", "third", "forth"))
B <- data.frame(name = c("aa1", "bb1", "cab", "ccaa", "abed", "ddd", "ebba"))
B$new = ""

I want my result table to be:

name       new
cab        first
abed       second
ccaa       third
ddd        forth
ebba       second

I am trying:

for (i in 1:nrow(B)) {
  if (as.data.table(unlist(lapply(A$pattern, grepl, B$name))) == TRUE) {
    B$new[i] = A$update
  }
}

Does anyone know a better solution? I would prefer to use the apply family, but I don't know how to add the column. Any help is appreciated!

I edited my answer because I forgot to add the line that converts B to a matrix first:

B <- as.matrix(B,ncol=1) 

It should work now:

library(reshape2)
L <- apply(A, 1, function(x) B[grepl(x[1],B),])
names(L) <- A$ref
result <- melt(L)
colnames(result) <- c('Name','New')

    result
#  Name    New
#1  cab  first
#2 abed  first
#3 abed second
#4 ebba second
#5 ccaa  third
#6  ddd  forth

You can use stack together with sapply:

stack(setNames(sapply(A$pattern, grep, B$name, value = TRUE), A$ref))

  values    ind
1    cab  first
2   abed  first
3   abed second
4   ebba second
5   ccaa  third
6    ddd  forth
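The one-liner above can be unpacked into separate steps. The following is only a sketch: it swaps sapply for lapply so the intermediate result is always a list (sapply would simplify to a matrix if every pattern happened to match the same number of names), and the names matches and result are illustrative, not from the answer.

```r
# Data from the question (with the missing comma in B fixed)
A <- data.frame(pattern = c("ab", "be|eb", "cc", "dd"),
                ref = c("first", "second", "third", "forth"),  # spelling as in the question
                stringsAsFactors = FALSE)
B <- data.frame(name = c("aa1", "bb1", "cab", "ccaa", "abed", "ddd", "ebba"),
                stringsAsFactors = FALSE)

# Step 1: for each pattern, collect the names in B that match it
matches <- lapply(A$pattern, grep, x = B$name, value = TRUE)

# Step 2: label each list element with the reference name for its pattern
names(matches) <- A$ref

# Step 3: stack() turns the named list into a two-column data frame
result <- stack(matches)
names(result) <- c("name", "new")

print(result)
```

Note that a name matching several patterns (here abed) appears once per matching pattern.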

You can also use stack(setNames(Vectorize(grep)(A$pattern,B[1],value=T),A$ref))

# Your data
A <- data.frame(pattern = c("ab", "be|eb", "cc", "dd"),
                ref = c("first", "second", "third", "fourth"), stringsAsFactors = FALSE)
B <- data.frame(name = c("aa1", "bb1", "cab", "ccaa", "abed", "ddd", "ebba"), stringsAsFactors = FALSE)

library(reshape2) # provides melt()

patternfind <- function(i) {
  ifelse(grepl(A$pattern[[i]], B$name), A$ref[[i]], NA)
} # for pattern i, return its ref where B$name matches, otherwise NA

m <- sapply(seq_along(A$pattern), patternfind) # one column per pattern

test <- cbind(B, m) # bind the pattern matrix to B
melt(test, id = c("name"), value.name = "new", na.rm = TRUE) # melt to long format, dropping non-matches

   name variable    new
3   cab        1  first
5  abed        1  first
12 abed        2 second
14 ebba        2 second
18 ccaa        3  third
27  ddd        4  fourth

If you want to go the data.table route:

library(data.table)

DT.A <- as.data.table(A) # set as data tables
DT.B <- as.data.table(B)

ab <- DT.A[, DT.B[grep(pattern, name)], by=.(pattern, new = ref)] # use grep and by, leave out pattern if don't need to see what matched
ab[,c(3,2,1)] # reorder to your desired order
ab[,3:2] # subset to remove the pattern if you decide you don't want to display it

   name    new
1:  cab  first
2: abed  first
3: abed second
4: ebba second
5: ccaa  third
6:  ddd  fourth
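
The outputs above list abed twice, because it matches both "ab" and "be|eb", whereas the expected table in the question shows it once, with second. The question does not say how ties should be resolved, so the following base R sketch assumes that the last matching pattern in A wins. That rule is inferred from the expected table, and the names hits and last_hit are illustrative.

```r
A <- data.frame(pattern = c("ab", "be|eb", "cc", "dd"),
                ref = c("first", "second", "third", "fourth"),
                stringsAsFactors = FALSE)
B <- data.frame(name = c("aa1", "bb1", "cab", "ccaa", "abed", "ddd", "ebba"),
                stringsAsFactors = FALSE)

# hits[i, j] is TRUE when B$name[i] matches A$pattern[j]
hits <- vapply(A$pattern, grepl, logical(nrow(B)), x = B$name)

# Assumption: when several patterns match a name, the last one in A wins
last_hit <- apply(hits, 1, function(row) {
  if (any(row)) max(which(row)) else NA_integer_
})

# Look up the reference name; names with no match get NA
B$new <- A$ref[last_hit]

# Keep only the names that matched at least one pattern
result <- B[!is.na(B$new), ]
print(result)
```

If the first match should win instead, replacing max with min in the helper would do that.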
