String matching across lists in R

Question

I am trying to do string matching in R using grep. I need to match df1$ColA against df2$ColA; the input and the desired output are given below:

Input:

df1：

ColA
text1
text2
text3
text4
text5
text6
text7

df2：

ColA
text1 text2 text12
text23 text22 text7

Intermediate output:

ColA                    ColB
text1 text2 text12     text1, text2
text23 text22 text7    text7

Final output:

ColA                ColB
text1 text2 text12   text1
text1 text2 text12   text2
text23 text22 text7  text7

Approach:

I am currently using

test$test <- sapply(df2$ColA, function(x) ifelse(grep(paste(as.character(unlist(df1$ColA)),collapse="|"),x),1,0))

This tells me whether any df1$ColA string matches df2$ColA, but it does not return the matching string. Please advise.

Answer 1

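This may help you:

# build example data: 5 rows, each a string of 5 space-separated random letters
df2 <- matrix(sample(LETTERS)[-1], nrow=5)
df2 <- apply(df2, 1, FUN=function(x) paste(x, collapse=' '))

data <- data.frame(a=LETTERS[1:5], b=df2) ; data

# split each string in column b into its tokens
df2 <- sapply(1:nrow(data), function(x) strsplit(as.character(data$b[x]), ' '))

# position of a[x] among the tokens of row x (empty if absent)
sapply(1:nrow(data), function(x) which(data$a[x] == df2[[x]]))

# one logical vector per row: which tokens equal a[x]
sapply(1:nrow(data), function(x) data$a[x] == df2[[x]])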
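
The same split-and-compare idea might be applied to the data from the question roughly as follows. This is only a sketch, not part of the answer above: it assumes tokens are separated by single spaces, and the name `intermediate` is made up for illustration. It builds the intermediate output, with all matches for a row joined by ", ".

```r
# Data from the question
df1 <- data.frame(ColA = paste0("text", 1:7), stringsAsFactors = FALSE)
df2 <- data.frame(ColA = c("text1 text2 text12", "text23 text22 text7"),
                  stringsAsFactors = FALSE)

# Split each df2 string into its space-separated tokens
tokens <- strsplit(df2$ColA, " ", fixed = TRUE)

# For each row, keep the tokens that occur in df1$ColA and join them
intermediate <- data.frame(
  ColA = df2$ColA,
  ColB = vapply(tokens,
                function(tok) paste(tok[tok %in% df1$ColA], collapse = ", "),
                character(1)),
  stringsAsFactors = FALSE
)
intermediate
```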

Answer 2

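Here is a semi-vectorised solution based on match() that should be fast and produces exactly what you want. It works by splitting each df2$ColA string into tokens and matching every token against df1$ColA. It then repeats each (original) df2$ColA element once per token position, keeps the positions that matched, and adds the matching df1$ColA values to the output as ColB.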
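
To see the core step in isolation, here is a small sketch on a single string. The names `lookup`, `idx`, `exact` and `substring_hit` are illustrative only. It also shows why whole-token matching behaves differently from the regex alternation used in the question.

```r
lookup <- paste0("text", 1:7)   # stands in for df1$ColA
tokens <- strsplit("text1 text2 text12", " ", fixed = TRUE)[[1]]

# match() compares whole tokens, so "text12" does not match "text1";
# NA marks a token with no counterpart in lookup
idx <- match(tokens, lookup)

# Recover the matched strings from the non-NA positions
exact <- lookup[idx[!is.na(idx)]]

# A regex alternation, by contrast, also hits the "text1" inside "text12"
substring_hit <- grepl(paste(lookup, collapse = "|"), "text12")
```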

# set up the data, which the OP should have done
df1 <- data.frame(ColA = paste0("text", 1:7),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ColA = c("text1 text2 text12",
                           "text23 text22 text7"),
                  stringsAsFactors = FALSE)

# create a matrix of matches of first to elements of second
matmatrix <- sapply(strsplit(df2$ColA, " "), match, df1$ColA)
# repeat original text in same length as potential match
origdfColArep <- rep(df2$ColA, each = nrow(matmatrix))

# create the results dataset, first the matches of the second part
result <- data.frame(ColA = origdfColArep[!is.na(as.vector(matmatrix))],
                     stringsAsFactors = FALSE)
# then add the matching first part
result$ColB <- df1$ColA[na.omit(as.vector(matmatrix))]

result
##                  ColA  ColB
## 1  text1 text2 text12 text1
## 2  text1 text2 text12 text2
## 3 text23 text22 text7 text7
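
One caveat: sapply() only simplifies to a matrix when every df2$ColA string has the same number of tokens, as is the case in the question's data. A possible variant for rows of differing lengths is sketched below. It is an assumption-laden illustration rather than the answer's own method: the third row of df2 is hypothetical, added only to show unequal token counts, and `hits` is an illustrative name.

```r
df1 <- data.frame(ColA = paste0("text", 1:7), stringsAsFactors = FALSE)
df2 <- data.frame(ColA = c("text1 text2 text12",
                           "text23 text22 text7",
                           "text4"),            # hypothetical extra row
                  stringsAsFactors = FALSE)

# Split into tokens and keep, per row, those found in df1$ColA
tokens <- strsplit(df2$ColA, " ", fixed = TRUE)
hits <- lapply(tokens, function(tok) tok[tok %in% df1$ColA])

# Repeat each original string once per match and line up the matches
result <- data.frame(
  ColA = rep(df2$ColA, times = lengths(hits)),
  ColB = as.character(unlist(hits)),
  stringsAsFactors = FALSE
)
result
```

Rows without any match simply drop out, because they are repeated zero times.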

2 answers

Answer 1
0 2016-04-18 12:51:53

Answer 2
0 2016-04-18 18:53:14