Comparing a dataframe column with a column from another dataframe

Question

I have a data frame (let's call it A) with a column of page paths:

pagePath
/text/other_text/123-string1-4571/text.html
/text/other_text/string2/15-some_other_txet.html
/text/other_text/25189-string3/45112-text.html
/text/other_text/text/string4/5418874-some_other_txet.html
/text/other_text/string5/text/some_other_txet-4157/text.html
/text/other_text/123-text-4571/text.html
/text/other_text/125-text-471/text.html

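I have another data frame (let's call it B) with a column of strings. The two data frames are different and do not have the same number of rows.

Here is a sample of the column in data frame B: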

names
string1
string11
string4
string3
string2
string10
string5
string100

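What I want to do is check whether each page path in A contains one of the strings from the other data frame (B).

I am struggling because the two data frames have different lengths and the data are not in any particular order.

Expected output

I would like to get this result: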

pagePath                                                      names     exist
/text/other_text/123-string1-4571/text.html                   string1   TRUE
/text/other_text/string2/15-some_other_txet.html              string2   TRUE
/text/other_text/25189-string3/45112-text.html                string3   TRUE
/text/other_text/text/string4/5418874-some_other_txet.html    string4   TRUE
/text/other_text/string5/text/some_other_txet-4157/text.html  string5   TRUE
/text/other_text/123-text-4571/text.html                      NA        FALSE
/text/other_text/125-text-471/text.html                       NA        FALSE

If my question needs further clarification, please let me know.

Answer 1

We can use grepl() to generate the exist column:

# Collapse B$names into one string with "|" 
onestring <- paste(B$names, collapse = "|") 

# Generate new column
A$exist <- grepl(onestring, A$pagePath)
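For a quick check, the same idea can be run end to end on the sample data from the question. This is only a sketch: the data frames A and B are rebuilt by hand here, and it assumes the names in B contain no regular-expression metacharacters (otherwise they would need escaping before being pasted together).

```r
# Sample data rebuilt by hand from the question (an assumption about its exact shape)
A <- data.frame(
  pagePath = c(
    "/text/other_text/123-string1-4571/text.html",
    "/text/other_text/string2/15-some_other_txet.html",
    "/text/other_text/25189-string3/45112-text.html",
    "/text/other_text/text/string4/5418874-some_other_txet.html",
    "/text/other_text/string5/text/some_other_txet-4157/text.html",
    "/text/other_text/123-text-4571/text.html",
    "/text/other_text/125-text-471/text.html"
  ),
  stringsAsFactors = FALSE
)
B <- data.frame(
  names = c("string1", "string11", "string4", "string3",
            "string2", "string10", "string5", "string100"),
  stringsAsFactors = FALSE
)

# One alternation pattern: "string1|string11|string4|..."
onestring <- paste(B$names, collapse = "|")

# TRUE where the path contains at least one of the names
A$exist <- grepl(onestring, A$pagePath)
print(A)
```

On this sample the first five paths come out TRUE and the last two FALSE. Note that this only fills the exist column; it does not say which name matched.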

Answer 2

Not very elegant, since it uses a for loop:

names <- rep(NA, length(A$pagePath))
exist <- rep(FALSE, length(A$pagePath))

for (name in B$names) {
  names[grep(name, A$pagePath)] <- name
  exist[grep(name, A$pagePath)] <- TRUE
}
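If the loop is a concern, a loop-free variant is possible with base R's regexpr() and regmatches(). The sketch below is not part of the answer above; it rebuilds the sample data by hand and assumes the names contain no regex metacharacters. When two names overlap (for example string1 and string11) and both could match at the same position, which one is reported depends on the regex engine's alternation rules, so treat the names column as a best guess in that case.

```r
# Sample data rebuilt by hand from the question (an assumption about its exact shape)
A <- data.frame(
  pagePath = c(
    "/text/other_text/123-string1-4571/text.html",
    "/text/other_text/string2/15-some_other_txet.html",
    "/text/other_text/25189-string3/45112-text.html",
    "/text/other_text/text/string4/5418874-some_other_txet.html",
    "/text/other_text/string5/text/some_other_txet-4157/text.html",
    "/text/other_text/123-text-4571/text.html",
    "/text/other_text/125-text-471/text.html"
  ),
  stringsAsFactors = FALSE
)
B <- data.frame(
  names = c("string1", "string11", "string4", "string3",
            "string2", "string10", "string5", "string100"),
  stringsAsFactors = FALSE
)

pattern <- paste(B$names, collapse = "|")

# regexpr() gives the position of the first match per path, or -1 if none
hit <- regexpr(pattern, A$pagePath)

# regmatches() returns the matched substrings only for the paths that matched
A$names <- NA_character_
A$names[hit != -1] <- regmatches(A$pagePath, hit)

# A path "exists" in B when some name was found in it
A$exist <- !is.na(A$names)
print(A)
```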

Answer 3

We can use str_extract_all from the stringr package, but paths with no match come back as character(0) rather than NA, so we have to convert them:

library(stringr)
df$names <- as.character(str_extract_all(df$pagePath, "string[0-9]+"))
df$exist <- df$names %in% df1$names
df[df=="character(0)"] <- NA
df
#                                                 pagePath       names   exist
#1                  /text/other_text/123-string1-4571/text.html string1  TRUE
#2             /text/other_text/string2/15-some_other_txet.html string2  TRUE
#3               /text/other_text/25189-string3/45112-text.html string3  TRUE
#4   /text/other_text/text/string4/5418874-some_other_txet.html string4  TRUE
#5 /text/other_text/string5/text/some_other_txet-4157/text.html string5  TRUE
#6                     /text/other_text/123-text-4571/text.html    <NA> FALSE
#7                      /text/other_text/125-text-471/text.html    <NA> FALSE

Data

dput(df)
structure(list(pagePath = structure(c(1L, 5L, 4L, 7L, 6L, 2L, 
3L), .Label = c("/text/other_text/123-string1-4571/text.html", 
"/text/other_text/123-text-4571/text.html", "/text/other_text/125-text-471/text.html", 
"/text/other_text/25189-string3/45112-text.html", "/text/other_text/string2/15-some_other_txet.html", 
"/text/other_text/string5/text/some_other_txet-4157/text.html", 
"/text/other_text/text/string4/5418874-some_other_txet.html"), class = "factor")), .Names = "pagePath", class = "data.frame", row.names = c(NA, 
-7L))
dput(df1)
structure(list(names = structure(c(1L, 4L, 7L, 6L, 5L, 2L, 8L, 
3L), .Label = c("string1", "string10", "string100", "string11", 
"string2", "string3", "string4", "string5"), class = "factor")), .Names = "names", class = "data.frame", row.names = c(NA, 
-8L))
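As a possible simplification of the approach above, stringr's str_extract() (first match only) returns NA when nothing matches, so the "character(0)" cleanup step is not needed. This is a sketch under the same assumption as the answer, namely that every name of interest has the form string followed by digits; the data frames are rebuilt here as plain character columns rather than factors.

```r
library(stringr)

# Same sample data as above, rebuilt as character columns (an assumption)
df <- data.frame(
  pagePath = c(
    "/text/other_text/123-string1-4571/text.html",
    "/text/other_text/string2/15-some_other_txet.html",
    "/text/other_text/25189-string3/45112-text.html",
    "/text/other_text/text/string4/5418874-some_other_txet.html",
    "/text/other_text/string5/text/some_other_txet-4157/text.html",
    "/text/other_text/123-text-4571/text.html",
    "/text/other_text/125-text-471/text.html"
  ),
  stringsAsFactors = FALSE
)
df1 <- data.frame(
  names = c("string1", "string11", "string4", "string3",
            "string2", "string10", "string5", "string100"),
  stringsAsFactors = FALSE
)

# First "string<digits>" token in each path, or NA if there is none
df$names <- str_extract(df$pagePath, "string[0-9]+")

# NA %in% df1$names is FALSE, so unmatched paths get exist = FALSE
df$exist <- df$names %in% df1$names
print(df)
```

One difference from the expected output in the question: a path containing a token such as string999 that is not in df1 would keep that token in names with exist set to FALSE, rather than NA.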

Answer 4

Here is one approach using apply (it assumes df already has pagePath as its first column and names as its second):

df$exist <- apply( df,1,function(x){as.logical(grepl(x[2],x[1]))} )
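The row-wise check can also be sketched with mapply(), which pairs the two columns directly instead of indexing into each row by position. This is an illustrative variant, not the answer's own code: it assumes df already carries a names column (filled in by hand below), treats each name as a literal string via fixed = TRUE, and maps a missing name to FALSE explicitly, since grepl() with an NA pattern may return NA rather than FALSE.

```r
# Sample data with the names column already filled in (an assumption)
df <- data.frame(
  pagePath = c(
    "/text/other_text/123-string1-4571/text.html",
    "/text/other_text/string2/15-some_other_txet.html",
    "/text/other_text/25189-string3/45112-text.html",
    "/text/other_text/text/string4/5418874-some_other_txet.html",
    "/text/other_text/string5/text/some_other_txet-4157/text.html",
    "/text/other_text/123-text-4571/text.html",
    "/text/other_text/125-text-471/text.html"
  ),
  names = c("string1", "string2", "string3", "string4", "string5", NA, NA),
  stringsAsFactors = FALSE
)

# For each (path, name) pair: FALSE if the name is missing,
# otherwise TRUE when the path contains the name as a literal substring
df$exist <- mapply(
  function(path, name) !is.na(name) && grepl(name, path, fixed = TRUE),
  df$pagePath, df$names,
  USE.NAMES = FALSE
)
print(df)
```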

4 answers

Answer 1
2 2016-03-11 14:47:56

Answer 2
2 2016-03-11 14:52:01

Answer 3
2 2016-03-11 15:20:00

Answer 4
0 2016-03-11 14:51:17
