
Compare dataframe column to another dataframe column

I have a dataframe column containing page paths (let's call it A):

pagePath
/text/other_text/123-string1-4571/text.html
/text/other_text/string2/15-some_other_txet.html
/text/other_text/25189-string3/45112-text.html
/text/other_text/text/string4/5418874-some_other_txet.html
/text/other_text/string5/text/some_other_txet-4157/text.html
/text/other_text/123-text-4571/text.html
/text/other_text/125-text-471/text.html

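I have another dataframe with a column of strings; let's call it B. (The two dataframes are different and do not have the same number of rows.)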

Here is a sample of the column in dataframe B:

names
string1
string11
string4
string3
string2
string10
string5
string100

What I want to do is check whether each page path in A contains one of the strings from my other dataframe, B.

I'm stuck because the two dataframes are not the same length and the data are not in any particular order.

Expected output

I would like to get this result:

pagePath                                                      names    exist
/text/other_text/123-string1-4571/text.html                   string1  TRUE
/text/other_text/string2/15-some_other_txet.html              string2  TRUE
/text/other_text/25189-string3/45112-text.html                string3  TRUE
/text/other_text/text/string4/5418874-some_other_txet.html    string4  TRUE
/text/other_text/string5/text/some_other_txet-4157/text.html  string5  TRUE
/text/other_text/123-text-4571/text.html                      NA       FALSE
/text/other_text/125-text-471/text.html                       NA       FALSE

If my question needs further clarification, please let me know.

We can use grepl() to generate exist (this gives the exist column only, not names):

# Collapse B$names into one string with "|" 
onestring <- paste(B$names, collapse = "|") 

# Generate new column
A$exist <- grepl(onestring, A$pagePath)
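The grepl() call above yields only the exist column. A minimal sketch for also filling names, assuming the names contain no regex metacharacters: regexpr() returns the position of the first match in each path (or -1), and regmatches() pulls out the matched text. Longer names are put first in the pattern as a safeguard, since with some regex engines the first matching alternative wins and a path containing string11 could otherwise be reported as string1. A and B are rebuilt from the sample data so the block runs on its own.

```r
# Sample data rebuilt from the question
A <- data.frame(pagePath = c(
  "/text/other_text/123-string1-4571/text.html",
  "/text/other_text/string2/15-some_other_txet.html",
  "/text/other_text/25189-string3/45112-text.html",
  "/text/other_text/text/string4/5418874-some_other_txet.html",
  "/text/other_text/string5/text/some_other_txet-4157/text.html",
  "/text/other_text/123-text-4571/text.html",
  "/text/other_text/125-text-471/text.html"
), stringsAsFactors = FALSE)
B <- data.frame(names = c("string1", "string11", "string4", "string3",
                          "string2", "string10", "string5", "string100"),
                stringsAsFactors = FALSE)

# Longest names first, then joined into one alternation pattern
pats <- B$names[order(nchar(B$names), decreasing = TRUE)]
onestring <- paste(pats, collapse = "|")

# Position of the first match in each path (-1 when nothing matches)
m <- regexpr(onestring, A$pagePath)
hit <- as.vector(m) > 0   # as.vector() drops regexpr's extra attributes

A$names <- NA_character_
A$names[hit] <- regmatches(A$pagePath, m)  # matched text, matching rows only
A$exist <- hit
A
```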

A less elegant option, since it uses a for loop:

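names <- rep(NA, length(A$pagePath))     # matched name for each path (NA if none)
exist <- rep(FALSE, length(A$pagePath))  # TRUE if any name from B occurs in the path

for (name in B$names) {
  hit <- grep(name, A$pagePath)          # rows whose path contains this name
  names[hit] <- name                     # a later match overwrites an earlier one
  exist[hit] <- TRUE
}

A$names <- names
A$exist <- exist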
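A sketch of the same idea without an explicit for loop, which also sidesteps the different-lengths problem: test every path against every name at once, giving a logical matrix with one row per path and one column per name. fixed = TRUE makes grepl() do a plain substring test, so no regex escaping is needed. Unlike the loop, where a later name overwrites an earlier one, this version keeps the longest matching name per path; that tie-break is a choice made for this sketch, not something the question specifies.

```r
# Sample data rebuilt from the question
A <- data.frame(pagePath = c(
  "/text/other_text/123-string1-4571/text.html",
  "/text/other_text/string2/15-some_other_txet.html",
  "/text/other_text/25189-string3/45112-text.html",
  "/text/other_text/text/string4/5418874-some_other_txet.html",
  "/text/other_text/string5/text/some_other_txet-4157/text.html",
  "/text/other_text/123-text-4571/text.html",
  "/text/other_text/125-text-471/text.html"
), stringsAsFactors = FALSE)
B <- data.frame(names = c("string1", "string11", "string4", "string3",
                          "string2", "string10", "string5", "string100"),
                stringsAsFactors = FALSE)

# hits[i, j] is TRUE when B$names[j] occurs literally in A$pagePath[i]
hits <- vapply(B$names, function(nm) grepl(nm, A$pagePath, fixed = TRUE),
               logical(nrow(A)))

A$exist <- rowSums(hits) > 0
# Longest matching name per path, or NA when no name matches
A$names <- apply(hits, 1, function(row) {
  found <- B$names[row]
  if (length(found) == 0) NA_character_ else found[which.max(nchar(found))]
})
A
```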

We can use str_extract_all() from the stringr package, but paths with no match come back as character(0) instead of NA, so we have to replace those:

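library(stringr)

# Extract "string<digits>" tokens from each path; non-matches give character(0).
# Note: the pattern is hardcoded to the sample's naming scheme, not built from df1$names.
df$names <- as.character(str_extract_all(df$pagePath, "string[0-9]+"))
# Flag paths whose extracted token appears in df1
df$exist <- df$names %in% df1$names
# as.character() turned the empty matches into the literal text "character(0)"
df[df == "character(0)"] <- NA
df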
#                                                 pagePath       names   exist
#1                  /text/other_text/123-string1-4571/text.html string1  TRUE
#2             /text/other_text/string2/15-some_other_txet.html string2  TRUE
#3               /text/other_text/25189-string3/45112-text.html string3  TRUE
#4   /text/other_text/text/string4/5418874-some_other_txet.html string4  TRUE
#5 /text/other_text/string5/text/some_other_txet-4157/text.html string5  TRUE
#6                     /text/other_text/123-text-4571/text.html    <NA> FALSE
#7                      /text/other_text/125-text-471/text.html    <NA> FALSE
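The "character(0)" text only appears because as.character() is applied to the list returned by str_extract_all(). A minimal base-R sketch of a way around that: regmatches() with gregexpr() returns the same kind of list (character(0) for paths without a match), and vapply() turns each element into its first match or NA before any comparison. The vectors paths and known are illustrative stand-ins for df$pagePath and df1$names. If stringr is available, str_extract(), which returns NA when there is no match, avoids the list altogether.

```r
# Illustrative stand-ins for df$pagePath and df1$names
paths <- c("/text/other_text/123-string1-4571/text.html",
           "/text/other_text/string2/15-some_other_txet.html",
           "/text/other_text/123-text-4571/text.html")
known <- c("string1", "string11", "string2")

# One list element per path; character(0) where the pattern is absent
found <- regmatches(paths, gregexpr("string[0-9]+", paths))

# First match per path, or NA_character_ when the element is empty
first <- vapply(found, function(x) if (length(x) > 0) x[1] else NA_character_,
                character(1))
exist <- first %in% known

data.frame(pagePath = paths, names = first, exist = exist)
```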

Data

dput(df)
structure(list(pagePath = structure(c(1L, 5L, 4L, 7L, 6L, 2L, 
3L), .Label = c("/text/other_text/123-string1-4571/text.html", 
"/text/other_text/123-text-4571/text.html", "/text/other_text/125-text-471/text.html", 
"/text/other_text/25189-string3/45112-text.html", "/text/other_text/string2/15-some_other_txet.html", 
"/text/other_text/string5/text/some_other_txet-4157/text.html", 
"/text/other_text/text/string4/5418874-some_other_txet.html"), class = "factor")), .Names = "pagePath", class = "data.frame", row.names = c(NA, 
-7L))
dput(df1)
structure(list(names = structure(c(1L, 4L, 7L, 6L, 5L, 2L, 8L, 
3L), .Label = c("string1", "string10", "string100", "string11", 
"string2", "string3", "string4", "string5"), class = "factor")), .Names = "names", class = "data.frame", row.names = c(NA, 
-8L))

Here is one way using apply():

# Row-wise: does the name in column 2 occur in the path in column 1?
df$exist <- apply(df, 1, function(x) grepl(x[2], x[1]))
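The apply() approach compares column 2 with column 1 of the same row, so it assumes df already has a names column lined up with pagePath; apply() also converts the data frame to a character matrix first. A sketch of the same row-wise test with mapply(), which walks the two columns in parallel, treats a missing name as "not found" instead of passing NA as the pattern, and uses fixed = TRUE for a literal substring check. The two-row df is a made-up example.

```r
# Hypothetical two-row example: names already aligned with pagePath
df <- data.frame(
  pagePath = c("/text/other_text/123-string1-4571/text.html",
               "/text/other_text/123-text-4571/text.html"),
  names = c("string1", NA),
  stringsAsFactors = FALSE
)

# Pair each row's name with that row's path; an NA name counts as not found
df$exist <- mapply(function(nm, path) !is.na(nm) && grepl(nm, path, fixed = TRUE),
                   df$names, df$pagePath, USE.NAMES = FALSE)
df
```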


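Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to reprint, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.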

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM