R: Match rows by two columns

Question

I am currently trying to find a vectorized way to match two values within the same row. I have the following two simplified data frames:

# Dataframe 1: Displaying all my observations
df1 <- data.frame(c(1, 2, 3, 4, 5, 6, 7, 8),
                  c("A", "B", "C", "D", "A", "B", "A", "C"), 
                  c("B", "E", "D", "A", "C", "A", "D", "A"))
colnames(df1) <- c("ID", "Number1", "Number2")

> df1
  ID Number1 Number2
1  1       A       B
2  2       B       E
3  3       C       D
4  4       D       A
5  5       A       C
6  6       B       A
7  7       A       D
8  8       C       A

# Dataframe 2: Matrix of observations I am interested in
df2 <- matrix(c("A", "B",
                "D", "A",
                "C", "B",
                "E", "D"),
              ncol = 2,
              byrow = TRUE)

> df2
     [,1] [,2]
[1,] "A"  "B" 
[2,] "D"  "A" 
[3,] "C"  "B" 
[4,] "E"  "D"

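What I want to accomplish is to create a new column in df1 that is TRUE only when the exact combination exists in df2 (e.g. ID = 1 equals the first row of df2, because both contain A and B). Additionally, if there is a shortcut for it, I would also like the status to be TRUE when the values are reversed, i.e. df1$Number1 matches df2[i, 2] and df1$Number2 matches df2[i, 1] (e.g. for ID = 7 the combination in df1 is A, D and in df2 it is D, A -> TRUE).

My desired output is as follows: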

> df1
  ID Number1 Number2 Status
1  1       A       B   TRUE
2  2       B       E  FALSE
3  3       C       D  FALSE
4  4       D       A   TRUE
5  5       A       C  FALSE
6  6       B       A   TRUE
7  7       A       D   TRUE
8  8       C       A  FALSE

All I have got so far is:

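# Pre-allocate the match matrix (it must exist before the loop assigns into it)
StatusComb <- matrix(FALSE, nrow = nrow(df1), ncol = nrow(df2))
for (i in 1:nrow(df1)) {
  for (j in 1:nrow(df2)) {
    # TRUE only when row i of df1 equals row j of df2 in the same order
    Status <- ifelse(df1$Number1[i] %in% df2[j,1] && 
                     df1$Number2[i] %in% df2[j,2], TRUE, FALSE)
    StatusComb[i,j] <- Status
  }
  # Row i matches if it equals any row of df2
  df1$Status[i] <- ifelse(any(StatusComb[i,]) == TRUE, TRUE, FALSE)
}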

It is really inefficient (you can clearly tell I am new to R) and it does not look very nice either. I would appreciate any help!

Answer 1

One method is to merge things together.

Adapting the data to account for reversed labels, I will reverse df2 on itself and append the result:

df2 <- rbind.data.frame(df2, df2[,c(2,1)])
colnames(df2) <- c("Number1", "Number2")
df2$a <- TRUE
df2
#   Number1 Number2    a
# 1       A       B TRUE
# 2       D       A TRUE
# 3       C       B TRUE
# 4       E       D TRUE
# 5       B       A TRUE
# 6       A       D TRUE
# 7       B       C TRUE
# 8       D       E TRUE

I added the column a so that there is something to be merged in. From there:

df3 <- merge(df1, df2, all.x = TRUE)
df3$a <- !is.na(df3$a)
df3[ order(df3$ID), ]
#   Number1 Number2 ID     a
# 1       A       B  1  TRUE
# 5       B       E  2 FALSE
# 7       C       D  3 FALSE
# 8       D       A  4  TRUE
# 2       A       C  5 FALSE
# 4       B       A  6  TRUE
# 3       A       D  7  TRUE
# 6       C       A  8 FALSE

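If you look at it before the !is.na(df3$a) step, you will see that the column is entirely TRUE and NA (NA where the pair is not present in df2); if that is enough for you, you can omit that middle step. The order step is there only because merge does not guarantee row order (in fact, I find it is always inconveniently different). Since the data was ordered by ID before, I reverted to that, but this is purely aesthetic, to match your desired output.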
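As a minimal, self-contained sketch of the merge idea above: the names `pairs`, `lookup`, `merged` and `result` are hypothetical and not part of the original answer, and the `unique()` call is an added precaution in case a pair is symmetric (such as A, A), which would otherwise duplicate rows in the merge.

```r
# Rebuild the question's data; stringsAsFactors = FALSE keeps the letters as character
df1 <- data.frame(ID = 1:8,
                  Number1 = c("A", "B", "C", "D", "A", "B", "A", "C"),
                  Number2 = c("B", "E", "D", "A", "C", "A", "D", "A"),
                  stringsAsFactors = FALSE)
pairs <- matrix(c("A", "B",
                  "D", "A",
                  "C", "B",
                  "E", "D"),
                ncol = 2, byrow = TRUE)

# Lookup table holding both orientations of every pair, plus a flag column
lookup <- data.frame(Number1 = c(pairs[, 1], pairs[, 2]),
                     Number2 = c(pairs[, 2], pairs[, 1]),
                     a = TRUE,
                     stringsAsFactors = FALSE)
lookup <- unique(lookup)  # assumption: guards against symmetric pairs

# Left join: every row of df1 is kept; the flag is NA where no pair matched
merged <- merge(df1, lookup, by = c("Number1", "Number2"), all.x = TRUE)
merged$Status <- !is.na(merged$a)

# Restore the original row order and column layout
result <- merged[order(merged$ID), c("ID", "Number1", "Number2", "Status")]
rownames(result) <- NULL
print(result)
```

Naming the `by` columns explicitly keeps the join from silently picking up any other column that happens to share a name in both tables.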

Answer 2

You can define a combination variable holding the pairs to search for, with each pair sorted alphabetically, as follows:

combination <- apply(df2, 1, function(x) {
  paste(sort(x), collapse = '')
})
combination
[1] "AB" "AD" "BC" "DE"

Then mutate the Status field based on the concatenation of the Number fields:

library(dplyr)
df1 %>%
  rowwise() %>%
  mutate(S = paste(sort(c(Number1, Number2)), collapse = "")) %>%
  mutate(Status = ifelse(S %in% combination, TRUE, FALSE))
Source: local data frame [8 x 5]
Groups: <by row>

# A tibble: 8 x 5
     ID Number1 Number2 S     Status
  <dbl> <chr>   <chr>   <chr> <lgl> 
1     1 A       B       AB    TRUE  
2     2 B       E       BE    FALSE 
3     3 C       D       CD    FALSE 
4     4 D       A       AD    TRUE  
5     5 A       C       AC    FALSE 
6     6 B       A       AB    TRUE  
7     7 A       D       AD    TRUE  
8     8 C       A       AC    FALSE

Data:

I set stringsAsFactors = F when building the data frame:

df1 <- data.frame(c(1, 2, 3, 4, 5, 6, 7, 8),
                    c("A", "B", "C", "D", "A", "B", "A", "C"), 
                    c("B", "E", "D", "A", "C", "A", "D", "A"), stringsAsFactors = F)
colnames(df1) <- c("ID", "Number1", "Number2")
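The sorted-key idea in this answer can also be written without rowwise(), in base R only. The following is a sketch of that variant, not the answerer's code: `pair_key` and `wanted` are hypothetical names, and the "|" separator is an added assumption that keeps keys unambiguous if labels were ever longer than one character.

```r
# Rebuild the question's data; stringsAsFactors = FALSE keeps the letters as character
df1 <- data.frame(ID = 1:8,
                  Number1 = c("A", "B", "C", "D", "A", "B", "A", "C"),
                  Number2 = c("B", "E", "D", "A", "C", "A", "D", "A"),
                  stringsAsFactors = FALSE)
df2 <- matrix(c("A", "B",
                "D", "A",
                "C", "B",
                "E", "D"),
              ncol = 2, byrow = TRUE)

# Hypothetical helper: build an order-independent key per row by putting the
# alphabetically smaller value first (pmin/pmax work element-wise on character vectors)
pair_key <- function(x, y) paste(pmin(x, y), pmax(x, y), sep = "|")

# Keys of the pairs of interest, then a single vectorized membership test
wanted <- pair_key(df2[, 1], df2[, 2])
df1$Status <- pair_key(df1$Number1, df1$Number2) %in% wanted
print(df1)
```

Because both sides go through the same helper, A, D and D, A produce the same key, so the reversed case needs no separate handling.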
