
How do I get the intersection of two matrices?

# These are the two matrices that I would like to subset based on identical
# entries within entire rows.
mata <- matrix(c("A", "B", "C", "F", "D", "E", "F", "G"), 
               nrow = 4, ncol = 2,
               dimnames = list(c(), c("A", "B")))
mata

##      A   B  
## [1,] "A" "D"
## [2,] "B" "E"
## [3,] "C" "F"
## [4,] "F" "G"

matb <- matrix(c("B", "A", "C", "F", "M", "D", "D", "H", "G", "X"), 
               nrow = 5, ncol = 2,
               dimnames = list(c(), c("A", "B")))
matb

##      A   B  
## [1,] "B" "D"
## [2,] "A" "D"
## [3,] "C" "H"
## [4,] "F" "G"
## [5,] "M" "X"

If the two matrices had the same number of rows and their rows were in the same order, the following code would work and would be efficient.

mata[rowMeans(mata == matb) == 1, ]

A hackish solution of mine is to concatenate the columns I want to match on into a single key column in each matrix. In this example I use all columns.

mata <- cbind(mata, C = paste0(mata[, "A"], "_", mata[, "B"]))
matb <- cbind(matb, C = paste0(matb[, "A"], "_", matb[, "B"]))
mata[mata[, "C"] %in% matb[, "C"], colnames(mata) != "C"]

##      A   B  
## [1,] "A" "D"
## [2,] "F" "G"

This is the result that I am looking for, but I am wondering whether there is something more elegant such as the %in% function for vectors.

Edit

The solution should apply to the general case, where the matrices do not necessarily have the same number of rows.

You could use the function merge() for this:

> merge(mata,matb)
  A B
1 A D
2 F G
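
A small sketch of the merge() approach, assuming the original two-column mata and matb from the question (without the helper column C). It shows that the result is a data.frame and how to get a character matrix back; the names common and common_mat are only illustrative.

```r
# Rebuild the example matrices so the snippet runs on its own.
mata <- matrix(c("A", "B", "C", "F", "D", "E", "F", "G"),
               nrow = 4, ncol = 2,
               dimnames = list(NULL, c("A", "B")))
matb <- matrix(c("B", "A", "C", "F", "M", "D", "D", "H", "G", "X"),
               nrow = 5, ncol = 2,
               dimnames = list(NULL, c("A", "B")))

# merge() coerces both matrices to data.frames and joins on all
# shared column names (here A and B), i.e. on entire rows.
common <- merge(mata, matb)

# Convert back if a matrix is needed downstream.
common_mat <- as.matrix(common)
```

Note that merge() does not deduplicate: if the same row occurs several times in both inputs, the matches are multiplied, so wrapping the result in unique() may be appropriate.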

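If you load dplyr, intersect() gains a method for data frames:

library(dplyr)
# Only needed in R < 4.0.0, where strings became factors by default.
options(stringsAsFactors=FALSE)
# Convert the matrices to data frames so rows are treated as observations.
dfa <- as.data.frame(mata)
dfb <- as.data.frame(matb)
intersect(dfa,dfb)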

#   A B
# 1 A D
# 2 F G

Similarly, union, setequal (testing set equality), and setdiff (set minus) are available.
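
As a rough illustration of those other set operations, here is a sketch that assumes dplyr is installed and uses the original two-column matrices from the question; the result names only_a, either and same are made up for the example.

```r
library(dplyr)

mata <- matrix(c("A", "B", "C", "F", "D", "E", "F", "G"),
               nrow = 4, ncol = 2,
               dimnames = list(NULL, c("A", "B")))
matb <- matrix(c("B", "A", "C", "F", "M", "D", "D", "H", "G", "X"),
               nrow = 5, ncol = 2,
               dimnames = list(NULL, c("A", "B")))

dfa <- as.data.frame(mata, stringsAsFactors = FALSE)
dfb <- as.data.frame(matb, stringsAsFactors = FALSE)

only_a <- setdiff(dfa, dfb)   # rows of dfa that do not occur in dfb
either <- union(dfa, dfb)     # distinct rows found in dfa or dfb
same   <- setequal(dfa, dfb)  # TRUE only if both hold the same set of rows
```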


Aside. Each row of a data.frame corresponds to an observation, so it makes sense to talk about intersecting two sets of observations (two data.frames). For matrices, however, it really does not make sense. That's why hacks like the OP's solution and @RHertel's (which coerces to data.frame behind the scenes) are needed for this operation if you want to continue using matrices.
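
If you do want to stay with matrices, the key-pasting hack can be wrapped in a small helper so the inputs are not modified. The following is only a sketch in base R: the function name rows_in and the separator are assumptions for illustration, and the separator must not occur in the data.

```r
mata <- matrix(c("A", "B", "C", "F", "D", "E", "F", "G"),
               nrow = 4, ncol = 2,
               dimnames = list(NULL, c("A", "B")))
matb <- matrix(c("B", "A", "C", "F", "M", "D", "D", "H", "G", "X"),
               nrow = 5, ncol = 2,
               dimnames = list(NULL, c("A", "B")))

# Hypothetical helper: for each row of x, is there an identical row in y?
# Each row is collapsed into one string key, then matched with %in%.
rows_in <- function(x, y, sep = "\037") {
  key_x <- apply(x, 1, paste, collapse = sep)
  key_y <- apply(y, 1, paste, collapse = sep)
  key_x %in% key_y
}

# drop = FALSE keeps the result a matrix even if only one row matches.
res <- mata[rows_in(mata, matb), , drop = FALSE]
```

Unlike merge(), this keeps the rows in their original order in mata and returns a matrix directly.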
