Is there a way in R to combine matrices of different sizes by rows AND columns?

I have a list of matrices in R. Each of the matrices has row and column names - sometimes overlapping with other row and column names in other matrices in the list. For example

Mat1 <- as.matrix( read.table(text="Col1 Col2 Col3
Row1     0  0   0
Row2     1  0   5
Row3     5  2   0", head=TRUE))

Mat2<- as.matrix( read.table(text="Col1 Col3 Col4
Row2     0  0   0
Row3     1  0   5
Row4     5  2   0",head=TRUE))

How can I combine all the matrices in the list such that (1) where the rows & columns intersect, the numbers are added together? (2) where the rows & columns do not intersect, the value from the original matrix is preserved?

All the examples I've found online (e.g. using the 'merge' function) focus on merging by columns only or by rows only.

EDIT: As many have pointed out, I did not provide a reproducible example in code - adding below using the 'dput' function

list(structure(c(0, 1, 5, 0, 0, 2, 0, 5, 0), .Dim = c(3L, 3L), .Dimnames = list(
    c("Row1", "Row2", "Row3"), c("Col1", "Col2", "Col3"))), structure(c(0, 
1, 5, 0, 0, 2, 0, 5, 0), .Dim = c(3L, 3L), .Dimnames = list(c("Row2", 
"Row3", "Row4"), c("Col1", "Col3", "Col4"))))
##
#   this is a minimal reproducible example
#   ###    YOU should provide this     ###
#
m1 <- matrix(c(0,0,0,1,0,5,5,2,0), ncol = 3, byrow = TRUE, 
             dimnames = list(c('row.1', 'row.2', 'row.3'), c('col.1', 'col.2', 'col.3')))
m2 <- matrix(c(0,0,0,1,0,5,5,2,0), ncol = 3, byrow = TRUE, 
             dimnames = list(c('row.2', 'row.3', 'row.4'), c('col.1', 'col.3', 'col.4')))
##
#   you start here
#
library(data.table)
# melt each matrix to long form (rn, variable, value) and stack the results
m  <- rbind(melt(as.data.table(m1, keep.rownames = TRUE), id.vars = 'rn'),
            melt(as.data.table(m2, keep.rownames = TRUE), id.vars = 'rn'))
m[is.na(value), value := 0]
# cast back to wide form, summing cells that occur in both matrices
dcast(m, rn ~ variable, value.var = 'value', fun.aggregate = sum)
##       rn col.1 col.2 col.3 col.4
## 1: row.1     0     0     0     0
## 2: row.2     1     0     5     0
## 3: row.3     6     2     0     5
## 4: row.4     5     0     2     0

The question says the input is a list of matrices, so assume that list is L, defined below from the Mat1 and Mat2 given in the question. Convert each matrix to a long-form data frame whose columns hold the row names, the column names and the values; as.data.frame.table names these columns Var1, Var2 and Freq. Then rbind the individual data frames together, use xtabs to sum Freq into a two-dimensional layout, and finally convert the result to a matrix.

The question did not specify the following (see the Variations section at the end for alternatives):

  1. how to treat cells that appear in no matrix in the list; we have used 0.
  2. the order of the rows and columns; we assume they should be in sorted order of their names.

No packages are used.

L <- list(Mat1, Mat2)

long <- do.call("rbind", lapply(L, as.data.frame.table)) 
m <- as.matrix(as.data.frame.matrix(xtabs(Freq ~., long)))

m

giving:

     Col1 Col2 Col3 Col4
Row1    0    0    0    0
Row2    1    0    5    0
Row3    6    2    0    5
Row4    5    0    2    0
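
To make the intermediate step concrete, the sketch below rebuilds Mat1 and Mat2 from the question's dput output and inspects the long-form data frame before xtabs sums it. It adds nothing to the approach above; it only checks the column names and row count that as.data.frame.table is expected to produce for matrices whose dimnames are unnamed, and shows that an overlapping cell appears once per matrix.

```r
# Mat1 and Mat2 as given by the dput output in the question
Mat1 <- matrix(c(0, 1, 5, 0, 0, 2, 0, 5, 0), nrow = 3,
               dimnames = list(c("Row1", "Row2", "Row3"),
                               c("Col1", "Col2", "Col3")))
Mat2 <- matrix(c(0, 1, 5, 0, 0, 2, 0, 5, 0), nrow = 3,
               dimnames = list(c("Row2", "Row3", "Row4"),
                               c("Col1", "Col3", "Col4")))
L <- list(Mat1, Mat2)

# One row per cell of each matrix: row name, column name, value
long <- do.call("rbind", lapply(L, as.data.frame.table))

names(long)  # "Var1" "Var2" "Freq"
nrow(long)   # 18, i.e. 9 cells from each matrix

# The cell shared by both matrices appears twice (Freq 5 and 1);
# xtabs later adds the two values together
overlap <- long[long$Var1 == "Row3" & long$Var2 == "Col1", ]
overlap
```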

Variations

  1. Although it gives the same result for the example data in the question, a potentially different ordering of rows and columns can be obtained by running the following after obtaining m:

     rn <- do.call("union", lapply(L, rownames))
     cn <- do.call("union", lapply(L, colnames))
     m[rn, cn]
  2. An alternative to the xtabs line is the following. Omit default=0 if cells not in any matrix should be NA. The second line omits the Var1 and Var2 dimension names and could be omitted if they are to be preserved.

     m <- tapply(long[[3]], long[-3], sum, default = 0)
     names(dimnames(m)) <- NULL  # optional

    Omitting the second line above would show the dimension names like this:

           Var2
     Var1   Col1 Col2 Col3 Col4
       Row1    0    0    0    0
       Row2    1    0    5    0
       Row3    6    2    0    5
       Row4    5    0    2    0
  3. The solution could be written as a pipeline like this:

     L |>
       lapply(as.data.frame.table) |>
       do.call(what = "rbind") |>
       xtabs(formula = Freq ~ .) |>
       as.data.frame.matrix() |>
       as.matrix()
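
One caveat on variation 1: do.call("union", ...) passes each list element as a separate argument, and base R's union() takes exactly two sets, so that call fits the two-matrix example but not a longer list. A minimal sketch for longer lists, assuming Reduce() is an acceptable substitute, is shown here. Mat3 is a hypothetical third matrix invented only for illustration; it is not part of the question.

```r
Mat1 <- matrix(c(0, 1, 5, 0, 0, 2, 0, 5, 0), nrow = 3,
               dimnames = list(c("Row1", "Row2", "Row3"),
                               c("Col1", "Col2", "Col3")))
Mat2 <- matrix(c(0, 1, 5, 0, 0, 2, 0, 5, 0), nrow = 3,
               dimnames = list(c("Row2", "Row3", "Row4"),
                               c("Col1", "Col3", "Col4")))
# Hypothetical third matrix (an assumption, not from the question)
Mat3 <- matrix(1, nrow = 1, ncol = 2,
               dimnames = list("Row5", c("Col2", "Col5")))
L <- list(Mat1, Mat2, Mat3)

# Same long-form / xtabs steps as in the answer
long <- do.call("rbind", lapply(L, as.data.frame.table))
m <- as.matrix(as.data.frame.matrix(xtabs(Freq ~ ., long)))

# Reduce() folds the two-argument union() over any number of name vectors
rn <- Reduce(union, lapply(L, rownames))
cn <- Reduce(union, lapply(L, colnames))
m <- m[rn, cn]
m
```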

@jlhoward's answer shows you how to do it with the data.table package, and @GGrothendieck's uses some fancy base functions. Here's another way using simple base functions.

Mat1 <- as.matrix( read.table(text="Col1 Col2 Col3
Row1     0  0   0
Row2     1  0   5
Row3     5  2   0", head=TRUE))

Mat2<- as.matrix( read.table(text="Col1 Col3 Col4
Row2     0  0   0
Row3     1  0   5
Row4     5  2   0",head=TRUE))

# Get the row and column names 

rn1 <- rownames(Mat1)
rn2 <- rownames(Mat2)
cn1 <- colnames(Mat1)
cn2 <- colnames(Mat2)

# Construct row and column names for the sum matrix
rnsum <- unique(c(rn1, rn2))
cnsum <- unique(c(cn1, cn2))

# Make the matrix of zeros
sum <- matrix(0, length(rnsum), length(cnsum),
              dimnames = list(rnsum, cnsum))

# Put all indices of each matrix into a matrix
# with column 1 being the row name, column 2 being the 
# column name, and add the results into the sum

ind <- cbind(rn1[row(Mat1)], cn1[col(Mat1)])
sum[ind] <- sum[ind] + Mat1[ind]

ind <- cbind(rn2[row(Mat2)], cn2[col(Mat2)])
sum[ind] <- sum[ind] + Mat2[ind]

sum
#>      Col1 Col2 Col3 Col4
#> Row1    0    0    0    0
#> Row2    1    0    5    0
#> Row3    6    2    0    5
#> Row4    5    0    2    0

Created on 2022-05-08 by the reprex package (v2.0.1)

If the matrices were actually in a list (e.g. thelist <- list(Mat1, Mat2)), then I'd just put all of this code into a loop, e.g.

sum <- matrix(0, 0, 0)
for (i in seq_along(thelist)) {
   Mat1 <- sum
   Mat2 <- thelist[[i]]
   
   ... same code as above ...
}
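
The loop above can also be written as a self-contained function. The following is only a sketch under a few assumptions: add_named_matrices is a hypothetical helper name, every matrix has row and column names that are unique within that matrix, and the result matrix is allocated once from the union of all names rather than grown on each iteration as in the loop above.

```r
# Hypothetical helper (name and structure are assumptions, not from the answer)
add_named_matrices <- function(mats) {
  # Union of all row and column names, in order of first appearance
  rn <- unique(unlist(lapply(mats, rownames)))
  cn <- unique(unlist(lapply(mats, colnames)))

  # Start from a matrix of zeros covering every row/column name
  total <- matrix(0, length(rn), length(cn), dimnames = list(rn, cn))

  # Add each matrix into the block of `total` addressed by its own names
  for (mat in mats) {
    rows <- rownames(mat)
    cols <- colnames(mat)
    total[rows, cols] <- total[rows, cols] + mat
  }
  total
}

Mat1 <- matrix(c(0, 1, 5, 0, 0, 2, 0, 5, 0), nrow = 3,
               dimnames = list(c("Row1", "Row2", "Row3"),
                               c("Col1", "Col2", "Col3")))
Mat2 <- matrix(c(0, 1, 5, 0, 0, 2, 0, 5, 0), nrow = 3,
               dimnames = list(c("Row2", "Row3", "Row4"),
                               c("Col1", "Col3", "Col4")))

result <- add_named_matrices(list(Mat1, Mat2))
result
```

Indexing by name (total[rows, cols]) returns the block in the same row and column order as mat, which is why the element-wise addition lines up without building an index matrix.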
