
Splitting a matrix by both rows and columns using asplit() in R

Consider the following toy example, in which I extract all comparisons involving a name called A from a matrix called matr.

### Set up example matrix ###
matr <- matrix(c(2,0,3,0,5,0.7,1,0,0.9,6,11,9,0,1,0.5,2,0,1,0.3,3,6,1,0.31,0,0),     nrow = 5, ncol = 5)
dimnames(matr) = list(c("A", "B", "A", "C", "A"),  c("A", "B", "A", "C", "A"))
matr

# Pretend the matrix is symmetric - for my real matrix, it is
matr[upper.tri(matr, diag = TRUE)] <- NA # keep only the lower triangle
matr

# Blank out comparisons between identically named rows and columns
for (rowLoopCounter in seq_len(nrow(matr))) {
  for (colLoopCounter in seq_len(ncol(matr))) {
    if (rownames(matr)[rowLoopCounter] == colnames(matr)[colLoopCounter]) {
      matr[rowLoopCounter, colLoopCounter] <- NA
    }
  }
}

A_row <- c(matr[grepl("A", row.names(matr)), ]) # get comparisons in rows
A_col <- c(matr[, grepl("A", colnames(matr))]) # get comparisons in columns
total <- as.numeric(na.omit(unlist(c(A_row, A_col)))) # combine results

total
#[1] 0 6 3 0 0 1

The above implementation is quite verbose and only gets the job done for A. I also need to do this for B and C.

This can be done using a for loop (or apply()).
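As a rough sketch of that loop-based route, the extraction for A can be wrapped in a small function and applied to every name. The helper name comparisons_for is hypothetical, the outer() call is a vectorized stand-in for the double loop above, and names are matched exactly with == rather than with grepl(), which is an assumption about the intended matching.

```r
# Rebuild the example matrix so the sketch is self-contained
nms <- c("A", "B", "A", "C", "A")
matr <- matrix(c(2, 0, 3, 0, 5, 0.7, 1, 0, 0.9, 6, 11, 9, 0, 1, 0.5,
                 2, 0, 1, 0.3, 3, 6, 1, 0.31, 0, 0),
               nrow = 5, ncol = 5, dimnames = list(nms, nms))
matr[upper.tri(matr, diag = TRUE)] <- NA                   # keep the lower triangle
matr[outer(rownames(matr), colnames(matr), "==")] <- NA    # drop same-name comparisons

# Hypothetical helper: all non-NA values in rows or columns named `name`
comparisons_for <- function(m, name) {
  in_row <- m[rownames(m) == name, , drop = FALSE]
  in_col <- m[, colnames(m) == name, drop = FALSE]
  vals <- c(in_row, in_col)   # c() drops the matrix structure
  vals[!is.na(vals)]
}

groups <- unique(rownames(matr))
result <- lapply(setNames(groups, groups), function(g) comparisons_for(matr, g))
result
```

This returns a named list with one element per name, which is the shape asked for, at the cost of one pass over the matrix per name.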

I naively tried using split(), which only works on vectors and gives strange results (it leaves the value 1 out of A and puts it in C for some reason):

splt <- split(matr, colnames(matr)) # using rownames(matr) is equivalent

#$A
#[1] NA NA NA NA  0  6 NA NA NA NA NA  3 NA NA NA

#$B
#[1]  0 NA NA NA NA

#$C
#[1] 0.0 0.9 1.0  NA  NA

$A should contain the same elements as total.

I recently discovered the new asplit() function, but I get an error:

asplit(matr, c(1, 2))
#Error in array(newx[, i], d.call, dn.call) : 'dims' cannot be of length 0

What I would like from asplit() is output similar to that returned by split(), where values are stored in a named list. However, from running the examples in the documentation for asplit(), there appears to be no way to do this.
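For context, here is a minimal sketch of what asplit() does with a single margin. It splits along the margin, giving one element per row, so repeated names such as A stay as separate elements; any pooling by name has to be done afterwards. The pooling step shown is only one possible way to do that, not something asplit() provides, and it covers the row half only.

```r
# Rebuild the example matrix so the sketch is self-contained
nms <- c("A", "B", "A", "C", "A")
matr <- matrix(c(2, 0, 3, 0, 5, 0.7, 1, 0, 0.9, 6, 11, 9, 0, 1, 0.5,
                 2, 0, 1, 0.3, 3, 6, 1, 0.31, 0, 0),
               nrow = 5, ncol = 5, dimnames = list(nms, nms))
matr[upper.tri(matr, diag = TRUE)] <- NA
matr[outer(rownames(matr), colnames(matr), "==")] <- NA

rows <- asplit(matr, 1)   # one list element per row
length(rows)              # one element for each of the five rows
names(rows)               # the row names, duplicates included

# Pool the rows that share a name, then drop the NAs
idx <- split(seq_along(rows), names(rows))
pooled <- lapply(idx, function(i) {
  v <- unlist(rows[i], use.names = FALSE)
  v[!is.na(v)]
})
pooled
```

The column half would need the same treatment with asplit(matr, 2).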

A one-liner would be:

with(na.omit(as.data.frame.table(matr)), split(c(Freq, Freq), c(Var1, Var2)))

$A
[1] 0 6 3 0 0 1

$B
[1] 0.0 0.0 0.9 6.0

$C
[1] 0.0 0.9 1.0 3.0
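To make the one-liner easier to follow, here is the same idea unrolled into steps. The intermediate names long, values and labels are illustrative only. The labels are converted with as.character() before being combined; this is a defensive assumption, since c() applied to factors returns integer codes rather than a factor in R versions before 4.1.0.

```r
# Rebuild the example matrix so the sketch is self-contained
nms <- c("A", "B", "A", "C", "A")
matr <- matrix(c(2, 0, 3, 0, 5, 0.7, 1, 0, 0.9, 6, 11, 9, 0, 1, 0.5,
                 2, 0, 1, 0.3, 3, 6, 1, 0.31, 0, 0),
               nrow = 5, ncol = 5, dimnames = list(nms, nms))
matr[upper.tri(matr, diag = TRUE)] <- NA
matr[outer(rownames(matr), colnames(matr), "==")] <- NA

# Long format: one row per cell, with columns Var1 (row name),
# Var2 (column name) and Freq (cell value); na.omit() drops empty cells
long <- na.omit(as.data.frame.table(matr))

# Every remaining value belongs to two groups: its row name and its column name
values <- c(long$Freq, long$Freq)
labels <- c(as.character(long$Var1), as.character(long$Var2))

res <- split(values, labels)
res
```

Each value appears twice in values, once labelled by its row and once by its column, which is what lets a single split() call cover both directions.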

I'm not entirely sure what you want to achieve, but if it is a list of vectors, one per letter, containing all matrix values where the row and column letters coincide, you can do this:

library(dplyr) ## for convenient dataframe manipulation
df <- cbind(
  expand.grid(row = dimnames(matr)[[1]],
              col = dimnames(matr)[[2]]),
  value = as.vector(matr)
)
#  > head(df)
#    row col value
# 1   A   A   2.0
# 2   B   A   0.0
# 3   A   A   3.0
# 4   C   A   0.0
# 5   A   A   5.0
# 6   A   B   0.7

Filter df for coinciding row and column letters, then summarise per letter:

df <- df |>
  filter(row == col) |>
  group_by(row) |>
  summarise(total = list(value))

Convert to a named list:

totals = setNames(df$total, df$row)

Output:

## > totals
## $A
## [1]  2.00  3.00  5.00 11.00  0.00  0.50  6.00  0.31  0.00
## 
## $B
## [1] 1
## 
## $C
## [1] 0.3

You can use split() on both the column and row names by swapping the row names of which(!is.na(matr), arr.ind = TRUE). Then use mapply() to combine the two lists.

# Get the array indices (row, col) of the non-NA values in matr
ind <- which(!is.na(matr), arr.ind = TRUE)

# Create a list by factor of row names.
list_1 <- split(x = matr[ind], f = row.names(ind))

# Then substitute the column name as the row name.
row.names(ind) <- colnames(matr)[unname(ind[, 2])]

# Create a second list by factor of column name.
list_2 <- split(x = matr[ind], f = row.names(ind))

# Combine your lists
mapply(c, list_1, list_2)

Output of mapply():

$A
[1] 0 6 3 0 0 1

$B
[1] 0.0 0.0 0.9 6.0

$C
[1] 0.0 0.9 1.0 3.0
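One caveat with combining lists through mapply(): by default it simplifies its result, so if every group happened to contain the same number of values, the result would be a matrix rather than a named list. The toy lists below are made up purely to show that behaviour; Map() (or mapply() with SIMPLIFY = FALSE) always returns a list.

```r
# Made-up lists in which every group has the same length
list_1 <- list(A = c(1, 2), B = c(3, 4))
list_2 <- list(A = c(5, 6), B = c(7, 8))

simplified <- mapply(c, list_1, list_2)   # simplified to a matrix, one column per name
kept_list  <- Map(c, list_1, list_2)      # always a named list

is.matrix(simplified)
is.list(kept_list)
kept_list$A
```

With the example matrix the group lengths differ, so mapply() does return a list there; the difference only matters if the code is reused on other data.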

This is a followup to @SEAnalyst's great answer.

split() works on vectors, and a matrix is stored column by column, so the names are recycled down each column and the grouping is effectively by row. This gives list_1. To generate list_2, simply transpose matr with t() and split by colnames() again.

The entire implementation would then be

# Create a list by factor of row names.
list_1 <- split(matr, colnames(matr))

# Create a second list by factor of column name.
list_2 <- split(t(matr), colnames(t(matr)))

# Combine your lists
splt <- mapply(c, list_1, list_2)

# Get rid of NAs
splt <- lapply(splt, function(x) x[!is.na(x)])

Easier to understand, I think.
