
Extracting rows and columns of a matrix if row names and column names have a partial match

I will give an example of my problem using a smaller matrix. Say I have a matrix with row names and column names such as this:

set.seed(10)

a <- matrix(rexp(27), nrow = 3, ncol = 9)  # 3 x 9 = 27 values
colnames(a) <- paste(rep(c("aaa" , "bbb" , "ccc") , each = 3) , rep(c(1:3) , times = 3) , sep = "")
rownames(a) <- c("aaa" , "bbb" , "ccc")

giving matrix a:

          aaa1      aaa2      aaa3      bbb1      bbb2       bbb3      ccc1      ccc2      ccc3
aaa 0.01495641 1.5750419 2.3276229 0.6722683 1.3165471 1.63298388 1.7447187 0.3469224 1.3981074
bbb 0.92022120 0.2316586 0.7291238 0.4265298 0.4132938 0.07119408 0.2929501 0.7950826 1.1104594
ccc 0.75215894 1.0866730 1.2883101 1.1154219 0.6765753 2.56885161 0.6453052 1.3962992 0.1704216

I would like efficient code that, for each column, picks the value from the row whose name equals the column name without its digit, and returns these values as a named vector. In this case:

      aaa1       aaa2       aaa3       bbb1       bbb2       bbb3       ccc1       ccc2       ccc3 
0.01495641 1.57504185 2.32762287 0.42652979 0.41329383 0.07119408 0.64530516 1.39629918 0.17042160 

I obtained the previous vector using this code:

b <- c(a[grepl("aaa" , rownames(a)) , grepl("aaa" , colnames(a))] ,
       a[grepl("bbb" , rownames(a)) , grepl("bbb" , colnames(a))] ,
       a[grepl("ccc" , rownames(a)) , grepl("ccc" , colnames(a))] )

Is there a way to do this efficiently, even if the matrix is much larger and possibly has a different name structure than this?

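An easier option is to reshape the matrix to 'long' format with as.data.frame.table, which gives one row per cell with the row name in 'Var1', the column name in 'Var2' and the value in 'Freq', and then keep only the rows where 'Var1' equals 'Var2' with its digits removed: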

out <- subset(as.data.frame.table(a), Var1 == sub("\\d+", "", Var2),
     select =c(Var2, Freq))
with(out, setNames(Freq, Var2))
      aaa1       aaa2       aaa3       bbb1       bbb2       bbb3       ccc1       ccc2       ccc3 
0.01495641 1.57504185 2.32762287 0.42652979 0.41329383 0.07119408 0.64530516 1.39629918 0.17042160 
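One detail worth checking if the names have a different structure: sub("\\d+", "", x) removes the first run of digits wherever it occurs in the name, not only a trailing one. The short sketch below compares that pattern with an anchored variant, "\\d+$". The names in nms are made up for illustration and do not come from the question.

```r
# Hypothetical column names (not from the question)
nms <- c("aaa1", "bbb12", "x2y3")

# Unanchored: removes the first run of digits, wherever it is
unanchored <- sub("\\d+", "", nms)
# "aaa" "bbb" "xy3"

# Anchored: removes only a run of digits at the end of the name
anchored <- sub("\\d+$", "", nms)
# "aaa" "bbb" "x2y"
```

For names like those in the question the two patterns give the same result; the anchored form is probably the safer choice when digits can also appear inside the stem.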

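Or with row/column indexing: match each column name (with its digits removed) to a row name, then index the matrix with a two-column matrix of (row, column) pairs. Note that this returns an unnamed vector.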

i1 <- match( sub("\\d+", "", colnames(a)), rownames(a))
a[cbind(i1, seq_along(i1))]
[1] 0.01495641 1.57504185 2.32762287 0.42652979 0.41329383 0.07119408 0.64530516 1.39629918 0.17042160
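The matrix-indexing approach can be wrapped in a small function that keeps the column names and skips columns with no matching row. This is only a minimal sketch: extract_matched, its pattern argument and the toy matrix m are hypothetical and not part of the original answer, and it assumes that the part of each column name to drop is a trailing run of digits.

```r
# Hypothetical helper: for each column, pick the value in the row whose name
# equals the column name after removing `pattern` (assumed: trailing digits).
extract_matched <- function(m, pattern = "\\d+$") {
  key  <- sub(pattern, "", colnames(m))    # column names without the suffix
  i    <- match(key, rownames(m))          # row index per column, NA if none
  keep <- !is.na(i)                        # columns that have a matching row
  out  <- m[cbind(i[keep], which(keep))]   # index by (row, column) pairs
  setNames(out, colnames(m)[keep])         # restore the column names
}

# Toy matrix; the column "zzz1" has no matching row and is dropped
m <- matrix(1:12, nrow = 3,
            dimnames = list(c("aaa", "bbb", "ccc"),
                            c("aaa1", "aaa2", "bbb1", "zzz1")))
res <- extract_matched(m)
# res is c(aaa1 = 1L, aaa2 = 4L, bbb1 = 8L)
```

Because match and matrix indexing are both vectorised, this should scale to much larger matrices without an explicit loop. For a different naming scheme, only the pattern would need to change.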

