
Mapping content of one matrix onto structure of another matrix

I have two matrices derived from the same dataset, but with different amounts of data available in each. I want to create a matrix that replicates x in its row names and column names but contains the data values from y. Where a value is not available in y, the cell should be NA.

Not all of the row names in x are present in y and vice versa. The same holds true for the column names.

For the example input data below, a row name in x corresponds to a row name in y through the part of the name before the | (I want to retain everything after the | for other mappings).

What is the most efficient way to do this?

DESIRED OUTPUT

z = structure(c(NA, 1, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA, NA, NA, 
NA, 0, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), .Dim = c(11L, 5L), .Dimnames = list(
c("AACSL|729522", "AACS|65985", "AADACL2|344752", "AADACL3|126767", 
"AADACL4|343066", "AADAC|13", "AADAT|51166", "AAGAB|79719", 
"AAK1|22848", "AAK12|14", "AANAT|15"), c("S18", "S20", "S45", 
"S95", "S100")))

EXAMPLE INPUT

x = structure(c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 
1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 
0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0), .Dim = c(11L, 
5L), .Dimnames = list(c("AACSL|729522", "AACS|65985", "AADACL2|344752", 
"AADACL3|126767", "AADACL4|343066", "AADAC|13", "AADAT|51166", 
"AAGAB|79719", "AAK1|22848", "AAK12|14", "AANAT|15"), c("S18", 
"S20", "S45", "S95", "S100")))

y = structure(c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 
0, 0, 0, 0, 0, 0), .Dim = c(11L, 4L), .Dimnames = list(c("A1BG", 
"A1CF", "A2ML1", "A4GALT", "AACS", "AAK1", "AARD", "AARS2", "AASDHPPT", 
"AASS", "BAACS"), c("S18", "S10", "S45", "S95")))

I think there might be a slight problem with the example you provided: I cannot see how z is derived from the x and y above. See this code:

intersect(sapply(rownames(x), #I am just extracting the letter codes here
             function(i){
                     return(
                             strsplit(x=i,split="|",fixed=TRUE)[[1]][[1]])
             }),rownames(y))

#[1] "AACS" "AAK1"
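The same check can be written without sapply(), since sub() is vectorized, and extended to the column names. This is only a sketch: the name vectors are copied from the question's x and y, and the variable names (x_rows, x_keys, shared_rows and so on) are made up for illustration.

```r
# Row and column names copied from the question's x and y
x_rows <- c("AACSL|729522", "AACS|65985", "AADACL2|344752", "AADACL3|126767",
            "AADACL4|343066", "AADAC|13", "AADAT|51166", "AAGAB|79719",
            "AAK1|22848", "AAK12|14", "AANAT|15")
y_rows <- c("A1BG", "A1CF", "A2ML1", "A4GALT", "AACS", "AAK1", "AARD",
            "AARS2", "AASDHPPT", "AASS", "BAACS")
x_cols <- c("S18", "S20", "S45", "S95", "S100")
y_cols <- c("S18", "S10", "S45", "S95")

# Drop the "|" and everything after it; sub() works on the whole vector at once
x_keys <- sub("\\|.*$", "", x_rows)

# Names that x and y have in common
shared_rows <- intersect(x_keys, y_rows)
shared_cols <- intersect(x_cols, y_cols)

print(shared_rows)
print(shared_cols)
```

Only cells whose row is in shared_rows and whose column is in shared_cols can receive a value from y; every other cell of the result has to be NA.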

Odd, right? Only 2 of the codes in x are also present in y. However, I think the code below does what you are planning (with the exception of this inconsistency):

library(data.table)
library(reshape2)
library(dplyr)
x %>% as.data.frame %>% mutate(rownames=rownames(x)) %>%
        mutate(nms=sapply(rownames(x),
                          function(i){
                                  return(
                                          strsplit(x=i,split="|",fixed=TRUE)[[1]][[1]])
                          })) %>%
        melt(id.vars=c("nms","rownames")) %>%
        merge(., y %>% as.data.frame %>% mutate(nms=rownames(y))%>% melt(id.vars="nms"), by=c("variable","nms"), all.x=TRUE) %>%
        select(-nms, -value.x) %>% dcast(formula = rownames~variable, value.var="value.y") -> xy
# dcast() returns a data.frame; restore the original row names from the helper column
rownames(xy)<-xy$rownames
# reorder rows and columns to match x (this also drops the helper "rownames" column)
xy[rownames(x),colnames(x)] -> xy

Or have I misunderstood some of your points?
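For comparison, here is a base R sketch of the same mapping that uses match() and plain matrix indexing rather than melt/merge/dcast. It is an alternative under stated assumptions, not the approach above: x is rebuilt with only its dimnames (its values are never used), y is rebuilt as all zeros plus the two non-zero cells it appears to have in the question, and the names keys, row_idx and col_idx are illustrative.

```r
# Dimnames copied from the question's x; the values of x are irrelevant here
x <- matrix(0, nrow = 11, ncol = 5, dimnames = list(
  c("AACSL|729522", "AACS|65985", "AADACL2|344752", "AADACL3|126767",
    "AADACL4|343066", "AADAC|13", "AADAT|51166", "AAGAB|79719",
    "AAK1|22848", "AAK12|14", "AANAT|15"),
  c("S18", "S20", "S45", "S95", "S100")))

# Assumption: y mirrors the question's y (all zeros except two cells)
y <- matrix(0, nrow = 11, ncol = 4, dimnames = list(
  c("A1BG", "A1CF", "A2ML1", "A4GALT", "AACS", "AAK1", "AARD",
    "AARS2", "AASDHPPT", "AASS", "BAACS"),
  c("S18", "S10", "S45", "S95")))
y["AACS", "S18"] <- 1
y["A2ML1", "S95"] <- 1

# Part of each x row name before the "|"
keys <- sub("\\|.*$", "", rownames(x))

# Position of each x row/column in y; NA where y has no such name
row_idx <- match(keys, rownames(y))
col_idx <- match(colnames(x), colnames(y))

# Indexing a matrix with NA yields NA cells, which is the fill we want
z <- y[row_idx, col_idx, drop = FALSE]

# Give the result the full row and column names of x
dimnames(z) <- dimnames(x)

print(z)
```

The result stays a matrix rather than a data.frame. Note that match() returns the first hit only, so this assumes the row names of y are unique.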

