
List of lists of matrices to a data frame

I have a list containing lists of matrices, as below:

set.seed(123)

mat1 <- matrix(rnorm(9,1,2), ncol=3, nrow=3)
mat2 <- matrix(rnorm(9,1,3), ncol=3, nrow=3)

mynames <- c("a","b","c")

colnames(mat1) <- mynames
colnames(mat2) <- mynames

rownames(mat1) <- mynames
rownames(mat2) <- mynames

finallist <- list(val1 = list(subval1 = mat1), val2 = list(subval1 = mat2))

I am looking for output shaped like this:

goal <- data.frame(val1 = rnorm(9,1,2), val2 = rnorm(9,1,3), subval = rep("subval1",9), origrownames = rep(mynames, 3), origcolumnnames = rep(mynames,each=3))

I know there might be an intermediary data frame that I could use reshape on, but I can't get anything close. I have tried do.call("rbind", finallist), but this does not seem to preserve the names of the top-level list and the child list. Additionally, in my real data each sublist contains 2000 matrices, each 20x20 in dimension, and I plan on using this function 20+ times, so I'm looking for something that isn't too slow.

The particular structure of this data can be accessed with a fairly rarely used method called recursive indexing. Here are three statements that will produce the result.

# build row and column names variables
mydf <- data.frame(origrownames = rep(mynames, 3), origcolumnnames = rep(mynames, each=3))
# use matrix subsetting to extract val1 and val2 variables
mydf[c("val1", "val2")] <- list(finallist[[c(1,1)]][as.matrix(mydf)],
                                finallist[[2:1]][as.matrix(mydf)])
# extract subval1 from list
mydf$subval <- names(finallist$val1)

The point of interest here is the second statement, which first uses recursive indexing (the [[c(1, 1)]] and [[2:1]]) to pull out elements in the nested lists and then uses matrix subsetting on the row and column names of the matrix to pull out the values in the desired order (see ?"[" for details on both of these methods).
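As a small self-contained illustration of these two techniques (the objects m, nested and idx are made up for this sketch and are not part of the question's data): a vector inside [[ walks down one list level per element, and indexing a matrix with a two-column character matrix picks one cell per row by row name and column name.

```r
# Toy data with hypothetical names, only for illustration
m <- matrix(1:4, nrow = 2, dimnames = list(c("a", "b"), c("a", "b")))
nested <- list(outer1 = list(inner1 = m))

# Recursive indexing: [[c(1, 1)]] is the same as [[1]][[1]]
same <- identical(nested[[c(1, 1)]], nested[[1]][[1]])

# Matrix subsetting: each row of idx is a (row name, column name) pair
idx <- cbind(c("a", "b", "a"), c("a", "a", "b"))
picked <- m[idx]   # cells ("a","a"), ("b","a"), ("a","b"), in that order
print(same)
print(picked)
```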

The output of these extractions is wrapped in a list and then assigned to mydf[c("val1", "val2")], which adds them to the data.frame with the desired names.
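The same assignment form can be tried in isolation. In this minimal sketch the data frame df and the column names x and y are hypothetical; the point is only that a list of vectors assigned to a character index creates one new column per list element.

```r
# Hypothetical toy data frame, only to show the assignment form
df <- data.frame(id = 1:3)

# A list of two vectors becomes two new columns,
# named by the character index on the left-hand side
df[c("x", "y")] <- list(c(10, 20, 30), c("p", "q", "r"))
print(names(df))
```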

This returns

mydf
  origrownames origcolumnnames       val1       val2  subval
1            a               a -0.1209513 -0.3369859 subval1
2            b               a  0.5396450  4.6722454 subval1
3            c               a  4.1174166  2.0794415 subval1
4            a               b  1.1410168  2.2023144 subval1
5            b               b  1.2585755  1.3320481 subval1
6            c               b  4.4301300 -0.6675234 subval1
7            a               c  1.9218324  6.3607394 subval1
8            b               c -1.5301225  2.4935514 subval1
9            c               c -0.3737057 -4.8998515 subval1

You can reorder the columns using

mydf <- mydf[c("val1", "val2", "subval", "origrownames", "origcolumnnames")]

You could do

# flatten one level, then stack the two 3 x 3 matrices into a 3 x 3 x 2 array
tmp <- simplify2array(unlist(finallist, FALSE))
# row/column name grid + one value column per matrix + constant subval column
setNames(cbind(expand.grid(dimnames(tmp)[-3]), apply(tmp, 3, c), 'subval1'),
         c('origrownames', 'origcolumnnames', names(finallist), 'subval'))
#  origrownames origcolumnnames       val1       val2  subval
#1            a               a -0.1209513 -0.3369859 subval1
#2            b               a  0.5396450  4.6722454 subval1
#3            c               a  4.1174166  2.0794415 subval1
#4            a               b  1.1410168  2.2023144 subval1
#5            b               b  1.2585755  1.3320481 subval1
#6            c               b  4.4301300 -0.6675234 subval1
#7            a               c  1.9218324  6.3607394 subval1
#8            b               c -1.5301225  2.4935514 subval1
#9            c               c -0.3737057 -4.8998515 subval1

The 'subval' variable seems redundant, though (it can only take one value). In my opinion this makes more sense:

setNames(as.data.frame.table(simplify2array(lapply(finallist, '[[', 1))),
         c('origrownames', 'origcolumnnames', 'variable', 'value'))
#   origrownames origcolumnnames variable      value
#1             a               a     val1 -0.1209513
#2             b               a     val1  0.5396450
#3             c               a     val1  4.1174166
#4             a               b     val1  1.1410168
#5             b               b     val1  1.2585755
#6             c               b     val1  4.4301300
#7             a               c     val1  1.9218324
#8             b               c     val1 -1.5301225
#9             c               c     val1 -0.3737057
#10            a               a     val2 -0.3369859
#11            b               a     val2  4.6722454
#12            c               a     val2  2.0794415
#13            a               b     val2  2.2023144
#14            b               b     val2  1.3320481
#15            c               b     val2 -0.6675234
#16            a               c     val2  6.3607394
#17            b               c     val2  2.4935514
#18            c               c     val2 -4.8998515
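For the real data described in the question, where each sublist holds many matrices rather than one, the long-format idea might be extended along the following lines. This is only a sketch under stated assumptions: the helper name nested_to_long and the toy list toy are hypothetical, every matrix is assumed to share the same dimensions and dimnames, and the speed on 2000 matrices of 20x20 has not been measured.

```r
# Hypothetical helper: one long data frame from a two-level list of matrices.
# Assumes all matrices have identical dimensions and dimnames.
nested_to_long <- function(x) {
  per_top <- lapply(names(x), function(top) {
    mats <- x[[top]]
    # stack the matrices of one sublist into a rows x cols x sublist array
    arr <- array(unlist(mats, use.names = FALSE),
                 dim = c(dim(mats[[1]]), length(mats)),
                 dimnames = c(dimnames(mats[[1]]), list(names(mats))))
    out <- as.data.frame.table(arr, responseName = "value",
                               stringsAsFactors = FALSE)
    names(out)[1:3] <- c("origrownames", "origcolumnnames", "subval")
    out$variable <- top   # record the top-level list name
    out
  })
  do.call(rbind, per_top)
}

# Toy input: two top-level entries with two 3 x 3 matrices each
set.seed(1)
nm <- c("a", "b", "c")
make_mat <- function() matrix(rnorm(9), 3, 3, dimnames = list(nm, nm))
toy <- list(val1 = list(subval1 = make_mat(), subval2 = make_mat()),
            val2 = list(subval1 = make_mat(), subval2 = make_mat()))

long <- nested_to_long(toy)
print(dim(long))   # 36 rows (3 * 3 * 2 * 2) and 5 columns
```

Building one array per sublist keeps the number of data frames that rbind has to combine equal to the number of top-level entries, not the number of matrices.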

