data.frame with a column containing a matrix in R
I'm trying to put some matrices in a data frame in R, something like:
m <- matrix(c(1,2,3,4), nrow=2, ncol=2)
df <- data.frame(id=1, mat=m)
But when I do that, I get a data frame with 2 rows and 3 columns instead of a data frame with 1 row and 2 columns.
Reading the documentation, I have to escape my matrix using I().
df <- data.frame(id=1, mat=I(m))
str(df)
'data.frame': 2 obs. of 2 variables:
$ id : num 1 1
$ mat: AsIs [1:2, 1:2] 1 2 3 4
As I understand it, the data frame contains one row for each row of the matrix, and the mat field is a list of matrix column values.
So how can I obtain a data frame containing matrices?
Thanks!
I find data.frames containing matrices mind-bendingly weird, but the only way I know to achieve this is hidden in stats:::simulate.lm
Try this, poke through it, and see what's happening:
d <- data.frame(y=1:5,n=5)
g0 <- glm(cbind(y,n-y)~1,data=d,family=binomial)
debug(stats:::simulate.lm)
s <- simulate(g0,n=5)
This is the weird, back-door solution. Create a list, change its class to data.frame, and then (this is required) set the names and row.names manually (if you don't do those final steps the data will still be in the object, but it will print as though it had zero rows...)
m1 <- matrix(1:10,ncol=2)
m2 <- matrix(5:14,ncol=2)
dd <- list(m1,m2)
class(dd) <- "data.frame"
names(dd) <- LETTERS[1:2]
row.names(dd) <- 1:5
dd
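To check that the back-door construction above really stores whole matrices as columns, a minimal sketch along these lines may help. It only rebuilds the same dd object and inspects it with base R functions; nothing here goes beyond what the answer describes.

```r
# Rebuild the object exactly as in the answer above
m1 <- matrix(1:10, ncol = 2)
m2 <- matrix(5:14, ncol = 2)
dd <- list(m1, m2)
class(dd) <- "data.frame"
names(dd) <- LETTERS[1:2]
row.names(dd) <- 1:5

is.data.frame(dd)  # the object passes the data.frame check
is.matrix(dd$A)    # each column is still a whole matrix
dim(dd$B)          # dimensions of the second matrix column
dim(dd)            # rows and top-level columns of the data frame
```

Each matrix counts as a single top-level column, so dim(dd) reports two columns even though four numeric columns are stored.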
A much easier way to do this is to define the data frame with a placeholder for the matrix:
m <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)
df <- data.frame(id = 1, mat = rep(0, nrow(m)))
Then assign the matrix. No need to play with the class of a list or to use an *apply() function.
df$mat <- m
I came across the same problem while trying to understand the gasoline data in the pls package. I used $ for the job. First, let's create a matrix called spectra_mat, then a vector called response_var1.
spectra_mat = matrix(1:45, 9, 5)
response_var1 = seq(1:9)
Now we put the vector response_var1 in a new data frame (let's call it df) and attach the matrix with $.
df = data.frame(response_var1)
df$spectra = spectra_mat
To check:
str(df)
'data.frame': 9 obs. of 2 variables:
$ response_var1: int 1 2 3 4 5 6 7 8 9
$ spectra : int [1:9, 1:5] 1 2 3 4 5 6 7 8 9 10 ...
Data frames containing matrix columns do have their uses in specialized scenarios. These are cases where you have a whole vector of some variable for every observation in your data set. There are two cases that I have come across where this is common:
If you're working with data frames, there are a couple of obvious ways to handle this data, both of which are inefficient. I'll use the Bayesian case as an example:
Data frames with matrix columns are a very useful solution to this situation. The posterior stays in a matrix that has the same number of rows as the data frame. But that matrix is recognized as only a single "column" in the data frame, and referring to that column using df$mat will return the matrix. You can even use some dplyr functions like filtering to return the corresponding rows of the matrix, but this is a bit experimental.
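As a rough illustration of the Bayesian case, the sketch below uses base R subsetting rather than dplyr. The names obs, draws, kept, and post_mean are hypothetical, and the "posterior draws" are just random numbers standing in for real model output.

```r
set.seed(1)

# Four observations, each with 100 stand-in posterior draws (hypothetical data)
obs <- data.frame(id = 1:4, x = c(0.5, 1.2, 2.0, 3.1))
draws <- matrix(rnorm(4 * 100), nrow = 4)
obs$draws <- draws

ncol(obs)          # the whole matrix counts as one column

# Row subsetting keeps the matching rows of the matrix column
kept <- obs[obs$x > 1, ]
dim(kept$draws)

# Summaries per observation can be computed straight from the matrix column
kept$post_mean <- rowMeans(kept$draws)
```

The point of the design is that each observation's draws stay aligned with its row, so filtering the data frame filters the draws as well.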
The easiest method to create the matrix column is in two steps. First create the data frame without the matrix column, then add the matrix column with a simple assignment. I haven't found a 1-step solution that doesn't involve I(), which changes the column type.
m <- matrix(c(1,2,3,4), nrow=2, ncol=2)
df <- data.frame(id = rep(1, nrow(m)))
df$mat <- m
names(df)
# [1] "id" "mat"
str(df)
# 'data.frame': 2 obs. of 2 variables:
# $ id : num 1 1
# $ mat: num [1:2, 1:2] 1 2 3 4
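If the one-step I() construction is still preferred, one possible workaround (an assumption on my part, not something the answer proposes) is to drop the AsIs class afterward with unclass(). The name df_asis is hypothetical.

```r
m <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)

# One-step construction: the matrix column carries the "AsIs" class
df_asis <- data.frame(id = 1, mat = I(m))
inherits(df_asis$mat, "AsIs")

# Strip the class so the column is a plain numeric matrix again
df_asis$mat <- unclass(df_asis$mat)
inherits(df_asis$mat, "AsIs")
is.matrix(df_asis$mat)
ncol(df_asis)      # still two top-level columns: id and mat
```

This is still two statements, so it does not really beat the two-step assignment shown above; it mainly shows that the AsIs wrapper is the only difference between the two results.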
The result you got (2 rows x 3 columns) is what is to be expected from R, as it amounts to cbind-ing a vector (id, with recycling) and a matrix (m).
IMO, it would be better to use list or array (when dimensions agree; no mix of numeric and factor values allowed) if you really want to bind different data structures. Otherwise, just cbind your matrix to an existing data.frame; if both have the same number of rows, that will do the job. For example:
x1 <- replicate(2, rnorm(10))
x2 <- replicate(2, rnorm(10))
x12l <- list(x1=x1, x2=x2)
x12a <- array(c(x1, x2), dim=c(10,2,2))  # c() keeps x1 and x2 as separate slices; rbind() would mix their columns
and the result reads:
> str(x12l)
List of 2
$ x1: num [1:10, 1:2] -0.326 0.552 -0.675 0.214 0.311 ...
$ x2: num [1:10, 1:2] -0.164 0.709 -0.268 -1.464 0.744 ...
> str(x12a)
num [1:10, 1:2, 1:2] -0.326 0.552 -0.675 0.214 0.311 ...
Lists are easier to use if you plan to use matrices of varying dimensions, and provided they are organized in the same way (by rows) as an external data.frame, you can subset them just as easily. Here is an example:
df1 <- data.frame(grp=gl(2, 5, labels=LETTERS[1:2]),
age=sample(seq(25,35), 10, rep=T))
with(df1, tapply(x12l$x1[,1], list(grp, age), mean))
You can also use the lapply (for list) and apply (for array) functions.
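For instance, a minimal sketch with small deterministic matrices (rather than the random ones above) might look like the following. The names col_means_list and col_means_arr are hypothetical, and the array is built with c() so that each slice is one of the original matrices.

```r
x1 <- matrix(1:20, ncol = 2)
x2 <- matrix(21:40, ncol = 2)

x12l <- list(x1 = x1, x2 = x2)
x12a <- array(c(x1, x2), dim = c(10, 2, 2))  # slice k holds the k-th matrix

# List version: one vector of column means per matrix
col_means_list <- lapply(x12l, colMeans)

# Array version: apply over the third dimension, one result column per slice
col_means_arr <- apply(x12a, 3, colMeans)
```

Both calls compute the same column means; lapply returns them as a list, while apply collects them into a matrix.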