I have been wondering about this for a long time. The data.frame class in base R only allows the columns to be vectors. I am looking for a package that generalizes this so that each "column" can be a 2-d or even n-d array, with methods similar to those of the original data.frame class, such as subsetting with "[]", merge, aggregate, etc.
My reason for wanting such a class is to deal with Monte Carlo simulation data. For example, the result of each simulation can be expressed as a data frame in which the row indices are dates and the columns are a mix of character and numeric. If I simulate 1000 times, I get 1000 such data frames. If there were a class in R that let me store the results in one object and still offered the convenience of most of the data.frame methods, it would make my coding a lot easier.
As I couldn't find such a package I attempted to create my own with no success. I came across this package "S4Vectors" with a "DataFrame" class, which "supports the storage of any type of object (with length and [ methods) as columns." Here is my attempt.
library(S4Vectors)
test <- matrix(1:6,2,3)
test1 <- matrix(7:12,2,3)
setClass("Column", slots=list(), contains = "matrix")
setMethod("length", "Column", function(x) {nrow(x)})
'[.Column' <- function(x, i, j, ...) {
  i <- ((i-1)*ncol(x)+1):(i*(ncol(x)))
  NextMethod()
}
testColumn <- new("Column", test)
testColumn1 <- new("Column", test1)
length(testColumn)
testColumn[1]
testDataFrame <- DataFrame(Col1 = testColumn, Col2 = testColumn1)
I did get the length and [ methods to work, but the last statement gives the error "cannot coerce class "Column" to a DataFrame".
Has anyone ever tried to do something similar?
Update: Thanks to G. Grothendieck, I now know that a data frame can take a matrix as a column by using the I() function. Now I am wondering whether there is a way to preserve such a structure in all operations. An example would be to aggregate the data frame
data.frame(v = c(1,1,2,2), m = I(diag(4)))
by v so that the result is
data.frame(v = c(1,2), m = I(matrix(c(1,1,0,0,0,0,1,1), 2, 4, byrow = TRUE)))
Data frames do allow matrix columns:
m <- diag(4)
v <- 1:4
DF <- data.frame(v, m = I(m))
str(DF)
giving:
'data.frame': 4 obs. of 2 variables:
$ v: int 1 2 3 4
$ m: 'AsIs' num [1:4, 1:4] 1 0 0 0 0 1 0 0 0 0 ...
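As a side note, I() is not the only route as far as I know: assigning a matrix to a column with $<- after the data frame exists also keeps it as a matrix, and row subsetting with [ carries the matching matrix rows along. A minimal sketch (the names DF2 and sub are made up for illustration):

```r
DF2 <- data.frame(v = 1:4)
DF2$m <- diag(4)     # assigned after creation, so it stays one 4 x 4 matrix column
sub <- DF2[2:3, ]    # row subsetting keeps the matrix column and its rows 2 and 3
dim(sub$m)
```

So at least for plain row subsetting, the matrix structure should survive without any extra work.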
The R aggregate function can create matrix columns. For example,
DF <- data.frame(v = 1:4, g = c(1, 1, 2, 2))
ag <- aggregate(v ~ g, DF, function(x) c(sum = sum(x), mean = mean(x)))
str(ag)
giving:
'data.frame': 2 obs. of 2 variables:
$ g: num 1 2
$ v: num [1:2, 1:2] 3 7 1.5 3.5
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "sum" "mean"
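I don't think the aggregation discussed in the comments is nicely supported in R, but you can use the following workaround, which aggregates the row indices rather than the matrix itself and uses each group's indices to pull the matching rows of DF$m: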
m <- matrix(1:16, 4)
v <- c(1, 1, 2, 2)
DF <- data.frame(v, m = I(m))
nr <- nrow(DF)
ag2 <- aggregate(list(sum = 1:nr), DF["v"], function(ix) colSums(DF$m[ix, ]))
str(ag2)
giving:
'data.frame': 2 obs. of 2 variables:
$ v : num 1 2
$ sum: num [1:2, 1:4] 3 7 11 15 19 23 27 31
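If the summary you need is a sum, base R's rowsum() may be a simpler alternative, since it adds up the rows of a matrix within each group directly. Here is a minimal sketch applied to the example from the question; the names rs and res are made up for illustration, and it only covers sums, not arbitrary aggregation functions. It should also cope with groups that have a single row, where DF$m[ix, ] in the workaround above would drop to a plain vector and colSums would fail.

```r
# Example from the question: a 4 x 4 identity matrix stored as one column
DF <- data.frame(v = c(1, 1, 2, 2), m = I(diag(4)))

# rowsum() sums the rows of a matrix within each level of the grouping vector;
# unclass() strips the "AsIs" class so a plain numeric matrix is passed in
rs <- rowsum(unclass(DF$m), DF$v)

# Rebuild a data frame whose m column is again a matrix (one row per group)
res <- data.frame(v = as.numeric(rownames(rs)), m = I(unname(rs)))
str(res)
```

The group labels come back as the row names of rs, which is why v is rebuilt with as.numeric(rownames(rs)).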