
Is there an R package with a generalized class of data.frame in which a column can be an array (or how do I define such a class)?

I have been wondering about this for a long time. The data.frame class in base R only allows the columns to be vectors. I am looking for a package that generalizes this so that each "column" can be a 2-d or even n-d array, with methods similar to those of the original data.frame class, such as subsetting with "[]", merge, aggregate, etc.

My reason for wanting such a class is to deal with Monte Carlo simulation data. For example, the result of each simulation can be expressed as a data frame in which the row indices are dates and the columns hold character and numeric values. If I simulate 1000 times, I get 1000 such data frames. A class in R that lets me store the results in one object and keeps the convenience of most of the data.frame methods would make my coding a lot easier.

As I couldn't find such a package I attempted to create my own with no success. I came across this package "S4Vectors" with a "DataFrame" class, which "supports the storage of any type of object (with length and [ methods) as columns." Here is my attempt.

library(S4Vectors)
test <- matrix(1:6, 2, 3)
test1 <- matrix(7:12, 2, 3)
setClass("Column", slots = list(), contains = "matrix")
setMethod("length", "Column", function(x) {nrow(x)})
'[.Column' <- function(x, i, j, ...) {
  i <- ((i-1)*ncol(x)+1):(i*(ncol(x)))
  NextMethod()
}
testColumn <- new("Column", test)
testColumn1 <- new("Column", test1)
length(testColumn)
testColumn[1]
testDataFrame <- DataFrame(Col1 = testColumn, Col2 = testColumn1)

I did get the length and [ method to work but the last statement gives an error "cannot coerce class "Column" to a DataFrame".

Has anyone ever tried to do something similar?

Update: Thanks to G. Grothendieck I now know a data frame can take a matrix as a column by using the I() function. Now I am wondering if there is a way to preserve such a structure in all operations. An example would be to aggregate the data frame

data.frame(v = c(1,1,2,2), m = I(diag(4)))

by v so that the result is

data.frame(v = c(1,2), m = I(matrix(c(1,1,0,0,0,0,1,1), 2, 4, byrow = TRUE)))

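Data frames do allow matrix columns, provided the matrix is wrapped in I() so that data.frame() does not split it into separate columns: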

m <- diag(4)
v <- 1:4
DF <- data.frame(v, m = I(m))
str(DF)

giving:

'data.frame':   4 obs. of  2 variables:
 $ v: int  1 2 3 4
 $ m: 'AsIs' num [1:4, 1:4] 1 0 0 0 0 1 0 0 0 0 ...
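
As a quick check of how such a column behaves, the sketch below reuses the same DF and shows that row subsetting keeps the matrix column in one piece, while omitting I() makes data.frame() spread the matrix over several ordinary columns. The names sub and DF2 are introduced here only for illustration.

```r
m <- diag(4)
v <- 1:4
DF <- data.frame(v, m = I(m))

# Row subsetting keeps m as a single matrix column
sub <- DF[2:3, ]
dim(sub$m)   # 2 rows, 4 columns
ncol(sub)    # still 2 variables: v and m

# Without I(), data.frame() creates one column per matrix column
DF2 <- data.frame(v, m = m)
ncol(DF2)    # 5 variables: v plus four columns derived from m
```

Assigning the matrix after construction (DF$m <- m) is another common way to get a matrix column.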

Update 1

The R aggregate function can create matrix columns. For example,

DF <- data.frame(v = 1:4, g = c(1, 1, 2, 2))
ag <- aggregate(v ~ g, DF, function(x) c(sum = sum(x), mean = mean(x)))
str(ag)

giving:

'data.frame':   2 obs. of  2 variables:
 $ g: num  1 2
 $ v: num [1:2, 1:2] 3 7 1.5 3.5
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr  "sum" "mean"
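
To work with this result, the matrix column can be indexed like any other matrix, or flattened into ordinary columns when a downstream function does not cope with matrix columns. A minimal sketch, rebuilding ag as above; the name flat is introduced here for illustration.

```r
DF <- data.frame(v = 1:4, g = c(1, 1, 2, 2))
ag <- aggregate(v ~ g, DF, function(x) c(sum = sum(x), mean = mean(x)))

# Pick one statistic out of the matrix column by its column name
ag$v[, "mean"]

# Flatten the matrix column into separate ordinary columns
flat <- do.call(data.frame, ag)
names(flat)   # "g" "v.sum" "v.mean"
```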

Update 2

I don't think the aggregation discussed in the comments is nicely supported in R but you can use the following workaround:

m <- matrix(1:16, 4)
v <- c(1, 1, 2, 2)
DF <- data.frame(v, m = I(m))

nr <- nrow(DF)
ag2 <- aggregate(list(sum = 1:nr), DF["v"], function(ix) colSums(DF$m[ix, , drop = FALSE]))
str(ag2)

giving:

'data.frame':   2 obs. of  2 variables:
 $ v  : num  1 2
 $ sum: num [1:2, 1:4] 3 7 11 15 19 23 27 31
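
When the aggregation is specifically a sum, base R's rowsum() may be a shorter route, since it adds up the rows of a matrix within each group. The sketch below is an alternative to the workaround above, not a restatement of it; the names s and ag3 are introduced here for illustration.

```r
m <- matrix(1:16, 4)
v <- c(1, 1, 2, 2)
DF <- data.frame(v, m = I(m))

# rowsum() sums the rows of a matrix within each level of the grouping vector;
# unclass() drops the "AsIs" class so a plain matrix is passed in
s <- rowsum(unclass(DF$m), DF$v)

# Put the group keys and the summed matrix back into a data frame
ag3 <- data.frame(v = as.numeric(rownames(s)), sum = I(unname(s)))
ag3$sum   # rows: 3 11 19 27 and 7 15 23 31
```

This only covers sums; for other summaries the index-based aggregate() call remains the more general pattern.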
