Reshape long structured data.table into a wide structure using data.table functionality?

Question

> library(data.table)
> A <- data.table(x = c(1,1,2,2), y = c(1,2,1,2), v = c(0.1,0.2,0.3,0.4))
> A
   x y   v
1: 1 1 0.1
2: 1 2 0.2
3: 2 1 0.3
4: 2 2 0.4
> B <- dcast(A, x~y)
Using v as value column: use value.var to override.
> B
  x   1   2
1 1 0.1 0.2
2 2 0.3 0.4

Apparently I can reshape a data.table from long to wide using, for example, dcast from the reshape2 package. But data.table comes with an overloaded bracket operator offering grouping parameters like 'by', which makes me wonder whether the same can be achieved with this data.table-specific functionality.

Just one random example from the manual:

DT[,lapply(.SD,sum),by=x]

That looks awesome - but I don't fully understand the usage yet.
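
For readers unsure what that manual line does, here is a minimal sketch on a made-up table (the table DT and its columns v1 and v2 are assumptions for illustration, not from the manual). Within each group of x, .SD holds the remaining columns, and lapply(.SD, sum) returns one summed value per column:

```r
library(data.table)

# Hypothetical toy table: one grouping column and two value columns
DT <- data.table(x = c("a", "a", "b"), v1 = 1:3, v2 = c(10, 20, 30))

# For each distinct x, sum every other column (.SD = "Subset of Data")
res <- DT[, lapply(.SD, sum), by = x]
print(res)
```

The result has one row per value of x, so 'by' aggregates within the long layout; on its own it does not spread y values into columns.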

I found neither a way nor an example for this, so maybe it is simply not possible, or not even supposed to be. A definite "no, it is not possible because ..." is then of course also a valid answer.

Answer 1

I'll pick an example with unequal groups so that it's easier to illustrate for the general case:

A <- data.table(x=c(1,1,1,2,2), y=c(1,2,3,1,2), v=(1:5)/5)
> A
   x y   v
1: 1 1 0.2
2: 1 2 0.4
3: 1 3 0.6
4: 2 1 0.8
5: 2 2 1.0

The first step is to get the number of elements/entries for each group of "x" to be the same. Here, for x=1 there are 3 values of y, but only 2 for x=2. So, we'll have to fix that first with NA for x=2, y=3.

setkey(A, x, y)
A[CJ(unique(x), unique(y))]
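
To make the effect of this step visible, here is a sketch that spells the same join with an explicit on= argument instead of relying on the key. The variable names grid and full are assumptions for illustration. CJ() builds every (x, y) combination, and joining A against that grid keeps all of them, with v set to NA where A has no matching row:

```r
library(data.table)

A <- data.table(x = c(1, 1, 1, 2, 2), y = c(1, 2, 3, 1, 2), v = (1:5) / 5)

# All combinations of the observed x and y values (2 x 3 = 6 rows)
grid <- CJ(x = unique(A$x), y = unique(A$y))

# Join A onto the grid; the combination x = 2, y = 3 has no match, so v is NA
full <- A[grid, on = c("x", "y")]
print(full)
```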

Now, to get it to wide format, we should group by "x" and use as.list on v as follows:

out <- A[CJ(unique(x), unique(y))][, as.list(v), by=x]
   x  V1  V2  V3
1: 1 0.2 0.4 0.6
2: 2 0.8 1.0  NA

Now you can set the names of the reshaped columns by reference with setnames, as follows:

setnames(out, c("x", as.character(unique(A$y))))

   x   1   2   3
1: 1 0.2 0.4 0.6
2: 2 0.8 1.0  NA
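
One caveat with the naming step: CJ() sorts its arguments by default, while unique(A$y) returns values in order of first appearance. A possible safeguard, sketched here on slightly different data where the first group lacks y = 1 (the names ys, grid and wide are assumptions for illustration), is to build the grid and the column names from the same sorted vector:

```r
library(data.table)

# Here x = 1 has no y = 1 entry, so unique(A$y) would be 2, 3, 1
A <- data.table(x = c(1, 1, 2, 2, 2), y = c(2, 3, 1, 2, 3), v = (1:5) / 5)

ys   <- sort(unique(A$y))             # 1, 2, 3: same order CJ() produces
grid <- CJ(x = unique(A$x), y = ys)

# Fill missing combinations with NA, then spread v into columns per x
wide <- A[grid, on = c("x", "y")][, as.list(v), by = x]
setnames(wide, c("x", as.character(ys)))
print(wide)
```

With this data the missing x = 1, y = 1 cell ends up as NA under column "1", rather than a value being shifted into the wrong column.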

Answer 2

Use dcast() (now provided as a data.table method, from version 1.9.5; earlier versions use dcast.data.table), as in

> dcast(A,x~y)
Using 'v' as value column. Use 'value.var' to override
   x   1   2   3
1: 1 0.2 0.4 0.6
2: 2 0.8 1.0  NA

This is fast and obviates the need for setnames().

It is also especially helpful when y in the above example is a factor variable with character levels (e.g. 'Low', 'Medium', 'High'), because CJ() may not return the wide data with the columns in the order that setnames() expects, and you can end up with badly mislabeled data.
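
As a sketch of the factor case, assuming a made-up table B with columns id, level and v (none of which appear in the question), passing value.var explicitly also silences the "Using 'v' as value column" message:

```r
library(data.table)

# Hypothetical data: the level order is set explicitly on the factor
B <- data.table(
  id    = c(1, 1, 1, 2, 2),
  level = factor(c("Low", "Medium", "High", "Low", "Medium"),
                 levels = c("Low", "Medium", "High")),
  v     = (1:5) / 5
)

# dcast names the new columns from the level values themselves,
# so there is no separate renaming step to get out of sync
wide <- dcast(B, id ~ level, value.var = "v")
print(wide)
```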

Answer 3

(With credit to Arun)

A[, setattr(as.list(v), 'names', y), by=x]
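
A usage sketch of this one-liner on the table from the question. It assumes every group of x has the same y values in the same order; with unequal groups, as in Answer 1's example, the missing combinations would likely need to be filled in first. The as.character() call is an addition here to make the name conversion explicit:

```r
library(data.table)

A <- data.table(x = c(1, 1, 2, 2), y = c(1, 2, 1, 2), v = c(0.1, 0.2, 0.3, 0.4))

# Per group of x: turn v into a list and name its elements after y,
# so each y value becomes a column of the result
wide <- A[, setattr(as.list(v), "names", as.character(y)), by = x]
print(wide)
```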

3 answers

solution1
16 ACCEPTED 2013-08-04 21:50:27

solution2
10 2014-04-25 22:28:24

solution3
2 2013-08-04 21:46:09
