简体   繁体   中英

R using Reshape2 to do what reshape (stats package function) was designed for

I'm trying to do exactly what reshape from the stats package is designed for. I have a wide dataset with a series of variables in the form var_name.date . Unfortunately, reshape seems ill-equipped to deal with even medium-sized datasets, so I'm trying to use the the data.table.melt function.

My main problem is grouping the variables into separate value columns based on their long-form variable. Is this possible, or do I need to do each one separately and then cbind them?

Here is what I have:

widetable = data.table("id"=1:5,"A.2012-10"=runif(5),"A.2012-11"=runif(5),
                       "B.2012-10"=runif(5),"B.2012-11"=runif(5))


   id  A.2012-10 A.2012-11  B.2012-10 B.2012-11
1:  1 0.82982349 0.2257782 0.46390924 0.4448248
2:  2 0.46136746 0.2184797 0.05640388 0.4772663
3:  3 0.61723234 0.3950625 0.03252784 0.4006974
4:  4 0.19963437 0.7028052 0.06811452 0.3096969
5:  5 0.09575389 0.5510507 0.76059610 0.8630222

And here is the the stats package's reshape mocking me with one-line awesomeness doing exactly what I want but not scaling.

reshape(widetable, idvar="id", varying=colnames(widetable)[2:5],
        sep=".", direction="long")


    id  time          A          B
 1:  1 2012-10 0.82982349 0.46390924
 2:  2 2012-10 0.46136746 0.05640388
 3:  3 2012-10 0.61723234 0.03252784
 4:  4 2012-10 0.19963437 0.06811452
 5:  5 2012-10 0.09575389 0.76059610
 6:  1 2012-11 0.22577823 0.44482478
 7:  2 2012-11 0.21847969 0.47726629
 8:  3 2012-11 0.39506249 0.40069737
 9:  4 2012-11 0.70280519 0.30969695
10:  5 2012-11 0.55105075 0.86302220

This is just one of those times when reshape() is more straightforward to use.

The most direct approach using a combination of melt and dcast.data.table that I can think of is as follows:

library(data.table)
library(reshape2)

longtable <- melt(widetable, id.vars = "id")
vars <- do.call(rbind, strsplit(as.character(longtable$variable), ".", TRUE))
dcast.data.table(longtable[, c("V1", "V2") := lapply(1:2, function(x) vars[, x])],
                 id + V2 ~ V1, value.var = "value")

An alternative is to use merged.stack from my "splitstackshape" package , specifically the development version.

# library(devtools)
# install_github("splitstackshape", "mrdwab", ref = "devel")
library(splitstackshape)

merged.stack(widetable, id.vars = "id", var.stubs = c("A", "B"), sep = "\\.")
#     id .time_1          A         B
#  1:  1 2012-10 0.26550866 0.2059746
#  2:  1 2012-11 0.89838968 0.4976992
#  3:  2 2012-10 0.37212390 0.1765568
#  4:  2 2012-11 0.94467527 0.7176185
#  5:  3 2012-10 0.57285336 0.6870228
#  6:  3 2012-11 0.66079779 0.9919061
#  7:  4 2012-10 0.90820779 0.3841037
#  8:  4 2012-11 0.62911404 0.3800352
#  9:  5 2012-10 0.20168193 0.7698414
# 10:  5 2012-11 0.06178627 0.7774452

The merged.stack function works differently from a simple melt because it starts by "stacking" different groups of columns in a list and then merging them together. This allows the function to:

  1. Work with column groups where each column group might be of a different type (character, numeric, and so on).
  2. Work with "unbalanced" column groups (where one group might have two measure columns and another might have three).

This answer is based on the following sample data:

set.seed(1) # Please use `set.seed()` when sharing an example with random numbers
widetable = data.table("id"=1:5,"A.2012-10"=runif(5),"A.2012-11"=runif(5),
                       "B.2012-10"=runif(5),"B.2012-11"=runif(5))

See also: What reshaping problems can melt/cast not solve in a single step?

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM