R: using reshape2 to do what reshape (stats package function) is designed for

Question

I'm trying to do exactly what reshape from the stats package is designed for. I have a wide dataset with a series of variables in the form var_name.date. Unfortunately, reshape seems ill-equipped to deal with even medium-sized datasets, so I'm trying to use data.table's melt function instead.

My main problem is grouping the variables into separate value columns based on their long-form variable. Is this possible, or do I need to do each one separately and then cbind them?

Here is what I have:

library(data.table)

widetable = data.table("id"=1:5,"A.2012-10"=runif(5),"A.2012-11"=runif(5),
                       "B.2012-10"=runif(5),"B.2012-11"=runif(5))


   id  A.2012-10 A.2012-11  B.2012-10 B.2012-11
1:  1 0.82982349 0.2257782 0.46390924 0.4448248
2:  2 0.46136746 0.2184797 0.05640388 0.4772663
3:  3 0.61723234 0.3950625 0.03252784 0.4006974
4:  4 0.19963437 0.7028052 0.06811452 0.3096969
5:  5 0.09575389 0.5510507 0.76059610 0.8630222

And here is the stats package's reshape mocking me with one-line awesomeness, doing exactly what I want but not scaling:

reshape(widetable, idvar="id", varying=colnames(widetable)[2:5],
        sep=".", direction="long")


    id  time          A          B
 1:  1 2012-10 0.82982349 0.46390924
 2:  2 2012-10 0.46136746 0.05640388
 3:  3 2012-10 0.61723234 0.03252784
 4:  4 2012-10 0.19963437 0.06811452
 5:  5 2012-10 0.09575389 0.76059610
 6:  1 2012-11 0.22577823 0.44482478
 7:  2 2012-11 0.21847969 0.47726629
 8:  3 2012-11 0.39506249 0.40069737
 9:  4 2012-11 0.70280519 0.30969695
10:  5 2012-11 0.55105075 0.86302220

Answer 1

This is just one of those times when reshape() is more straightforward to use.

The most direct approach using a combination of melt and dcast.data.table that I can think of is as follows:

library(data.table)
library(reshape2)

longtable <- melt(widetable, id.vars = "id")
vars <- do.call(rbind, strsplit(as.character(longtable$variable), ".", TRUE))
dcast.data.table(longtable[, c("V1", "V2") := lapply(1:2, function(x) vars[, x])],
                 id + V2 ~ V1, value.var = "value")
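As a hedged aside that is not part of the original answer: later releases of data.table (1.9.6 and newer, as far as I know) let melt() fill several value columns in one call when measure.vars is given through patterns(). The sketch below assumes the columns of each group appear in the same date order, because the resulting variable column only holds the position within each group (1, 2, ...) and has to be mapped back to the dates by hand. The names long and dates are illustrative.

```r
library(data.table)

set.seed(1)
widetable <- data.table("id" = 1:5,
                        "A.2012-10" = runif(5), "A.2012-11" = runif(5),
                        "B.2012-10" = runif(5), "B.2012-11" = runif(5))

# One melt() call, one value column per pattern
long <- melt(widetable, id.vars = "id",
             measure.vars = patterns("^A\\.", "^B\\."),
             value.name = c("A", "B"),
             variable.name = "time")

# "time" is a factor of positions (1, 2); recover the date suffixes
# from the A columns (assumes the B columns follow the same order)
dates <- sub("^A\\.", "", grep("^A\\.", names(widetable), value = TRUE))
long[, time := dates[as.integer(time)]]

print(long)
```

The result has the columns id, time, A and B, which matches the layout produced by reshape() in the question.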

An alternative is to use merged.stack from my "splitstackshape" package, specifically the development version.

# library(devtools)
# install_github("splitstackshape", "mrdwab", ref = "devel")
library(splitstackshape)

merged.stack(widetable, id.vars = "id", var.stubs = c("A", "B"), sep = "\\.")
#     id .time_1          A         B
#  1:  1 2012-10 0.26550866 0.2059746
#  2:  1 2012-11 0.89838968 0.4976992
#  3:  2 2012-10 0.37212390 0.1765568
#  4:  2 2012-11 0.94467527 0.7176185
#  5:  3 2012-10 0.57285336 0.6870228
#  6:  3 2012-11 0.66079779 0.9919061
#  7:  4 2012-10 0.90820779 0.3841037
#  8:  4 2012-11 0.62911404 0.3800352
#  9:  5 2012-10 0.20168193 0.7698414
# 10:  5 2012-11 0.06178627 0.7774452

The merged.stack function works differently from a simple melt because it starts by "stacking" different groups of columns in a list and then merging them together. This allows the function to:

Work with column groups where each column group might be of a different type (character, numeric, and so on).
Work with "unbalanced" column groups (where one group might have two measure columns and another might have three).
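To make the "stack each group, then merge" idea concrete, here is a minimal sketch written with plain data.table calls. It is only an illustration of the general approach under my own assumptions, not the actual implementation of merged.stack; the helper stack_group and the variable names are hypothetical.

```r
library(data.table)

set.seed(1)
widetable <- data.table("id" = 1:5,
                        "A.2012-10" = runif(5), "A.2012-11" = runif(5),
                        "B.2012-10" = runif(5), "B.2012-11" = runif(5))

# Hypothetical helper: melt the columns of one stub (e.g. "A") into long form
stack_group <- function(dt, stub) {
  prefix <- paste0("^", stub, "\\.")
  cols <- grep(prefix, names(dt), value = TRUE)
  out <- melt(dt, id.vars = "id", measure.vars = cols,
              variable.name = "time", value.name = stub,
              variable.factor = FALSE)
  # Strip the stub so that only the date part remains in "time"
  out[, time := sub(prefix, "", time)]
  out
}

# Stack each group separately, keeping the pieces in a list ...
stacked <- lapply(c("A", "B"), function(s) stack_group(widetable, s))

# ... then merge the pieces on id and time; all = TRUE keeps rows that
# exist in only one group, which is what tolerates unbalanced groups
merged <- Reduce(function(x, y) merge(x, y, by = c("id", "time"), all = TRUE),
                 stacked)

print(merged)
```

Because every group is melted on its own, each value column keeps its own type, and a group with fewer time points simply ends up with NA in the merged rows it lacks.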

This answer is based on the following sample data:

set.seed(1) # Please use `set.seed()` when sharing an example with random numbers
widetable = data.table("id"=1:5,"A.2012-10"=runif(5),"A.2012-11"=runif(5),
                       "B.2012-10"=runif(5),"B.2012-11"=runif(5))

See also: What reshaping problems can melt/cast not solve in a single step?

Answer 1: accepted (2014-07-12 06:47:27)
