R: using reshape2 to do what reshape (stats package function) was designed for
I'm trying to do exactly what reshape from the stats package is designed for. I have a wide dataset with a series of variables in the form var_name.date. Unfortunately, reshape seems ill-equipped to deal with even medium-sized datasets, so I'm trying to use the melt method for data.table objects instead.
My main problem is grouping the variables into separate value columns based on their long-form variable name. Is this possible, or do I need to melt each group separately and then cbind the results?
Here is what I have:
widetable = data.table("id"=1:5,"A.2012-10"=runif(5),"A.2012-11"=runif(5),
"B.2012-10"=runif(5),"B.2012-11"=runif(5))
id A.2012-10 A.2012-11 B.2012-10 B.2012-11
1: 1 0.82982349 0.2257782 0.46390924 0.4448248
2: 2 0.46136746 0.2184797 0.05640388 0.4772663
3: 3 0.61723234 0.3950625 0.03252784 0.4006974
4: 4 0.19963437 0.7028052 0.06811452 0.3096969
5: 5 0.09575389 0.5510507 0.76059610 0.8630222
And here is the stats package's reshape, mocking me with one-line awesomeness: it does exactly what I want but does not scale.
reshape(widetable, idvar="id", varying=colnames(widetable)[2:5],
sep=".", direction="long")
id time A B
1: 1 2012-10 0.82982349 0.46390924
2: 2 2012-10 0.46136746 0.05640388
3: 3 2012-10 0.61723234 0.03252784
4: 4 2012-10 0.19963437 0.06811452
5: 5 2012-10 0.09575389 0.76059610
6: 1 2012-11 0.22577823 0.44482478
7: 2 2012-11 0.21847969 0.47726629
8: 3 2012-11 0.39506249 0.40069737
9: 4 2012-11 0.70280519 0.30969695
10: 5 2012-11 0.55105075 0.86302220
This is just one of those times when reshape() is more straightforward to use.
The most direct approach I can think of that combines melt and dcast.data.table is as follows:
library(data.table)
library(reshape2)
longtable <- melt(widetable, id.vars = "id")
vars <- do.call(rbind, strsplit(as.character(longtable$variable), ".", TRUE))
dcast.data.table(longtable[, c("V1", "V2") := lapply(1:2, function(x) vars[, x])],
id + V2 ~ V1, value.var = "value")
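If your data.table version supports patterns() in measure.vars, melt can also produce several value columns in one call, which avoids the split-and-cast step. This is a different technique from the melt plus dcast code above, shown only as a minimal sketch. It assumes the A and B columns list their dates in the same order, and the names times and long are made up for illustration:

```r
library(data.table)

set.seed(1)
widetable <- data.table(id = 1:5,
                        "A.2012-10" = runif(5), "A.2012-11" = runif(5),
                        "B.2012-10" = runif(5), "B.2012-11" = runif(5))

# Recover the date suffixes from the "A." columns (assumed to match the "B." order).
times <- sub("^A\\.", "", grep("^A\\.", names(widetable), value = TRUE))

# Melt both column groups at once: one value column per pattern.
long <- melt(widetable, id.vars = "id",
             measure.vars = patterns("^A\\.", "^B\\."),
             variable.name = "time", value.name = c("A", "B"))

# "time" comes back as an index (1, 2, ...) into each group; map it to the dates.
long[, time := times[as.integer(as.character(time))]]
```

The result has one row per id and date, with A and B as separate value columns.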
An alternative is to use merged.stack from my "splitstackshape" package, specifically the development version.
# library(devtools)
# install_github("mrdwab/splitstackshape", ref = "devel")
library(splitstackshape)
merged.stack(widetable, id.vars = "id", var.stubs = c("A", "B"), sep = "\\.")
# id .time_1 A B
# 1: 1 2012-10 0.26550866 0.2059746
# 2: 1 2012-11 0.89838968 0.4976992
# 3: 2 2012-10 0.37212390 0.1765568
# 4: 2 2012-11 0.94467527 0.7176185
# 5: 3 2012-10 0.57285336 0.6870228
# 6: 3 2012-11 0.66079779 0.9919061
# 7: 4 2012-10 0.90820779 0.3841037
# 8: 4 2012-11 0.62911404 0.3800352
# 9: 5 2012-10 0.20168193 0.7698414
# 10: 5 2012-11 0.06178627 0.7774452
The merged.stack function works differently from a simple melt because it starts by "stacking" different groups of columns in a list and then merging them together.
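To illustrate the stack-then-merge idea, here is a rough sketch using plain data.table calls. It is not the actual merged.stack source, only the concept: each stub is melted on its own and the pieces are then merged on id and time. The names stubs, stacked and merged are made up for this example:

```r
library(data.table)

set.seed(1)
widetable <- data.table(id = 1:5,
                        "A.2012-10" = runif(5), "A.2012-11" = runif(5),
                        "B.2012-10" = runif(5), "B.2012-11" = runif(5))

stubs <- c("A", "B")

# Step 1: "stack" each group of columns separately, collecting the results in a list.
stacked <- lapply(stubs, function(stub) {
  prefix <- paste0("^", stub, "\\.")
  cols <- grep(prefix, names(widetable), value = TRUE)
  long <- melt(widetable, id.vars = "id", measure.vars = cols,
               variable.name = "time", value.name = stub,
               variable.factor = FALSE)
  # Strip the stub so only the date part remains, e.g. "A.2012-10" -> "2012-10".
  long[, time := sub(prefix, "", time)]
  long
})

# Step 2: merge the stacked pieces on the shared id and time columns.
merged <- Reduce(function(x, y) merge(x, y, by = c("id", "time")), stacked)
```

Because each stub is stacked independently, the merge only needs the id and time values to line up between the pieces.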
This answer is based on the following sample data:
set.seed(1) # Please use `set.seed()` when sharing an example with random numbers
widetable = data.table("id"=1:5,"A.2012-10"=runif(5),"A.2012-11"=runif(5),
"B.2012-10"=runif(5),"B.2012-11"=runif(5))
See also: What reshaping problems can melt/cast not solve in a single step?