Reshape data from long to wide, with time in new wide variable name

I have a data frame that I would like to reshape from long to wide format, with the time embedded in the variable names of the wide format. Here is an example data set in long format:

id <- as.numeric(rep(1,16))
time <- rep(c(5,10,15,20), 4)
varname <- c(rep("var1",4), rep("var2", 4), rep("var3", 4), rep("var4", 4))
value <- rnorm(16)
tmpdata <- as.data.frame(cbind(id, time, varname, value))

> tmpdata
id time varname              value
1    5    var1  0.713888426169224
1   10    var1   1.71483653545922
1   15    var1  -1.51992072577836
1   20    var1  0.556992407683219
....
4   20    var4   1.03752019932467

I would like this in a wide format with the following output:

id var1.5 var1.10 var1.15 var1.20 ....
1  0.71   1.71    -1.51   0.55 

(and so on)

I've tried using the reshape function in base R without success, and I was not sure how to accomplish this with the reshape package, as all of the examples put time as another variable in the wide format. Any ideas?

This is trivial with the reshape package:

library(reshape)
cast(tmpdata, ... ~ varname + time)

I had to do it in two reshape steps. The column names may not be exactly what you need, but they can be renamed easily.

id <- as.numeric(rep(1, 16))
time <- rep(c(5,10,15,20), 4)
varname <- c(rep("var1",4), rep("var2", 4), rep("var3", 4), rep("var4", 4))
value <- rnorm(16)
tmpdata <- as.data.frame(cbind(id, time, varname, value))

first <- reshape(tmpdata, timevar="time", idvar=c("id", "varname"), direction="wide")
second <- reshape(first, timevar="varname", idvar="id", direction="wide") 

And the output:

> tmpdata
   id time varname               value
1   1    5    var1  -0.231227494628982
2   1   10    var1   -1.80887236653438
3   1   15    var1  -0.443229294431553
4   1   20    var1    1.33719337048763
5   1    5    var2   0.673109282347586
6   1   10    var2   -0.42142267953938
7   1   15    var2   0.874367622725874
8   1   20    var2   -1.19917678039462
9   1    5    var3    1.13495606258399
10  1   10    var3 -0.0779385346672042
11  1   15    var3  -0.126775240288037
12  1   20    var3  -0.760739300144526
13  1    5    var4   -1.94626587907069
14  1   10    var4    1.25643195699455
15  1   15    var4   -0.50986941213717
16  1   20    var4   -1.01324846239812
> first
   id varname            value.5            value.10           value.15
1   1    var1 -0.231227494628982   -1.80887236653438 -0.443229294431553
5   1    var2  0.673109282347586   -0.42142267953938  0.874367622725874
9   1    var3   1.13495606258399 -0.0779385346672042 -0.126775240288037
13  1    var4  -1.94626587907069    1.25643195699455  -0.50986941213717
             value.20
1    1.33719337048763
5   -1.19917678039462
9  -0.760739300144526
13  -1.01324846239812
> second
  id       value.5.var1     value.10.var1      value.15.var1    value.20.var1
1  1 -0.231227494628982 -1.80887236653438 -0.443229294431553 1.33719337048763
       value.5.var2     value.10.var2     value.15.var2     value.20.var2
1 0.673109282347586 -0.42142267953938 0.874367622725874 -1.19917678039462
      value.5.var3       value.10.var3      value.15.var3      value.20.var3
1 1.13495606258399 -0.0779385346672042 -0.126775240288037 -0.760739300144526
       value.5.var4    value.10.var4     value.15.var4     value.20.var4
1 -1.94626587907069 1.25643195699455 -0.50986941213717 -1.01324846239812
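
The names in `second` come out as `value.<time>.<varname>`. A minimal sketch of one way to rename them to the `<varname>.<time>` form asked for in the question is shown below. It assumes the example data is rebuilt with `data.frame()` so that `value` stays numeric, and that every reshaped column name follows the `value.<time>.<varname>` pattern exactly.

```r
# Recreate the two-step reshape on a numeric copy of the example data
# (assumption: data.frame() is used here instead of cbind()).
set.seed(1)
tmpdata <- data.frame(
  id      = 1,
  time    = rep(c(5, 10, 15, 20), 4),
  varname = rep(paste0("var", 1:4), each = 4),
  value   = rnorm(16)
)
first  <- reshape(tmpdata, timevar = "time", idvar = c("id", "varname"),
                  direction = "wide")
second <- reshape(first, timevar = "varname", idvar = "id", direction = "wide")

# Turn "value.5.var1" into "var1.5"; names that do not match the
# pattern (such as "id") are left untouched by sub().
names(second) <- sub("^value\\.([0-9]+)\\.(.+)$", "\\2.\\1", names(second))
names(second)[1:5]
```

The regular expression captures the time and the variable name separately and swaps them, so it keeps working if more times or variables are added, as long as the times are plain integers.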

I gave up on the old base reshape() command (not Hadley's) two years ago. Figuring that damn thing out each time was actually harder than just doing it the 'hard' way, which is also much more flexible.

The data in your example are all nicely sorted. You might have to sort your real data by variable name and time first.

(renamed your tmpdata to tmp, made value numeric)

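# tmp: tmpdata with value made numeric, sorted by id, varname and time
y <- lapply(split(tmp, tmp$id), function(x) x$value)
df <- data.frame(unique(tmp$id), do.call(rbind, y))
# label columns as varname.time, e.g. var1.5 (assumes every id has the same combinations)
names(df) <- c('id', unique(paste(tmp$varname, tmp$time, sep = '.')))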
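
For real data with several ids and rows in arbitrary order, the sorting step matters. The following is a sketch only: the two-id data set is hypothetical, and it assumes every id has exactly one value for each combination of variable name and time.

```r
# Hypothetical data with two ids, deliberately shuffled to mimic unsorted input.
set.seed(1)
tmp <- expand.grid(time = c(5, 10, 15, 20),
                   varname = paste0("var", 1:4),
                   id = 1:2,
                   stringsAsFactors = FALSE)
tmp$value <- rnorm(nrow(tmp))
tmp <- tmp[sample(nrow(tmp)), ]

# Sort by id, then variable name, then time, so every id has the same value order.
tmp <- tmp[order(tmp$id, tmp$varname, tmp$time), ]

# One vector of values per id, bound together row by row.
y  <- lapply(split(tmp, tmp$id), function(x) x$value)
df <- data.frame(id = unique(tmp$id), do.call(rbind, y))

# Column labels taken from the first id's rows, e.g. "var1.5".
one <- tmp[tmp$id == tmp$id[1], ]
names(df) <- c("id", paste(one$varname, one$time, sep = "."))
df
```

Without the `order()` call the values of different ids would land in different columns, which is why the sort comes before `split()`.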

Why not paste varname and time together before reshaping?
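
A minimal sketch of that suggestion in base R, assuming the example data is rebuilt with `data.frame()` so that `value` stays numeric. The `key` column name is introduced here for illustration and does not appear in the question.

```r
# Rebuild the example with data.frame() so that value stays numeric
# (an assumption; the question used cbind(), which coerces every column to text).
set.seed(1)
tmpdata <- data.frame(
  id      = 1,
  time    = rep(c(5, 10, 15, 20), 4),
  varname = rep(paste0("var", 1:4), each = 4),
  value   = rnorm(16)
)

# Paste varname and time into a single key such as "var1.5".
tmpdata$key <- paste(tmpdata$varname, tmpdata$time, sep = ".")

# A single reshape() call now suffices; it produces names like "value.var1.5".
wide <- reshape(tmpdata[, c("id", "key", "value")],
                timevar = "key", idvar = "id", direction = "wide")

# Drop the "value." prefix that reshape() adds to each new column.
names(wide) <- sub("^value\\.", "", names(wide))
names(wide)
```

Because the key already carries both pieces of information, only one reshape step is needed, and the columns appear in the order the keys first occur in the data.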
