R: Transform unordered long data to wide data
I am looking to transform unordered long data to wide data.
mydata <- data.frame(cat = c('a','a','a','b','c','c','c','c'),
color = c( 1, 1, 1, 2, 1, 1, 1, 1),
hat = c( 1, 1, 2, 2, 1, 2, 1, 2),
shoe = c( 0, 1, 1, 2, 1, 1, 1, 3))
'cat' is the ID variable, while 'color' is a descriptive statistic that does not change within 'cat'.
mydata
cat color hat shoe
1 a 1 1 0
2 a 1 1 1
3 a 1 2 1
4 b 2 2 2
5 c 1 1 1
6 c 1 2 1
7 c 1 1 1
8 c 1 2 3
Final Output
cat color hat1 shoe1 hat2 shoe2 hat3 shoe3 hat4 shoe4
1 a 1 1 0 1 1 2 1 NA NA
2 b 2 2 2 NA NA NA NA NA NA
3 c 1 1 1 2 1 1 1 2 3
The challenge I seem to be facing is that there is no "time variable".
Add in a counter by cat and then you can use that as your time variable:
library(data.table)
mydata <- data.table(cat = c('a','a','a','b','c','c','c','c'),
color = c( 1, 1, 1, 2, 1, 1, 1, 1),
hat = c( 1, 1, 2, 2, 1, 2, 1, 2),
shoe = c( 0, 1, 1, 2, 1, 1, 1, 3))
mydata[, "dummy.id" := seq(.N), by=cat]
widedata <- reshape(mydata, idvar='cat', timevar='dummy.id', direction='wide')
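For comparison, the same counter idea can be sketched in base R only. This is a minimal sketch, not the answer's own code: it assumes a plain data.frame, builds the counter with ave(), and lists both 'cat' and 'color' in idvar so that 'color' stays a single column rather than being spread across time points. The sep = '' argument is an assumption made here to get names such as hat1 and shoe1, as in the desired output.

```r
# Same example data as in the question, as a plain data.frame
mydata <- data.frame(cat   = c('a','a','a','b','c','c','c','c'),
                     color = c( 1,  1,  1,  2,  1,  1,  1,  1),
                     hat   = c( 1,  1,  2,  2,  1,  2,  1,  2),
                     shoe  = c( 0,  1,  1,  2,  1,  1,  1,  3))

# Counter within each cat: 1, 2, 3, ... in row order; this acts as the time variable
mydata$dummy.id <- ave(seq_along(mydata$cat), mydata$cat, FUN = seq_along)

# Treat cat and color as identifiers so that only hat and shoe are spread wide
widedata <- reshape(mydata,
                    idvar     = c('cat', 'color'),
                    timevar   = 'dummy.id',
                    direction = 'wide',
                    sep       = '')
widedata
```

With idvar = 'cat' alone, reshape() would treat 'color' as a time-varying column as well, so including it in idvar is one way to keep it as a single descriptive column.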
We can use dcast from the devel version of data.table, i.e. v1.9.5+, for this. We create a sequence variable ('indx') grouped by the 'cat' and 'color' columns, then dcast from 'long' to 'wide', specifying the value.var columns.
library(data.table)#v1.9.5+
mydata[, indx:=1:.N, by = .(cat, color)]
dcast(mydata, cat+color~indx, value.var=c('hat', 'shoe'))
# cat color hat_1 hat_2 hat_3 hat_4 shoe_1 shoe_2 shoe_3 shoe_4
#1: a 1 1 1 2 NA 0 1 1 NA
#2: b 2 2 NA NA NA 2 NA NA NA
#3: c 1 1 2 1 2 1 1 1 3
NOTE: Instructions to install the devel version are here
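The dcast output above groups all hat columns before all shoe columns, whereas the desired output interleaves them (hat1, shoe1, hat2, shoe2, ...). The following is a hedged sketch of one way to get that order. It assumes a data.table version that provides rowid() and accepts expressions in the dcast formula; the names wide, n and new_order are introduced here purely for illustration.

```r
library(data.table)

mydata <- data.table(cat   = c('a','a','a','b','c','c','c','c'),
                     color = c( 1,  1,  1,  2,  1,  1,  1,  1),
                     hat   = c( 1,  1,  2,  2,  1,  2,  1,  2),
                     shoe  = c( 0,  1,  1,  2,  1,  1,  1,  3))

# rowid(cat, color) numbers the rows within each cat/color group,
# so no helper column has to be added to mydata first
wide <- dcast(mydata, cat + color ~ rowid(cat, color),
              value.var = c('hat', 'shoe'), sep = '')

# Largest group size, i.e. how many hat/shoe pairs the wide table has
n <- max(mydata[, .N, by = .(cat, color)]$N)

# Interleave the names: hat1, shoe1, hat2, shoe2, ...
new_order <- c('cat', 'color',
               as.vector(rbind(paste0('hat', seq_len(n)),
                               paste0('shoe', seq_len(n)))))
setcolorder(wide, new_order)
wide
```

setcolorder() reorders the columns by reference, so no copy of the wide table is made.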
This can be made more compact by combining it with getanID (to create the sequence variable) from splitstackshape:
library(splitstackshape)
dcast(getanID(mydata, c('cat', 'color')),
cat+color~.id, value.var=c('hat', 'shoe'), sep='')
# cat color hat1 hat2 hat3 hat4 shoe1 shoe2 shoe3 shoe4
#1: a 1 1 1 2 NA 0 1 1 NA
#2: b 2 2 NA NA NA 2 NA NA NA
#3: c 1 1 2 1 2 1 1 1 3
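To make the role of getanID more concrete, the sketch below compares the '.id' column it adds with the manually built 'indx' counter from the earlier snippet. It assumes that getanID returns a copy of the data with '.id' appended when the id columns contain duplicates; with_id and manual are names introduced here for illustration only.

```r
library(data.table)
library(splitstackshape)

mydata <- data.table(cat   = c('a','a','a','b','c','c','c','c'),
                     color = c( 1,  1,  1,  2,  1,  1,  1,  1),
                     hat   = c( 1,  1,  2,  2,  1,  2,  1,  2),
                     shoe  = c( 0,  1,  1,  2,  1,  1,  1,  3))

# Counter added by getanID, grouped by cat and color
with_id <- getanID(mydata, c('cat', 'color'))

# The same counter built by hand on a copy, as in the dcast answer
manual <- copy(mydata)[, indx := seq_len(.N), by = .(cat, color)]

# Both counters should number the rows within each group identically
all(with_id$.id == manual$indx)
```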