R: Transform unordered long data to wide data

I am looking to transform unordered long data to wide data.

mydata <- data.frame(cat   = c('a','a','a','b','c','c','c','c'),
                     color = c(  1,  1,  1,  2,  1,  1,  1,  1),
                     hat   = c(  1,  1,  2,  2,  1,  2,  1,  2),
                     shoe  = c(  0,  1,  1,  2,  1,  1,  1,  3))

cat is the ID variable, while color is a descriptive statistic that does not change within cat.

mydata
    cat color hat shoe
1     a     1   1    0
2     a     1   1    1
3     a     1   2    1
4     b     2   2    2
5     c     1   1    1
6     c     1   2    1
7     c     1   1    1
8     c     1   2    3

Final Output

  cat color hat1 shoe1 hat2 shoe2 hat3 shoe3 hat4 shoe4
1   a     1    1     0    1     1    2     1   NA    NA
2   b     2    2     2   NA    NA   NA    NA   NA    NA
3   c     1    1     1    2     1    1     1    2     3

The challenge I seem to be facing is that there is no "time variable".

Add a counter within each cat, then use that counter as your time variable:

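library(data.table)
mydata <- data.table(cat   = c('a','a','a','b','c','c','c','c'),
                     color = c(  1,  1,  1,  2,  1,  1,  1,  1),
                     hat   = c(  1,  1,  2,  2,  1,  2,  1,  2),
                     shoe  = c(  0,  1,  1,  2,  1,  1,  1,  3))

# Number the rows within each cat (1, 2, 3, ...) to act as the time variable
mydata[, dummy.id := seq_len(.N), by = cat]
# Include color in idvar so it stays a single column rather than being spread per time point
widedata <- reshape(mydata, idvar = c('cat', 'color'), timevar = 'dummy.id', direction = 'wide')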
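The same counter-then-reshape idea can also be sketched in base R alone. This is a minimal sketch rather than part of the answer above: it assumes `ave()` for the per-group counter, passes both `cat` and `color` as `idvar` so that `color` is not spread across time points, and sets `sep = ""` so the new columns are named `hat1`, `shoe1`, and so on.

```r
# Base R sketch: no packages required.
mydata <- data.frame(cat   = c('a','a','a','b','c','c','c','c'),
                     color = c(  1,  1,  1,  2,  1,  1,  1,  1),
                     hat   = c(  1,  1,  2,  2,  1,  2,  1,  2),
                     shoe  = c(  0,  1,  1,  2,  1,  1,  1,  3))

# Number the rows within each cat in order of appearance: 1, 2, 3, ...
mydata$dummy.id <- ave(seq_along(mydata$cat), mydata$cat, FUN = seq_along)

# Spread hat and shoe across the counter; cat and color identify each row.
widedata <- reshape(mydata,
                    idvar     = c("cat", "color"),
                    timevar   = "dummy.id",
                    direction = "wide",
                    sep       = "")
rownames(widedata) <- NULL   # drop the leftover row names from the long data
widedata
```

With `reshape()` the value columns come out grouped by time point (`hat1`, `shoe1`, `hat2`, `shoe2`, ...), which is the column order shown in the desired output.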

We can use dcast from the devel version of data.table, i.e. v1.9.5+, for this. We create a sequence variable ('indx') grouped by the 'cat' and 'color' columns, then dcast from 'long' to 'wide', specifying the value.var columns.

 library(data.table)#v1.9.5+
 mydata[, indx:=1:.N, by = .(cat, color)]
 dcast(mydata, cat+color~indx, value.var=c('hat', 'shoe'))
 #     cat color hat_1 hat_2 hat_3 hat_4 shoe_1 shoe_2 shoe_3 shoe_4
 #1:   a     1     1     1     2    NA      0      1      1     NA
 #2:   b     2     2    NA    NA    NA      2     NA     NA     NA
 #3:   c     1     1     2     1     2      1      1      1      3

NOTE: Instructions for installing the devel version are available from the data.table project.
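As a variation, the sequence variable does not have to be stored as a column. The sketch below assumes a data.table release that provides `rowid()` (to my knowledge, 1.9.8 or later) and computes the counter directly inside the `dcast` formula; treat it as an illustration, not as the method of the answer above.

```r
library(data.table)

mydata <- data.table(cat   = c('a','a','a','b','c','c','c','c'),
                     color = c(  1,  1,  1,  2,  1,  1,  1,  1),
                     hat   = c(  1,  1,  2,  2,  1,  2,  1,  2),
                     shoe  = c(  0,  1,  1,  2,  1,  1,  1,  3))

# rowid(cat, color) numbers the rows within each cat/color group,
# playing the same role as the 'indx' column.
wide <- dcast(mydata, cat + color ~ rowid(cat, color),
              value.var = c("hat", "shoe"))
wide
```

This leaves `mydata` unmodified, which can be convenient when the counter is only needed for the reshape.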

This can be made more compact by using getanID from splitstackshape to create the sequence variable:

  library(splitstackshape)
  dcast(getanID(mydata, c('cat', 'color')), 
              cat+color~.id, value.var=c('hat', 'shoe'), sep='')
  #   cat color  hat1  hat2  hat3  hat4  shoe1  shoe2  shoe3  shoe4
  #1:   a     1     1     1     2    NA      0      1      1     NA
  #2:   b     2     2    NA    NA    NA      2     NA     NA     NA
  #3:   c     1     1     2     1     2      1      1      1      3
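The `dcast` results group all `hat` columns before all `shoe` columns, whereas the desired output in the question interleaves them (`hat1`, `shoe1`, `hat2`, `shoe2`, ...). If that order matters, one possible follow-up is to reorder the columns with `setcolorder()`. The sketch below is an assumption about how one might do this; the names `n_max` and `new_order` are illustrative only.

```r
library(data.table)

mydata <- data.table(cat   = c('a','a','a','b','c','c','c','c'),
                     color = c(  1,  1,  1,  2,  1,  1,  1,  1),
                     hat   = c(  1,  1,  2,  2,  1,  2,  1,  2),
                     shoe  = c(  0,  1,  1,  2,  1,  1,  1,  3))

# Sequence variable within each cat/color group, then long-to-wide.
mydata[, indx := seq_len(.N), by = .(cat, color)]
wide <- dcast(mydata, cat + color ~ indx,
              value.var = c("hat", "shoe"), sep = "")

# Build the interleaved order: rbind() stacks the two name vectors as rows,
# and as.vector() reads the matrix column by column.
n_max     <- max(mydata$indx)
new_order <- c("cat", "color",
               as.vector(rbind(paste0("hat",  seq_len(n_max)),
                               paste0("shoe", seq_len(n_max)))))
setcolorder(wide, new_order)   # reorders the columns by reference
wide
```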
