How can I use R to transpose rows into columns, mirroring the SAS implementation?

I've searched for quite some time and can't find an approach that doesn't seem to send me down the wrong path. I'm trying to replicate a SAS implementation in R.

Right now I'm trying to figure out how to transpose, without aggregation, several value columns based on several identifying fields.

Example starting point:

Cat1  Cat2      Cat3    Date        Occ  Dur
A00   Group1    Sub1    2015-05-09  1    30
A00   Group1    Sub1    2015-09-09  2    30
A00   Group1    Sub2    2015-06-23  1    60
B00   Group1    Sub1    2015-07-30  3    30
B00   Group1    Sub2    2015-03-25  1    60
B00   Group1    Sub2    2015-02-14  2    60

I'm looking to get the following output:

Cat1  Cat2    Cat3  Date1       Date2       Occ1  Occ2  Dur1  Dur2
A00   Group1  Sub1  2015-05-09  2015-09-09  1     2     30    30
A00   Group1  Sub2  2015-06-23              1           60
B00   Group1  Sub1  2015-07-30              3           30
B00   Group1  Sub2  2015-03-25  2015-02-14  1     2     60    60

I realize that different environments may require different approaches, so I'm open to alternatives to directly replicating the SAS logic. I've made various attempts at reshaping the data with melt and cast, without any luck. Any assistance would be hugely appreciated!

Here is a data.table-based solution that mimics the logic pretty closely:

library(data.table)  # recent versions provide melt() and dcast() for data.tables

DT <- fread("Cat1    Cat2    Cat3    Date    Occ Dur
            A00 Group1  Sub1    2015-05-09  1   30
            A00 Group1  Sub1    2015-09-09  2   30
            A00 Group1  Sub2    2015-06-23  1   60
            B00 Group1  Sub1    2015-07-30  3   30
            B00 Group1  Sub2    2015-03-25  1   60
            B00 Group1  Sub2    2015-02-14  2   60")


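# Melt to long format, number the rows within each group and variable,
# then cast back to wide so each value gets its own <variable>_<Idx> column.
DTw <- dcast(
  melt(DT, id.vars = c("Cat1", "Cat2", "Cat3"))[
    , Idx := seq_len(.N)
    , keyby = .(Cat1, Cat2, Cat3, variable)
    ]
  , Cat1 + Cat2 + Cat3 ~ variable + Idx
  , value.var = "value")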

DTw

The result looks like this:

  Cat1   Cat2 Cat3     Date_1     Date_2 Occ_1 Occ_2 Dur_1 Dur_2
1  A00 Group1 Sub1 2015-05-09 2015-09-09     1     2    30    30
2  A00 Group1 Sub2 2015-06-23       <NA>     1  <NA>    60  <NA>
3  B00 Group1 Sub1 2015-07-30       <NA>     3  <NA>    30  <NA>
4  B00 Group1 Sub2 2015-03-25 2015-02-14     1     2    60    60
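
As a possible variation on the approach above, the melt step can be skipped by casting several value columns at once. This is only a sketch, assuming a data.table version in which dcast() accepts multiple value.var columns and rowid() is available; the name DTw2 is made up for illustration. Because nothing is melted into a single value column, Occ and Dur stay integer instead of being coerced to character.

```r
library(data.table)

# Same example data as in the question.
DT <- data.table(
  Cat1 = c("A00", "A00", "A00", "B00", "B00", "B00"),
  Cat2 = "Group1",
  Cat3 = c("Sub1", "Sub1", "Sub2", "Sub1", "Sub2", "Sub2"),
  Date = c("2015-05-09", "2015-09-09", "2015-06-23",
           "2015-07-30", "2015-03-25", "2015-02-14"),
  Occ  = c(1L, 2L, 1L, 3L, 1L, 2L),
  Dur  = c(30L, 30L, 60L, 30L, 60L, 60L)
)

# rowid() numbers the rows within each Cat1/Cat2/Cat3 group (1, 2, ...);
# each column listed in value.var is spread over that index.
DTw2 <- dcast(
  DT,
  Cat1 + Cat2 + Cat3 ~ rowid(Cat1, Cat2, Cat3),
  value.var = c("Date", "Occ", "Dur")
)

DTw2
```

Groups with only one row get NA in the second set of columns, as in the result above.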

Here is a dplyr and tidyr solution. There may be a cleaner way to do this, but it works. It does produce a warning that id() is deprecated, and I am not sure how to get rid of it.

library(dplyr)
library(tidyr)

# df is assumed to hold the example data from the question as a data frame
df %>%
   gather(key, value, -c(Cat1:Cat3)) %>%  ## Put in long format
   group_by(Cat1, Cat2, Cat3, key)   %>%  ## Group for numbering (1,2)
   mutate(rn = row_number())         %>%  ## Add row numbers to unite with key column
   unite(new_key, key, rn, sep = '') %>%  ## Make new unique key to be col name
   spread(new_key, value, fill = '') %>%  ## Put in 'wide' format
   select(Cat1, Cat2, Cat3, Date1, Date2, Occ1, Occ2, Dur1, Dur2)  # re-order columns

Results

  Cat1   Cat2 Cat3      Date1      Date2 Occ1 Occ2 Dur1 Dur2
1  A00 Group1 Sub1 2015-05-09 2015-09-09    1    2   30   30
2  A00 Group1 Sub2 2015-06-23               1        60     
3  B00 Group1 Sub1 2015-07-30               3        30     
4  B00 Group1 Sub2 2015-03-25 2015-02-14    1    2   60   60
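
For what it's worth, newer tidyr releases (1.0.0 and later) offer pivot_wider(), which can replace the gather/unite/spread chain. The following is a sketch under that assumption rather than a drop-in equivalent: the data frame is rebuilt here so the example is self-contained, the name wide is made up for illustration, and missing cells come out as NA rather than empty strings.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question.
df <- data.frame(
  Cat1 = c("A00", "A00", "A00", "B00", "B00", "B00"),
  Cat2 = "Group1",
  Cat3 = c("Sub1", "Sub1", "Sub2", "Sub1", "Sub2", "Sub2"),
  Date = c("2015-05-09", "2015-09-09", "2015-06-23",
           "2015-07-30", "2015-03-25", "2015-02-14"),
  Occ  = c(1L, 2L, 1L, 3L, 1L, 2L),
  Dur  = c(30L, 30L, 60L, 30L, 60L, 60L),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(Cat1, Cat2, Cat3) %>%   # number the rows within each group
  mutate(rn = row_number()) %>%
  ungroup() %>%
  pivot_wider(                     # spread all three value columns at once
    names_from  = rn,
    values_from = c(Date, Occ, Dur),
    names_sep   = ""               # Date1, Date2, ... instead of Date_1, Date_2
  )

wide
```

Since each value column is pivoted separately, the columns keep their original types and no re-ordering step is needed.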
