简体   繁体   中英

R: rehape from "wide" to "long", keeping some variables "wide"

I have data file in wide format, with a set of recurring variables (var1 var2, below)

data have:

| ID | background vars| var1.A | var2.A | var1.B | var2.B | var1.C | var2.C |
| -: | :------------- |:------:|:------:|:------:|:------:|:------:|:------:|
| 1  |  data1         | 1      | 2      | 3      | 4      | 5      | 6      | 
| 2  |  data2         | 7      | 8      | 9      | 10     | 11     | 12     |

I need to reshape it "half way" into to long format, ie keep a each var group together (wide), and each recurrence in a different line (long).

data want:

| ID | background vars | recurrence | var1   | var2   |
| -: | :-------------- |:----------:|:------:|:------:|
| 1  |  data1          | A          | 1      | 2      |
| 1  |  data1          | B          | 3      | 4      |
| 1  |  data1          | C          | 5      | 6      |
| 2  |  data2          | A          | 7      | 8      |
| 2  |  data2          | B          | 9      | 10     |
| 2  |  data2          | C          | 11     | 12     |

I found some solutions for this using reshape() gather() and melt(). However, all these collapse ALL variables to long format, and do not allow for some variables to be kept "wide").

How can data be shaped this way using R?

Use the keyword '.value' in the names_to argument to keep that part of the column name in wide format:

tidyr::pivot_longer(df, c(-ID, -`background vars`),
                    names_sep = '\\.', 
                    names_to = c('.value', 'recurrence'))
#> # A tibble: 6 x 5
#>      ID `background vars` recurrence  var1  var2
#>   <int> <chr>             <chr>      <int> <int>
#> 1     1 data1             A              1     2
#> 2     1 data1             B              3     4
#> 3     1 data1             C              5     6
#> 4     2 data2             A              7     8
#> 5     2 data2             B              9    10
#> 6     2 data2             C             11     1

If you need your code to be easily readable/comprehensible and you feel that ".value" in @Allan's example is a little opaque, you might consider a two-step pivot - simply pivot_long() and then immediately pivot_wide() with different parameters:

df <- structure(
  list(
    ID = 1:2,
    background.vars = c("data1", "data2"),
    var1.A = c(1L, 7L),
    var2.A = c(2L, 8L),
    var1.B = c(3L, 9L),
    var2.B = c(4L, 10L),
    var1.C = c(5L, 11L),
    var2.C = c(6L, 12L)),
  class = "data.frame",
  row.names = c(NA, -2L)
)

require(tidyr)
#> Loading required package: tidyr

long.df <- 
  pivot_longer(df, 
             c(-ID, -`background.vars`), #lengthen all columns but these
             names_sep = "\\.",          #split column names wherever there is a '.'
             names_to = c("var", "letter"))

long.df
#> # A tibble: 12 × 5
#>       ID background.vars var   letter value
#>    <int> <chr>           <chr> <chr>  <int>
#>  1     1 data1           var1  A          1
#>  2     1 data1           var2  A          2
#>  3     1 data1           var1  B          3
#>  4     1 data1           var2  B          4
#>  5     1 data1           var1  C          5
#>  6     1 data1           var2  C          6
#>  7     2 data2           var1  A          7
#>  8     2 data2           var2  A          8
#>  9     2 data2           var1  B          9
#> 10     2 data2           var2  B         10
#> 11     2 data2           var1  C         11
#> 12     2 data2           var2  C         12

pivot_wider(long.df, names_from = "var")
#> # A tibble: 6 × 5
#>      ID background.vars letter  var1  var2
#>   <int> <chr>           <chr>  <int> <int>
#> 1     1 data1           A          1     2
#> 2     1 data1           B          3     4
#> 3     1 data1           C          5     6
#> 4     2 data2           A          7     8
#> 5     2 data2           B          9    10
#> 6     2 data2           C         11    12

Created on 2022-05-24 by the reprex package (v2.0.1)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM