Reshape a data frame in R: some variables wide to long format, some long to wide

Hi people of stackoverflow,

I have trouble formatting my data frame efficiently. My original frame looks like this:

    region transportation_type X2020.01.13 X2020.01.14 X2020.01.15 X2020.01.16 X2020.01.17
1  Akron             driving       100.0      103.06      107.50      106.14      123.62
2  Akron             transit       100.0      106.69      103.75      100.22       89.04
3  Akron             walking       100.0       97.23       79.05       74.77       89.55
4 Albany             driving       100.0      102.35      107.35      105.54      128.97
5 Albany             transit       100.0      100.14      105.95      107.76      101.39
6 Albany             walking       100.0      108.36      113.36      107.52      129.43

To merge it with some other data, I want to convert the transportation_type values into columns (wide format) and the date columns X2020.01.13 to X2020.01.17 into a single column (long format), like so:

   region        date driving transit walking
1   Akron X2020.01.13   100.0   100.0   100.0
2   Akron X2020.01.14  103.06  106.69   97.23
3   Akron X2020.01.15  107.50  103.75   79.05
4   Akron X2020.01.16  106.14  100.22   74.77
5   Akron X2020.01.17  123.62   89.04   89.55
6  Albany X2020.01.13   100.0   100.0   100.0
7  Albany X2020.01.14  102.35  100.14  108.36
8  Albany X2020.01.15  107.35  105.95  113.36
9  Albany X2020.01.16  105.54  107.76  107.52
10 Albany X2020.01.17  128.97  101.39  129.43

I can reformat it in two steps, e.g. with the "melt" command, by first converting the transportation_type into wide format and then the dates into long format.

Can I do it more efficiently in one step?

Thanks for your help!

There aren't any functions in base R or the major reshaping packages that can simultaneously pivot in both directions.

In general, I would recommend switching to the tidyr::pivot_wider() and tidyr::pivot_longer() functions. They are still maintained (reshape and reshape2 no longer receive updates), and they are easier to work with.

dat <- tibble::tribble(
  ~region, ~transportation_type, ~X2020.01.13, ~X2020.01.14, ~X2020.01.15, ~X2020.01.16, ~X2020.01.17,
  "Akron",           "driving",      100.0,      103.06,      107.50,      106.14,      123.62,
  "Akron",           "transit",      100.0,      106.69,      103.75,      100.22,       89.04,
  "Akron",           "walking",      100.0,       97.23,       79.05,       74.77,       89.55,
  "Albany",          "driving",      100.0,      102.35,      107.35,      105.54,      128.97,
  "Albany",          "transit",      100.0,      100.14,      105.95,      107.76,      101.39,
  "Albany",          "walking",      100.0,      108.36,      113.36,      107.52,      129.43
)
dat |>
  tidyr::pivot_longer(
    cols = -c(region, transportation_type),
    names_to = "date",
    values_to = "values"
  ) |>
  tidyr::pivot_wider(
    names_from = transportation_type,
    values_from = values
  )
#> # A tibble: 10 x 5
#>    region date        driving transit walking
#>    <chr>  <chr>         <dbl>   <dbl>   <dbl>
#>  1 Akron  X2020.01.13    100    100     100  
#>  2 Akron  X2020.01.14    103.   107.     97.2
#>  3 Akron  X2020.01.15    108.   104.     79.0
#>  4 Akron  X2020.01.16    106.   100.     74.8
#>  5 Akron  X2020.01.17    124.    89.0    89.6
#>  6 Albany X2020.01.13    100    100     100  
#>  7 Albany X2020.01.14    102.   100.    108. 
#>  8 Albany X2020.01.15    107.   106.    113. 
#>  9 Albany X2020.01.16    106.   108.    108. 
#> 10 Albany X2020.01.17    129.   101.    129.

Created on 2021-08-22 by the reprex package (v2.0.0)
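
Since the goal is to merge the result with other data, it may help to turn the date labels into real Date values while pivoting. The following is only a sketch under some assumptions: it uses a two-date subset of the question's data, it assumes the column names always follow the X%Y.%m.%d pattern, and it relies on the names_prefix and names_transform arguments of pivot_longer(), which need a reasonably recent tidyr.

```r
library(tidyr)

# Subset of the question's data: only two of the five date columns.
dat <- data.frame(
  region = rep(c("Akron", "Albany"), each = 3),
  transportation_type = rep(c("driving", "transit", "walking"), times = 2),
  X2020.01.13 = c(100, 100, 100, 100, 100, 100),
  X2020.01.14 = c(103.06, 106.69, 97.23, 102.35, 100.14, 108.36)
)

res <- dat |>
  pivot_longer(
    cols = -c(region, transportation_type),
    names_to = "date",
    # drop the leading "X" that R added to the column names
    names_prefix = "X",
    # parse "2020.01.13" into a Date (assumes this format for every column)
    names_transform = list(date = function(x) as.Date(x, format = "%Y.%m.%d")),
    values_to = "values"
  ) |>
  pivot_wider(
    names_from = transportation_type,
    values_from = values
  )

res
```

With a Date column, a later join on region and date should work against any other table that stores dates as Date, rather than depending on the "X2020.01.13" string format.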

Here is another approach that pivots wide, then long, then wide again:

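library(dplyr)
library(tidyr)
# df is the data frame from the question
df %>% 
    # spread every date column by type: X2020.01.13_driving, X2020.01.13_transit, ...
    pivot_wider(
        names_from = transportation_type,
        values_from = 3:7
    ) %>% 
    # stack those columns into "date" and "value" (the default values_to)
    pivot_longer(
        cols = starts_with("X"),
        names_to = "date"
    ) %>% 
    # split "X2020.01.13_driving" into the date and the transportation type
    separate(date, c("date", "transportation"), sep="_") %>% 
    # one column per transportation type, filled from "value" (the default values_from)
    pivot_wider(
        names_from = transportation
    )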
# A tibble: 10 x 5
   region date        driving transit walking
   <chr>  <chr>         <dbl>   <dbl>   <dbl>
 1 Akron  X2020.01.13    100    100     100  
 2 Akron  X2020.01.14    103.   107.     97.2
 3 Akron  X2020.01.15    108.   104.     79.0
 4 Akron  X2020.01.16    106.   100.     74.8
 5 Akron  X2020.01.17    124.    89.0    89.6
 6 Albany X2020.01.13    100    100     100  
 7 Albany X2020.01.14    102.   100.    108. 
 8 Albany X2020.01.15    107.   106.    113. 
 9 Albany X2020.01.16    106.   108.    108. 
10 Albany X2020.01.17    129.   101.    129. 

Here is a base R option using nested reshape() calls:

`row.names<-`(reshape(
  reshape(
    df,
    direction = "long",
    idvar = c("region", "transportation_type"),
    varying = -(1:2),
    times = names(df)[-c(1:2)],
    v.names = "val"
  ),
  direction = "wide",
  idvar = c("time", "region"),
  timevar = "transportation_type"
), NULL)

which gives

   region        time val.driving val.transit val.walking
1   Akron X2020.01.13      100.00      100.00      100.00
2  Albany X2020.01.13      100.00      100.00      100.00
3   Akron X2020.01.14      103.06      106.69       97.23
4  Albany X2020.01.14      102.35      100.14      108.36
5   Akron X2020.01.15      107.50      103.75       79.05
6  Albany X2020.01.15      107.35      105.95      113.36
7   Akron X2020.01.16      106.14      100.22       74.77
8  Albany X2020.01.16      105.54      107.76      107.52
9   Akron X2020.01.17      123.62       89.04       89.55
10 Albany X2020.01.17      128.97      101.39      129.43
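
To get closer to the layout asked for in the question (columns named driving, transit and walking, rows grouped by region), the base R result can be tidied up afterwards. This is just a sketch on a two-date subset of the data; the intermediate names long and wide are mine, and it assumes the "val." prefix and the "time" column produced by the reshape() calls above.

```r
# Subset of the question's data: only two of the five date columns.
df <- data.frame(
  region = rep(c("Akron", "Albany"), each = 3),
  transportation_type = rep(c("driving", "transit", "walking"), times = 2),
  X2020.01.13 = c(100, 100, 100, 100, 100, 100),
  X2020.01.14 = c(103.06, 106.69, 97.23, 102.35, 100.14, 108.36)
)

# Step 1: date columns to long format (same idea as the inner reshape above).
long <- reshape(
  df,
  direction = "long",
  idvar = c("region", "transportation_type"),
  varying = names(df)[-(1:2)],
  times = names(df)[-(1:2)],
  v.names = "val"
)

# Step 2: transportation types to wide format (the outer reshape above).
wide <- reshape(
  long,
  direction = "wide",
  idvar = c("time", "region"),
  timevar = "transportation_type"
)

# Cleanup: drop the "val." prefix, rename "time" to "date", sort by region and date.
names(wide) <- sub("^val\\.", "", names(wide))
names(wide)[names(wide) == "time"] <- "date"
wide <- wide[order(wide$region, wide$date), ]
rownames(wide) <- NULL

wide
```

Sorting on the date strings works here only because the X%Y.%m.%d labels happen to sort chronologically as text.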
