
Reshape data frame R: Some variables wide to long format, some long to wide

Hi Stack Overflow folks,

I'm having trouble reshaping my data frame efficiently. My original data frame looks like this:

    region transportation_type X2020.01.13 X2020.01.14 X2020.01.15 X2020.01.16 X2020.01.17
1  Akron             driving       100.0      103.06      107.50      106.14      123.62
2  Akron             transit       100.0      106.69      103.75      100.22       89.04
3  Akron             walking       100.0       97.23       79.05       74.77       89.55
4 Albany             driving       100.0      102.35      107.35      105.54      128.97
5 Albany             transit       100.0      100.14      105.95      107.76      101.39
6 Albany             walking       100.0      108.36      113.36      107.52      129.43

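To merge it with some other data, I would like to turn transportation_type into columns (wide format) and collapse the dates X2020.01.13 through X2020.01.17 into a single column (long format), like this: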

   region        date driving transit walking
1   Akron X2020.01.13   100.0   100.0   100.0
2   Akron X2020.01.14  103.06  106.69   97.23
3   Akron X2020.01.15  107.50  103.75   79.05
4   Akron X2020.01.16  106.14  100.22   74.77
5   Akron X2020.01.17  123.62   89.04   89.55
6  Albany X2020.01.13   100.0   100.0   100.0
7  Albany X2020.01.14  102.35  100.14  108.36
8  Albany X2020.01.15  107.35  105.95  113.36
9  Albany X2020.01.16  105.54  107.76  107.52
10 Albany X2020.01.17  128.97  101.39  129.43

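I could do the reshaping in two steps, e.g. with the "melt" command, first converting transportation_type to wide format and then the dates to long format.

Can I do it more efficiently in a single step?

Thanks for your help!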

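There isn't any function in base R or the main reshaping packages that pivots in both directions at once.

In general, I would recommend switching to the tidyr::pivot_wider() and tidyr::pivot_longer() functions. They are still maintained (reshape and reshape2 no longer receive updates), and they are easier to use.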

dat <- tibble::tribble(
  ~region, ~transportation_type, ~X2020.01.13, ~X2020.01.14, ~X2020.01.15, ~X2020.01.16, ~X2020.01.17,
  "Akron",           "driving",      100.0,      103.06,      107.50,      106.14,      123.62,
  "Akron",           "transit",      100.0,      106.69,      103.75,      100.22,       89.04,
  "Akron",           "walking",      100.0,       97.23,       79.05,       74.77,       89.55,
  "Albany",          "driving",      100.0,      102.35,      107.35,      105.54,      128.97,
  "Albany",          "transit",      100.0,      100.14,      105.95,      107.76,      101.39,
  "Albany",          "walking",      100.0,      108.36,      113.36,      107.52,      129.43
)
dat |>
  tidyr::pivot_longer(
    cols = -c(region, transportation_type),
    names_to = "date",
    values_to = "values"
  ) |>
  tidyr::pivot_wider(
    names_from = transportation_type,
    values_from = values
  )
#> # A tibble: 10 x 5
#>    region date        driving transit walking
#>    <chr>  <chr>         <dbl>   <dbl>   <dbl>
#>  1 Akron  X2020.01.13    100    100     100  
#>  2 Akron  X2020.01.14    103.   107.     97.2
#>  3 Akron  X2020.01.15    108.   104.     79.0
#>  4 Akron  X2020.01.16    106.   100.     74.8
#>  5 Akron  X2020.01.17    124.    89.0    89.6
#>  6 Albany X2020.01.13    100    100     100  
#>  7 Albany X2020.01.14    102.   100.    108. 
#>  8 Albany X2020.01.15    107.   106.    113. 
#>  9 Albany X2020.01.16    106.   108.    108. 
#> 10 Albany X2020.01.17    129.   101.    129.

Created on 2021-08-22 by the reprex package (v2.0.0)
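
A small follow-up, in case it helps with the merge: the date column produced above still holds labels such as X2020.01.13 as character strings. Assuming the other data set stores real dates (an assumption, the question does not say so), a minimal base R sketch of the conversion could look like this. The vector name labels is only illustrative.

```r
# A few column labels copied from the question; `labels` is a made-up name.
labels <- c("X2020.01.13", "X2020.01.14", "X2020.01.17")

# The leading "X" and the dots are treated as literal characters in the format.
dates <- as.Date(labels, format = "X%Y.%m.%d")

print(dates)
```

The same call could be applied to the date column of the reshaped result before merging.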

Here is another approach that pivots wide, then long, then wide again:

library(dplyr)
library(tidyr)
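df %>%
    # Spread transportation_type across columns; names become e.g. X2020.01.13_driving
    pivot_wider(
        names_from = transportation_type,
        values_from = 3:7
    ) %>%
    # Stack all date/type columns into name-value pairs
    pivot_longer(
        cols = starts_with("X"),
        names_to = "date"
    ) %>%
    # Split the combined name back into the date and the transportation type
    separate(date, c("date", "transportation"), sep="_") %>%
    # Give each transportation type its own column
    pivot_wider(
        names_from = transportation
    )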
# A tibble: 10 x 5
   region date        driving transit walking
   <chr>  <chr>         <dbl>   <dbl>   <dbl>
 1 Akron  X2020.01.13    100    100     100  
 2 Akron  X2020.01.14    103.   107.     97.2
 3 Akron  X2020.01.15    108.   104.     79.0
 4 Akron  X2020.01.16    106.   100.     74.8
 5 Akron  X2020.01.17    124.    89.0    89.6
 6 Albany X2020.01.13    100    100     100  
 7 Albany X2020.01.14    102.   100.    108. 
 8 Albany X2020.01.15    107.   106.    113. 
 9 Albany X2020.01.16    106.   108.    108. 
10 Albany X2020.01.17    129.   101.    129. 
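
If the separate() step and the final pivot_wider() feel like a detour, a possible shortcut is to let pivot_longer() split the names itself via the special ".value" entry in names_to. This is only a sketch on a cut-down copy of the question's data (two dates instead of five); the object names wide and res are made up for illustration.

```r
library(tidyr)

# Cut-down copy of the question's data: first two dates only.
df <- data.frame(
  region = rep(c("Akron", "Albany"), each = 3),
  transportation_type = rep(c("driving", "transit", "walking"), times = 2),
  X2020.01.13 = c(100, 100, 100, 100, 100, 100),
  X2020.01.14 = c(103.06, 106.69, 97.23, 102.35, 100.14, 108.36)
)

# Wide first: column names become <date>_<type>, e.g. X2020.01.13_driving.
wide <- pivot_wider(
  df,
  names_from = transportation_type,
  values_from = starts_with("X")
)

# Long again: the part before "_" goes to `date`, the part after it
# (".value") names the output columns driving, transit and walking.
res <- pivot_longer(
  wide,
  cols = -region,
  names_to = c("date", ".value"),
  names_sep = "_"
)

print(res)
```

This still takes two pivots, so it is not a true one-step solution, just a shorter pipeline.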

Here is a base R option using nested reshape() calls:

`row.names<-`(reshape(
  reshape(
    df,
    direction = "long",
    idvar = c("region", "transportation_type"),
    varying = -(1:2),
    times = names(df)[-c(1:2)],
    v.names = "val"
  ),
  direction = "wide",
  idvar = c("time", "region"),
  timevar = "transportation_type"
), NULL)

This gives:

   region        time val.driving val.transit val.walking
1   Akron X2020.01.13      100.00      100.00      100.00
2  Albany X2020.01.13      100.00      100.00      100.00
3   Akron X2020.01.14      103.06      106.69       97.23
4  Albany X2020.01.14      102.35      100.14      108.36
5   Akron X2020.01.15      107.50      103.75       79.05
6  Albany X2020.01.15      107.35      105.95      113.36
7   Akron X2020.01.16      106.14      100.22       74.77
8  Albany X2020.01.16      105.54      107.76      107.52
9   Akron X2020.01.17      123.62       89.04       89.55
10 Albany X2020.01.17      128.97      101.39      129.43
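
The result above has a time column, val.-prefixed names, and rows ordered by date rather than by region, so it does not quite match the layout asked for in the question. A possible cleanup, sketched on a cut-down copy of the data (two dates only) with the intermediate names long and wide chosen just for illustration:

```r
# Cut-down copy of the question's data: first two dates only.
df <- data.frame(
  region = rep(c("Akron", "Albany"), each = 3),
  transportation_type = rep(c("driving", "transit", "walking"), times = 2),
  X2020.01.13 = c(100, 100, 100, 100, 100, 100),
  X2020.01.14 = c(103.06, 106.69, 97.23, 102.35, 100.14, 108.36)
)

# Same two reshape() calls as in the answer, written as separate steps.
long <- reshape(
  df,
  direction = "long",
  idvar = c("region", "transportation_type"),
  varying = -(1:2),
  times = names(df)[-(1:2)],
  v.names = "val"
)
wide <- reshape(
  long,
  direction = "wide",
  idvar = c("time", "region"),
  timevar = "transportation_type"
)

# Drop the "val." prefix, rename time to date, and sort by region then date.
names(wide) <- sub("^val\\.", "", names(wide))
names(wide)[names(wide) == "time"] <- "date"
wide <- wide[order(wide$region, wide$date), ]
rownames(wide) <- NULL

print(wide)
```

Sorting the date labels as text works here only because they share the fixed X%Y.%m.%d layout.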
