
Reshape data frame R: Some variables wide to long format, some long to wide

Hi Stack Overflow folks,

I am having trouble formatting my data frame efficiently. My original data frame looks like this:

  region transportation_type X2020.01.13 X2020.01.14 X2020.01.15 X2020.01.16 X2020.01.17
1  Akron             driving       100.0      103.06      107.50      106.14      123.62
2  Akron             transit       100.0      106.69      103.75      100.22       89.04
3  Akron             walking       100.0       97.23       79.05       74.77       89.55
4 Albany             driving       100.0      102.35      107.35      105.54      128.97
5 Albany             transit       100.0      100.14      105.95      107.76      101.39
6 Albany             walking       100.0      108.36      113.36      107.52      129.43

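To merge it with some other data, I want to turn transportation_type into columns (wide format) and the dates X2020.01.13 through X2020.01.17 into a single column (long format), like this: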

   region        date driving transit walking
1   Akron X2020.01.13  100.00  100.00  100.00
2   Akron X2020.01.14  103.06  106.69   97.23
3   Akron X2020.01.15  107.50  103.75   79.05
4   Akron X2020.01.16  106.14  100.22   74.77
5   Akron X2020.01.17  123.62   89.04   89.55
6  Albany X2020.01.13  100.00  100.00  100.00
7  Albany X2020.01.14  102.35  100.14  108.36
8  Albany X2020.01.15  107.35  105.95  113.36
9  Albany X2020.01.16  105.54  107.76  107.52
10 Albany X2020.01.17  128.97  101.39  129.43

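I can do the reformatting in two steps, e.g. with the "melt" command: first convert transportation_type to wide format, then convert the dates to long format.

Can I do it more efficiently, in a single step?

Thanks for your help!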
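For comparison, the two-step version the question describes could look roughly like this with reshape2's melt() and dcast(). This is a sketch, not the asker's actual code: it assumes the reshape2 package is installed and re-enters the sample data as a plain data.frame named df. melt() gathers the date columns into one date column, and dcast() then spreads transportation_type into one column per type.

```r
library(reshape2)

# The question's sample data, re-entered as a plain data.frame
df <- data.frame(
  region = rep(c("Akron", "Albany"), each = 3),
  transportation_type = rep(c("driving", "transit", "walking"), times = 2),
  X2020.01.13 = rep(100, 6),
  X2020.01.14 = c(103.06, 106.69, 97.23, 102.35, 100.14, 108.36),
  X2020.01.15 = c(107.50, 103.75, 79.05, 107.35, 105.95, 113.36),
  X2020.01.16 = c(106.14, 100.22, 74.77, 105.54, 107.76, 107.52),
  X2020.01.17 = c(123.62, 89.04, 89.55, 128.97, 101.39, 129.43),
  stringsAsFactors = FALSE
)

# Step 1: date columns wide -> long (one row per region/type/date)
long <- melt(df, id.vars = c("region", "transportation_type"),
             variable.name = "date", value.name = "value")

# Step 2: transportation_type long -> wide (one column per type)
res <- dcast(long, region + date ~ transportation_type, value.var = "value")
res
```

Note that melt() returns date as a factor whose levels are the original column names.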

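There is no function in base R or the major reshaping packages that pivots in both directions at once.

In general, I'd recommend switching to the tidyr::pivot_wider() and tidyr::pivot_longer() functions. They are still maintained (reshape and reshape2 no longer receive updates), and they are easier to use.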

dat <- tibble::tribble(
  ~region, ~transportation_type, ~X2020.01.13, ~X2020.01.14, ~X2020.01.15, ~X2020.01.16, ~X2020.01.17,
  "Akron",           "driving",      100.0,      103.06,      107.50,      106.14,      123.62,
  "Akron",           "transit",      100.0,      106.69,      103.75,      100.22,       89.04,
  "Akron",           "walking",      100.0,       97.23,       79.05,       74.77,       89.55,
  "Albany",          "driving",      100.0,      102.35,      107.35,      105.54,      128.97,
  "Albany",          "transit",      100.0,      100.14,      105.95,      107.76,      101.39,
  "Albany",          "walking",      100.0,      108.36,      113.36,      107.52,      129.43
)
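dat |>
  # dates: wide -> long (one row per region/type/date)
  tidyr::pivot_longer(
    cols = -c(region, transportation_type),
    names_to = "date",
    values_to = "values"
  ) |>
  # transportation_type: long -> wide (one column per type)
  tidyr::pivot_wider(
    names_from = transportation_type,
    values_from = values
  )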
#> # A tibble: 10 x 5
#>    region date        driving transit walking
#>    <chr>  <chr>         <dbl>   <dbl>   <dbl>
#>  1 Akron  X2020.01.13    100    100     100  
#>  2 Akron  X2020.01.14    103.   107.     97.2
#>  3 Akron  X2020.01.15    108.   104.     79.0
#>  4 Akron  X2020.01.16    106.   100.     74.8
#>  5 Akron  X2020.01.17    124.    89.0    89.6
#>  6 Albany X2020.01.13    100    100     100  
#>  7 Albany X2020.01.14    102.   100.    108. 
#>  8 Albany X2020.01.15    107.   106.    113. 
#>  9 Albany X2020.01.16    106.   108.    108. 
#> 10 Albany X2020.01.17    129.   101.    129.

Created on 2021-08-22 by the reprex package (v2.0.0)
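Since the goal is to merge with other data, it may help to make date a real Date instead of a string such as "X2020.01.13". A sketch of the same two pivots, assuming tidyr >= 1.1.0 (for names_transform) and R >= 4.1 (for |>): names_prefix strips the leading X (most likely added by R's name checking when the data was read), and names_transform parses the remainder with the format "%Y.%m.%d".

```r
library(tidyr)

# The question's sample data, re-entered as a plain data.frame
df <- data.frame(
  region = rep(c("Akron", "Albany"), each = 3),
  transportation_type = rep(c("driving", "transit", "walking"), times = 2),
  X2020.01.13 = rep(100, 6),
  X2020.01.14 = c(103.06, 106.69, 97.23, 102.35, 100.14, 108.36),
  X2020.01.15 = c(107.50, 103.75, 79.05, 107.35, 105.95, 113.36),
  X2020.01.16 = c(106.14, 100.22, 74.77, 105.54, 107.76, 107.52),
  X2020.01.17 = c(123.62, 89.04, 89.55, 128.97, 101.39, 129.43),
  stringsAsFactors = FALSE
)

res <- df |>
  pivot_longer(
    cols = -c(region, transportation_type),
    names_to = "date",
    names_prefix = "X",  # "X2020.01.13" -> "2020.01.13"
    # parse "2020.01.13" into a Date so it can be joined on later
    names_transform = list(date = function(x) as.Date(x, format = "%Y.%m.%d")),
    values_to = "values"
  ) |>
  pivot_wider(names_from = transportation_type, values_from = values)
res
```

The format string only fits the column names shown in the question; a different naming scheme would need a different format.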

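Here is another way, pivoting wide, then long, then wide again:

library(dplyr)
library(tidyr)

# df is the question's original data frame
df %>% 
    # 1. long -> wide: one column per date/type pair, e.g. "X2020.01.13_driving"
    pivot_wider(
        names_from = transportation_type,
        values_from = 3:7
    ) %>% 
    # 2. wide -> long: gather all of those columns into a single "date" column
    pivot_longer(
        cols = starts_with("X"),
        names_to = "date"
    ) %>% 
    # 3. split "X2020.01.13_driving" back into the date and the transportation type
    separate(date, c("date", "transportation"), sep="_") %>% 
    # 4. long -> wide again: one column per transportation type
    pivot_wider(
        names_from = transportation
    )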
# A tibble: 10 x 5
   region date        driving transit walking
   <chr>  <chr>         <dbl>   <dbl>   <dbl>
 1 Akron  X2020.01.13    100    100     100  
 2 Akron  X2020.01.14    103.   107.     97.2
 3 Akron  X2020.01.15    108.   104.     79.0
 4 Akron  X2020.01.16    106.   100.     74.8
 5 Akron  X2020.01.17    124.    89.0    89.6
 6 Albany X2020.01.13    100    100     100  
 7 Albany X2020.01.14    102.   100.    108. 
 8 Albany X2020.01.15    107.   106.    113. 
 9 Albany X2020.01.16    106.   108.    108. 
10 Albany X2020.01.17    129.   101.    129. 
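The last three steps of that pipeline can likely be collapsed into one: after the first pivot_wider(), pivot_longer() can split the combined names itself via names_sep, and the special ".value" entry in names_to sends the transportation part back out as column names. This is a variant sketched here, not part of the original answer; it assumes tidyr >= 1.0.0 and relies on the date names containing no "_".

```r
library(tidyr)

# The question's sample data, re-entered as a plain data.frame
df <- data.frame(
  region = rep(c("Akron", "Albany"), each = 3),
  transportation_type = rep(c("driving", "transit", "walking"), times = 2),
  X2020.01.13 = rep(100, 6),
  X2020.01.14 = c(103.06, 106.69, 97.23, 102.35, 100.14, 108.36),
  X2020.01.15 = c(107.50, 103.75, 79.05, 107.35, 105.95, 113.36),
  X2020.01.16 = c(106.14, 100.22, 74.77, 105.54, 107.76, 107.52),
  X2020.01.17 = c(123.62, 89.04, 89.55, 128.97, 101.39, 129.43),
  stringsAsFactors = FALSE
)

# Step 1: spread transportation_type; with several values_from columns the
# new names are "<date column>_<transportation_type>", e.g. "X2020.01.13_driving"
wide <- pivot_wider(df, names_from = transportation_type, values_from = 3:7)

# Step 2: split each name at "_": the first part goes into "date", while
# ".value" turns the second part (driving/transit/walking) into column names
res <- pivot_longer(wide, cols = -region,
                    names_to = c("date", ".value"), names_sep = "_")
res
```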

Here is a base R option using nested reshape() calls:

`row.names<-`(reshape(
  reshape(
    df,
    direction = "long",
    idvar = c("region", "transportation_type"),
    varying = -(1:2),
    times = names(df)[-c(1:2)],
    v.names = "val"
  ),
  direction = "wide",
  idvar = c("time", "region"),
  timevar = "transportation_type"
), NULL)

which gives

   region        time val.driving val.transit val.walking
1   Akron X2020.01.13      100.00      100.00      100.00
2  Albany X2020.01.13      100.00      100.00      100.00
3   Akron X2020.01.14      103.06      106.69       97.23
4  Albany X2020.01.14      102.35      100.14      108.36
5   Akron X2020.01.15      107.50      103.75       79.05
6  Albany X2020.01.15      107.35      105.95      113.36
7   Akron X2020.01.16      106.14      100.22       74.77
8  Albany X2020.01.16      105.54      107.76      107.52
9   Akron X2020.01.17      123.62       89.04       89.55
10 Albany X2020.01.17      128.97      101.39      129.43
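To get closer to the layout the question asks for (a date column, plain driving/transit/walking names, rows grouped by region), the base R result can be tidied with a few assignments. A sketch that writes the nested reshape() calls as two named steps; the column renaming and sort order are illustrative choices, not part of the original answer.

```r
# The question's sample data, re-entered as a plain data.frame
df <- data.frame(
  region = rep(c("Akron", "Albany"), each = 3),
  transportation_type = rep(c("driving", "transit", "walking"), times = 2),
  X2020.01.13 = rep(100, 6),
  X2020.01.14 = c(103.06, 106.69, 97.23, 102.35, 100.14, 108.36),
  X2020.01.15 = c(107.50, 103.75, 79.05, 107.35, 105.95, 113.36),
  X2020.01.16 = c(106.14, 100.22, 74.77, 105.54, 107.76, 107.52),
  X2020.01.17 = c(123.62, 89.04, 89.55, 128.97, 101.39, 129.43),
  stringsAsFactors = FALSE
)

# Step 1: date columns wide -> long; "time" holds the original column names
long <- reshape(df, direction = "long",
                idvar = c("region", "transportation_type"),
                varying = names(df)[-(1:2)], times = names(df)[-(1:2)],
                v.names = "val")

# Step 2: transportation_type long -> wide ("val.driving", "val.transit", ...)
res <- reshape(long, direction = "wide",
               idvar = c("time", "region"), timevar = "transportation_type")

# Tidy up: drop the "val." prefix, call "time" "date", group rows by region
names(res) <- sub("^val\\.", "", names(res))
names(res)[names(res) == "time"] <- "date"
res <- res[order(res$region, res$date), ]
rownames(res) <- NULL
res
```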
