Reshape data frame in R: some variables wide to long format, some long to wide
Hi Stack Overflow folks,
I'm having trouble reformatting my data frame efficiently. My original frame looks like this:
region transportation_type X2020.01.13 X2020.01.14 X2020.01.15 X2020.01.16 X2020.01.17
1 Akron driving 100.0 103.06 107.50 106.14 123.62
2 Akron transit 100.0 106.69 103.75 100.22 89.04
3 Akron walking 100.0 97.23 79.05 74.77 89.55
4 Albany driving 100.0 102.35 107.35 105.54 128.97
5 Albany transit 100.0 100.14 105.95 107.76 101.39
6 Albany walking 100.0 108.36 113.36 107.52 129.43
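To merge it with some other data, I would like to spread transportation_type into columns (wide format) and gather the date columns X2020.01.13 to X2020.01.17 into a single column (long format), like this: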
region date driving transit walking
1 Akron X2020.01.13 100.0 100.0 100.0
2 Akron X2020.01.14 103.06 106.69 97.23
3 Akron X2020.01.15 107.50 103.75 79.05
4 Akron X2020.01.16 106.14 100.22 74.77
5 Akron X2020.01.17 123.62 89.04 89.55
6 Albany X2020.01.13 100.0 100.0 100.0
7 Albany X2020.01.14 102.35 100.14 108.36
8 Albany X2020.01.15 107.35 105.95 113.36
9 Albany X2020.01.16 105.54 107.76 107.52
10 Albany X2020.01.17 128.97 101.39 129.43
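I could reformat in two steps, e.g. with the "melt" command, by first converting transportation_type to wide format and then the dates to long format.
Can I do this more efficiently, in a single step?
Thanks for your help!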
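There is no function in base R or in the major reshaping packages that pivots in both directions at once.
In general, I'd recommend switching to tidyr::pivot_wider() and tidyr::pivot_longer(). They are still maintained (reshape and reshape2 are retired and no longer receive feature updates), and they are easier to use.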
# Recreate the data frame from the question.
dat <- tibble::tribble(
  ~region, ~transportation_type, ~X2020.01.13, ~X2020.01.14, ~X2020.01.15, ~X2020.01.16, ~X2020.01.17,
  "Akron", "driving", 100.0, 103.06, 107.50, 106.14, 123.62,
  "Akron", "transit", 100.0, 106.69, 103.75, 100.22, 89.04,
  "Akron", "walking", 100.0, 97.23, 79.05, 74.77, 89.55,
  "Albany", "driving", 100.0, 102.35, 107.35, 105.54, 128.97,
  "Albany", "transit", 100.0, 100.14, 105.95, 107.76, 101.39,
  "Albany", "walking", 100.0, 108.36, 113.36, 107.52, 129.43
)
# The native pipe |> requires R >= 4.1.
dat |>
  # Step 1: gather every date column into one "date" column (long format).
  tidyr::pivot_longer(
    cols = -c(region, transportation_type),
    names_to = "date",
    values_to = "values"
  ) |>
  # Step 2: spread transportation_type into one column per type (wide format).
  tidyr::pivot_wider(
    names_from = transportation_type,
    values_from = values
  )
#> # A tibble: 10 x 5
#> region date driving transit walking
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 Akron X2020.01.13 100 100 100
#> 2 Akron X2020.01.14 103. 107. 97.2
#> 3 Akron X2020.01.15 108. 104. 79.0
#> 4 Akron X2020.01.16 106. 100. 74.8
#> 5 Akron X2020.01.17 124. 89.0 89.6
#> 6 Albany X2020.01.13 100 100 100
#> 7 Albany X2020.01.14 102. 100. 108.
#> 8 Albany X2020.01.15 107. 106. 113.
#> 9 Albany X2020.01.16 106. 108. 108.
#> 10 Albany X2020.01.17 129. 101. 129.
Created on 2021-08-22 by the reprex package (v2.0.0)
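Since the stated goal is to merge the result with other data, it may help to turn the date labels into real Date values while pivoting. The following is only a sketch under assumptions: it uses a two-date subset of the question's data, it assumes the leading "X" is the prefix R typically adds to column names that start with a digit, and it assumes the dates are in year.month.day order as shown above.

```r
library(tidyr)

# A small subset of the question's data, enough to illustrate the idea.
dat <- data.frame(
  region = c("Akron", "Akron", "Akron"),
  transportation_type = c("driving", "transit", "walking"),
  X2020.01.13 = c(100, 100, 100),
  X2020.01.14 = c(103.06, 106.69, 97.23)
)

long <- pivot_longer(
  dat,
  cols = -c(region, transportation_type),
  names_to = "date",
  names_prefix = "X",   # strip the leading "X" from the former column names
  values_to = "values"
)

# Parse "2020.01.13" into a Date (assumes the dot-separated format).
long$date <- as.Date(long$date, format = "%Y.%m.%d")

out <- pivot_wider(long, names_from = transportation_type, values_from = values)
```

With a Date column, a later merge on region and date does not depend on both data sets using the same string format.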
Here is another approach that pivots wide, then long, then wide again:
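library(dplyr)
library(tidyr)
# df is the data frame from the question.
df %>%
  # Spread each date column by transportation type,
  # giving names such as X2020.01.13_driving.
  pivot_wider(
    names_from = transportation_type,
    values_from = 3:7
  ) %>%
  # Gather all of those columns; values go to the default "value" column.
  pivot_longer(
    cols = starts_with("X"),
    names_to = "date"
  ) %>%
  # Split the combined name back into date and transportation type.
  separate(date, c("date", "transportation"), sep = "_") %>%
  # Spread the transportation types into columns.
  pivot_wider(
    names_from = transportation
  )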
# A tibble: 10 x 5
region date driving transit walking
<chr> <chr> <dbl> <dbl> <dbl>
1 Akron X2020.01.13 100 100 100
2 Akron X2020.01.14 103. 107. 97.2
3 Akron X2020.01.15 108. 104. 79.0
4 Akron X2020.01.16 106. 100. 74.8
5 Akron X2020.01.17 124. 89.0 89.6
6 Albany X2020.01.13 100 100 100
7 Albany X2020.01.14 102. 100. 108.
8 Albany X2020.01.15 107. 106. 113.
9 Albany X2020.01.16 106. 108. 108.
10 Albany X2020.01.17 129. 101. 129.
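The separate() and final pivot_wider() steps above can probably be folded into pivot_longer() by using the special ".value" name, and the positional selection 3:7 can be replaced by starts_with("X") so the code does not depend on the number of date columns. This is a sketch on a two-date subset of the question's data, and it assumes that neither the date column names nor the transportation types contain an underscore.

```r
library(tidyr)

# A small subset of the question's data, enough to illustrate the idea.
df <- data.frame(
  region = rep("Akron", 3),
  transportation_type = c("driving", "transit", "walking"),
  X2020.01.13 = c(100, 100, 100),
  X2020.01.14 = c(103.06, 106.69, 97.23)
)

# Wide: one column per date/type pair, e.g. X2020.01.13_driving.
wide <- pivot_wider(
  df,
  names_from = transportation_type,
  values_from = starts_with("X")
)

# Long: the part before "_" becomes the date column, the part after "_"
# names the output value columns (that is what ".value" requests).
out <- pivot_longer(
  wide,
  cols = starts_with("X"),
  names_to = c("date", ".value"),
  names_sep = "_"
)
```

This is still two pivots, so it is no shorter than the long-then-wide version in the first answer; it only removes the string splitting.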
Here is a base R option using nested reshape() calls:
`row.names<-`(reshape(
  reshape(
    df,
    direction = "long",
    idvar = c("region", "transportation_type"),
    varying = -(1:2),
    times = names(df)[-c(1:2)],
    v.names = "val"
  ),
  direction = "wide",
  idvar = c("time", "region"),
  timevar = "transportation_type"
), NULL)
This gives:
region time val.driving val.transit val.walking
1 Akron X2020.01.13 100.00 100.00 100.00
2 Albany X2020.01.13 100.00 100.00 100.00
3 Akron X2020.01.14 103.06 106.69 97.23
4 Albany X2020.01.14 102.35 100.14 108.36
5 Akron X2020.01.15 107.50 103.75 79.05
6 Albany X2020.01.15 107.35 105.95 113.36
7 Akron X2020.01.16 106.14 100.22 74.77
8 Albany X2020.01.16 105.54 107.76 107.52
9 Akron X2020.01.17 123.62 89.04 89.55
10 Albany X2020.01.17 128.97 101.39 129.43
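The base R result differs from the layout requested in the question in its column names (time, val.driving, ...) and in its row order (sorted by date rather than by region). A possible cleanup, sketched here on a two-date subset of the question's data with the nested reshape() calls split into two steps for readability:

```r
# A two-date subset of the question's data, so the sketch runs on its own.
df <- data.frame(
  region = rep(c("Akron", "Albany"), each = 3),
  transportation_type = rep(c("driving", "transit", "walking"), times = 2),
  X2020.01.13 = c(100, 100, 100, 100, 100, 100),
  X2020.01.14 = c(103.06, 106.69, 97.23, 102.35, 100.14, 108.36)
)

# Same two reshape() calls as above, written as separate steps.
long <- reshape(
  df,
  direction = "long",
  idvar = c("region", "transportation_type"),
  varying = -(1:2),
  times = names(df)[-(1:2)],
  v.names = "val"
)
wide <- reshape(
  long,
  direction = "wide",
  idvar = c("time", "region"),
  timevar = "transportation_type"
)

# Drop the "val." prefix and rename "time" to "date".
names(wide) <- sub("^val\\.", "", names(wide))
names(wide)[names(wide) == "time"] <- "date"

# Sort by region, then date, and reset the row names.
wide <- wide[order(wide$region, wide$date), ]
row.names(wide) <- NULL
```

Sorting on the date strings works here only because the labels are zero-padded and in year.month.day order.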