R dplyr gather wide to long multiple columns multiple values
I have the following wide-form data:
identity = c("Race1", "Race2", "Race3")
total_2017 = c(300,325,350)
total_2018 = c(200,225,250)
total_2019 = c(100,150,200)
pct_2017 = total_2017/sum(total_2017[1],total_2018[1],total_2019[1])
pct_2018 = total_2018/sum(total_2017[2],total_2018[2],total_2019[2])
pct_2019 = total_2019/sum(total_2017[3],total_2018[3],total_2019[3])
df.wide <- cbind.data.frame(identity, total_2017, total_2018, total_2019, pct_2017, pct_2018, pct_2019)
The wide data looks like this:
  identity total_2017 total_2018 total_2019  pct_2017  pct_2018 pct_2019
1    Race1        300        200        100 0.5000000 0.2857143   0.1250
2    Race2        325        225        150 0.5416667 0.3214286   0.1875
3    Race3        350        250        200 0.5833333 0.3571429   0.2500
The 2nd, 3rd and 4th columns are the totals of "identity" for the years 2017 to 2019, and the last three columns are the respective shares. I want to convert it into long format such that the totals are gathered into a column Enrollment and the percentages are gathered into a column Percent. I tried the following code:
library(dplyr)
library(magrittr)
library(tidyr)
df.long <- df.wide %>%
  gather(key = "Total", value = "Enrollment", starts_with("total_")) %>%
  gather(key = "Share", value = "Percent", starts_with("pct_"))
Here are the first 10 rows of the long-form data:
head(df.long, 10)
   identity      Total Enrollment    Share   Percent
1     Race1 total_2017        300 pct_2017 0.5000000
2     Race2 total_2017        325 pct_2017 0.5416667
3     Race3 total_2017        350 pct_2017 0.5833333
4     Race1 total_2018        200 pct_2017 0.5000000
5     Race2 total_2018        225 pct_2017 0.5416667
6     Race3 total_2018        250 pct_2017 0.5833333
7     Race1 total_2019        100 pct_2017 0.5000000
8     Race2 total_2019        150 pct_2017 0.5416667
9     Race3 total_2019        200 pct_2017 0.5833333
10    Race1 total_2017        300 pct_2018 0.2857143
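As can be seen, Enrollment and Percent are ordered differently: each total is paired with the share from every year rather than only its own year. How can I keep the two columns in the same order, so that each row holds the total and the share for the same year?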
This can be done with pivot_longer, which can reshape multiple sets of columns at once:
library(dplyr)
library(tidyr)
df.wide %>%
  pivot_longer(cols = -identity, names_to = c('.value', 'year'),
               names_sep = "_") %>%
  arrange(year)
# A tibble: 9 x 4
# identity year total pct
# <chr> <chr> <dbl> <dbl>
#1 Race1 2017 300 0.5
#2 Race2 2017 325 0.542
#3 Race3 2017 350 0.583
#4 Race1 2018 200 0.286
#5 Race2 2018 225 0.321
#6 Race3 2018 250 0.357
#7 Race1 2019 100 0.125
#8 Race2 2019 150 0.188
#9 Race3 2019 200 0.25
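To get the column names asked for in the question, the result can be renamed afterwards. The following is a minimal sketch, assuming tidyr 1.0.0 or later (where pivot_longer is available). It rebuilds df.wide exactly as in the question so that it runs on its own. The conversion of year to an integer and the extra sort on identity are optional additions, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Rebuild the wide data exactly as in the question
identity   <- c("Race1", "Race2", "Race3")
total_2017 <- c(300, 325, 350)
total_2018 <- c(200, 225, 250)
total_2019 <- c(100, 150, 200)
pct_2017 <- total_2017 / sum(total_2017[1], total_2018[1], total_2019[1])
pct_2018 <- total_2018 / sum(total_2017[2], total_2018[2], total_2019[2])
pct_2019 <- total_2019 / sum(total_2017[3], total_2018[3], total_2019[3])
df.wide <- cbind.data.frame(identity, total_2017, total_2018, total_2019,
                            pct_2017, pct_2018, pct_2019)

df.long <- df.wide %>%
  # ".value" tells pivot_longer that the part of each name before "_"
  # (total, pct) becomes an output column; the part after "_" goes to year
  pivot_longer(cols = -identity,
               names_to = c(".value", "year"),
               names_sep = "_") %>%
  # Rename to the column names requested in the question
  rename(Enrollment = total, Percent = pct) %>%
  # Optional: store year as an integer instead of a character string
  mutate(year = as.integer(year)) %>%
  arrange(year, identity)

df.long
```

Because both sets of columns are reshaped in a single step, each row keeps the total and the share for the same identity and year. The result has 9 rows instead of the 27 produced by calling gather twice, where the second call repeats every row once per pct column.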