
R dplyr: gather wide to long with multiple columns and multiple values

I have the following wide-form data:

identity = c("Race1", "Race2", "Race3")
total_2017 = c(300,325,350)
total_2018 = c(200,225,250)
total_2019 = c(100,150,200)
pct_2017 = total_2017/sum(total_2017[1],total_2018[1],total_2019[1])
pct_2018 = total_2018/sum(total_2017[2],total_2018[2],total_2019[2])
pct_2019 = total_2019/sum(total_2017[3],total_2018[3],total_2019[3])
df.wide <- cbind.data.frame(identity, total_2017, total_2018, total_2019, pct_2017, pct_2018, pct_2019)

The wide data looks like this:

     identity total_2017 total_2018 total_2019  pct_2017  pct_2018 pct_2019
1    Race1        300        200        100 0.5000000 0.2857143   0.1250
2    Race2        325        225        150 0.5416667 0.3214286   0.1875
3    Race3        350        250        200 0.5833333 0.3571429   0.2500

The 2nd, 3rd and 4th columns are the totals of "identity" for the years 2017 to 2019, and the last three columns are the respective shares. I want to convert the data into long format so that the totals are gathered into a column Enrollment and the percentages are gathered into a column Percent. I tried the following code:

    library(dplyr)
    library(magrittr)
    library(tidyr)

df.long <- df.wide %>% 
  gather(key = "Total", value = "Enrollment", starts_with("total_")) %>%
  gather(key = "Share", value = "Percent", starts_with("pct_"))

Here are the first 10 rows of the long-form data:

    head(df.long, 10)
   identity      Total Enrollment    Share   Percent
1     Race1 total_2017        300 pct_2017 0.5000000
2     Race2 total_2017        325 pct_2017 0.5416667
3     Race3 total_2017        350 pct_2017 0.5833333
4     Race1 total_2018        200 pct_2017 0.5000000
5     Race2 total_2018        225 pct_2017 0.5416667
6     Race3 total_2018        250 pct_2017 0.5833333
7     Race1 total_2019        100 pct_2017 0.5000000
8     Race2 total_2019        150 pct_2017 0.5416667
9     Race3 total_2019        200 pct_2017 0.5833333
10    Race1 total_2017        300 pct_2018 0.2857143

As can be seen, Enrollment and Percent are ordered differently: the rows cycle through the total_ years while Percent stays on pct_2017. How can I get both columns in the same order, so that each row holds the total and the share for the same year?
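
A quick check, offered as a sketch rather than as part of the original question, suggests why the two columns disagree. Each gather() call stacks its own set of columns independently, so the second call repeats every row produced by the first once per pct_ column. The result pairs every total_ year with every pct_ year (3 identities x 3 x 3 = 27 rows) instead of the 9 rows wanted. The names year_total, year_pct and df.matched below are introduced here for illustration only.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question
identity   <- c("Race1", "Race2", "Race3")
total_2017 <- c(300, 325, 350)
total_2018 <- c(200, 225, 250)
total_2019 <- c(100, 150, 200)
pct_2017 <- total_2017 / sum(total_2017[1], total_2018[1], total_2019[1])
pct_2018 <- total_2018 / sum(total_2017[2], total_2018[2], total_2019[2])
pct_2019 <- total_2019 / sum(total_2017[3], total_2018[3], total_2019[3])
df.wide <- data.frame(identity, total_2017, total_2018, total_2019,
                      pct_2017, pct_2018, pct_2019,
                      stringsAsFactors = FALSE)

# The two chained gather() calls from the question
df.long <- df.wide %>%
  gather(key = "Total", value = "Enrollment", starts_with("total_")) %>%
  gather(key = "Share", value = "Percent", starts_with("pct_"))

nrow(df.long)     # 27: every total_ year crossed with every pct_ year

# Keep only the rows where both keys refer to the same year
df.matched <- df.long %>%
  mutate(year_total = sub("total_", "", Total),
         year_pct   = sub("pct_", "", Share)) %>%
  filter(year_total == year_pct) %>%
  select(identity, year = year_total, Enrollment, Percent)

nrow(df.matched)  # 9: one row per identity and year
```

Filtering after the fact works, but it builds the full cross product first, so a reshape that splits the column names directly is usually the tidier route.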

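This can be done with pivot_longer, which can reshape several sets of columns at once. The special name '.value' in names_to tells it to use the part of each column name before the separator (total, pct) as the names of the output columns, while the part after the separator (2017, 2018, 2019) goes into the year column.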

library(dplyr)
library(tidyr)
df.wide %>% 
   pivot_longer(cols = -identity, names_to = c('.value', 'year'), 
         names_sep="_") %>%
   arrange(year)
# A tibble: 9 x 4
#  identity year  total   pct
#  <chr>    <chr> <dbl> <dbl>
#1 Race1    2017    300 0.5  
#2 Race2    2017    325 0.542
#3 Race3    2017    350 0.583
#4 Race1    2018    200 0.286
#5 Race2    2018    225 0.321
#6 Race3    2018    250 0.357
#7 Race1    2019    100 0.125
#8 Race2    2019    150 0.188
#9 Race3    2019    200 0.25 
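
The result above keeps the prefixes total and pct as column names, while the question asks for Enrollment and Percent. One possible way to get there, sketched under the assumption that df.wide is built as in the question, is to rename the columns after pivoting. Converting year to integer and sorting by identity within each year are optional extras that are not in the original answer.

```r
library(dplyr)
library(tidyr)

# Example data from the question
identity   <- c("Race1", "Race2", "Race3")
total_2017 <- c(300, 325, 350)
total_2018 <- c(200, 225, 250)
total_2019 <- c(100, 150, 200)
pct_2017 <- total_2017 / sum(total_2017[1], total_2018[1], total_2019[1])
pct_2018 <- total_2018 / sum(total_2017[2], total_2018[2], total_2019[2])
pct_2019 <- total_2019 / sum(total_2017[3], total_2018[3], total_2019[3])
df.wide <- data.frame(identity, total_2017, total_2018, total_2019,
                      pct_2017, pct_2018, pct_2019,
                      stringsAsFactors = FALSE)

df.long <- df.wide %>%
  # Split "total_2017" into a value-column name ("total") and a year ("2017")
  pivot_longer(cols = -identity,
               names_to = c(".value", "year"),
               names_sep = "_") %>%
  # Use the column names requested in the question
  rename(Enrollment = total, Percent = pct) %>%
  # Optional: store the year as a number rather than as text
  mutate(year = as.integer(year)) %>%
  arrange(year, identity)

df.long
```

Each row now carries the enrollment and the share for the same identity and year, which is the alignment the chained gather() calls could not provide.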
