简体   繁体   中英

Reshaping multiple columns in R

I have this following data:

df <- data.frame(
  wave = c(350, 352),
  cdom = c(0.164910664183534, 0.161336423973549),
  total = c(0.173292853508359, 0.164541380243188),
  wave_1 = c(350, 352),
  cdom_1 = c(0.157738707282744, 0.149555740098184),
  total_1 = c(0.16501632769282, 0.151631889636391),
  wave_2 = c(350, 352),
  cdom_2 = c(0.143293704793142, 0.133057094683334),
  total_2 = c(0.148878497119496, 0.136150629840465),
  wave_3 = c(350, 352),
  cdom_3 = c(0.0972284241775975, 0.0906890150335725),
  total_3 = c(0.108645612944463, 0.103640164204995),
  wave_4 = c(350, 352),
  cdom_4 = c(0.0801780489449968, 0.0779336395415438),
  total_4 = c(0.103930690374372, 0.095768602460239))

df
#>   wave      cdom     total wave_1    cdom_1   total_1 wave_2    cdom_2
#> 1  350 0.1649107 0.1732929    350 0.1577387 0.1650163    350 0.1432937
#> 2  352 0.1613364 0.1645414    352 0.1495557 0.1516319    352 0.1330571
#>     total_2 wave_3     cdom_3   total_3 wave_4     cdom_4   total_4
#> 1 0.1488785    350 0.09722842 0.1086456    350 0.08017805 0.1039307
#> 2 0.1361506    352 0.09068902 0.1036402    352 0.07793364 0.0957686

Each triple column represents 1 sample. Hence, in this example, there are 5 samples. I would like to reshape this dataframe to longer with 4 columns ( sample , wave , cdom and total ). The solution can be base R, dplyr or data.table as I do not have any preference.

using data.table:

library("data.table")

df <- data.table(df)
df[1, id := "a"]
df[2, id := "b"]
melt(df, id.vars = "id"  ,
     measure.vars = patterns(wave = "wave",
                             cdom = "cdom",
                             total = "total"))

    id variable wave       cdom     total
 1:  a        1  350 0.16491066 0.1732929
 2:  b        1  352 0.16133642 0.1645414
 3:  a        2  350 0.15773871 0.1650163
 4:  b        2  352 0.14955574 0.1516319
 5:  a        3  350 0.14329370 0.1488785
 6:  b        3  352 0.13305709 0.1361506
 7:  a        4  350 0.09722842 0.1086456
 8:  b        4  352 0.09068902 0.1036402
 9:  a        5  350 0.08017805 0.1039307
10:  b        5  352 0.07793364 0.0957686

This takes a little bit of work, since the columns aren't formatted uniformly, and you have multiple observations per "sample". We can use functions from the tidyverse package:

library(tidyverse)

df %>%
    # create column for observation order
    mutate(rn = row_number()) %>%
    # wide to long
    gather(variable, value, -rn) %>%
    # separate variable columns (e.g. wave_1 -> wave, 1)
    separate(variable, into = c("var", "sample")) %>%
    # fix NA for "first" sample
    mutate(sample = ifelse(is.na(sample), 0, sample))  %>%
    # reshape from long to a little less long
    group_by(rn, sample) %>%
    spread(var, value)

 #      rn sample   cdom  total  wave
 #   <int> <chr>   <dbl>  <dbl> <dbl>
 # 1     1 0      0.165  0.173    350
 # 2     1 1      0.158  0.165    350
 # 3     1 2      0.143  0.149    350
 # 4     1 3      0.0972 0.109    350
 # 5     1 4      0.0802 0.104    350
 # 6     2 0      0.161  0.165    352
 # 7     2 1      0.150  0.152    352
 # 8     2 2      0.133  0.136    352
 # 9     2 3      0.0907 0.104    352
# 10     2 4      0.0779 0.0958   352

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM