How to do a nested 2 level reshape in R?

Question

I am currently working on a survey project that asks about respondents' economic activities. So we will ask the jobs/businesses they have, and since some jobs/businesses are seasonal, we ask about their revenue, cost, and profit in each season.

I am a bit stuck on how to reshape this in R from wide to long. This is what I have:

respondent_name	job_name_1	seasonality_type_1	job_name_2	seasonality_type_2	cst_1_1	rev_1_1	cst_1_2	rev_1_2	cst_2_1	rev_2_1	cst_2_2	rev_2_2
James	farmer	high / low	teacher	active / inactive	5000	6000	2000	3000	100	200	0	0
Alice	livestock rearing	high / low	barber	active / inactive	100000	200000	20000	30000	5000	7000	2000	0

I want the output to look like:

respondent_name	job_number	job_name	season_number	season	cost	revenue
James	1	farmer	1	high	5000	6000
James	1	farmer	2	low	2000	3000
James	2	teacher	1	active	100	200
James	2	teacher	2	inactive	0	0
Alice	1	livestock rearing	1	high	100000	200000
Alice	1	livestock rearing	2	low	2000	3000
Alice	2	barber	1	active	5000	7000
Alice	2	barber	2	inactive	2000	0

Anyone has idea how to do this? I used the melt function but can't figure out how to not lose the 2 level information. What melt gives me is just job_season_1, job_season_2, job_season_3, job_season_4, which lose the pair wise info between job and season.

Thanks so much!

Here is the toy dataset in the table above:

survey = data.frame(respondent_name = c("James", "Alice"),
                    job_name_1 = c("farmer", "livestock rearing"),
                    seasonality_type_1 = c("high / low", "high / low"), 
                    job_name_2 = c("teacher", "barber"),
                    seasonality_type_2 = c("active / inactive", "active / inactive"),
                    cst_1_1 = c(5000, 100000),
                    rev_1_1 = c(6000, 200000),
                    cst_1_2 = c(2000, 20000),
                    rev_1_2 = c(3000, 30000),
                    cst_2_1 = c(100, 5000),
                    rev_2_1 = c(200, 7000),
                    cst_2_2 = c(0, 2000),
                    rev_2_2 = c(0, 0)
                    )

Answer 1

first split the seasonality type1 and type2 into 2 columns and then melt , for melt i use the column numbers because your column names are too long

library(data.table)
survey[, c("s1", "s2") := tstrsplit(seasonality_type_1, "/")]
survey[, c("s1_2", "s2_2") := tstrsplit(seasonality_type_2, "/")]
melt(survey, id.vars=1, measure.vars=list(c(2,2,4,4),c(14,15,16,17), c(6,8,10,12), c(7,9,11,13)))

Answer 2

This uses tidyverse and not data.table, but gets you most of the way there...

library(tidyverse)

survey2 <- survey %>%
  separate(seasonality_type_1, 
       sep = " / ", 
       into = c("season_1_1", "season_1_2")) %>%
  separate(seasonality_type_2,
       sep = " / ",
       into = c("season_2_1", "season_2_2")) %>%
  pivot_longer(cols = starts_with("job_name"),
           names_to = "job_number",
           names_prefix = "job_name_",
           values_to = "job"
          )

  survey2_1 <- survey2 %>%
    select(respondent_name, job_number, job, ends_with("_1")) %>%
    mutate(
      season = 1,
      cost = ifelse(job_number == 1, cst_1_1, cst_2_1)
    )

  survey2_2 <- survey2 %>%
    select(respondent_name, job_number, job, ends_with("_2")) %>%
    mutate(
      season = 2,
      cost = ifelse(job_number == 1, cst_1_2, cst_2_2)
    )

What's still missing:

Similar calculations for revenue as cost
Removing unneeded columns, ending with similar column names in each of survey2_1 and survey2_2
Joining (rbind) survey2_1 and survey2_2

Answer 3

The short answer

library(tidyverse)
seasonality_desc <- tribble(
  ~ job_number, ~ season_number, ~ season,
  1,            1,               "high",
  1,            2,               "low",
  2,            1,               "active",
  2,            2,               "inactive"
)
survey %>% 
  select(!contains("seasonality_type")) %>% 
  rename_with(
    ~ str_replace(.x, pattern = "^(cst|rev)_([0-9])_([0-9])$", 
                  replacement = "\\1_\\3_\\2"), 
    .cols = matches("^(cst|rev)_[0-9]_[0-9]$")
  ) %>% 
  pivot_longer(
    !respondent_name, names_to = c(".value", "job_number"), 
    names_pattern = "(.*)_([0-9]+)$", 
    names_transform = list(job_number = as.numeric)
  ) %>% 
  pivot_longer(
    matches("cst|rev"), names_to = c(".value", "season_number"),
    names_sep = "_", names_transform = list(season_number = as.numeric)
  ) %>% 
  left_join(seasonality_desc, by = c("job_number", "season_number")) %>% 
  select(
    respondent_name, job_number, job_name, season_number, season, cost = cst, 
    revenue = rev
  )
#> # A tibble: 8 × 7
#>   respondent_name job_number job_name       season_number season    cost revenue
#>   <chr>                <dbl> <chr>                  <dbl> <chr>    <dbl>   <dbl>
#> 1 James                    1 farmer                     1 high       5e3    6000
#> 2 James                    1 farmer                     2 low        2e3    3000
#> 3 James                    2 teacher                    1 active     1e2     200
#> 4 James                    2 teacher                    2 inactive   0         0
#> 5 Alice                    1 livestock rea…             1 high       1e5  200000
#> 6 Alice                    1 livestock rea…             2 low        2e4   30000
#> 7 Alice                    2 barber                     1 active     5e3    7000
#> 8 Alice                    2 barber                     2 inactive   2e3       0

^{Created on 2022-02-04 by the reprex package (v2.0.1)}

The long answer

We'll use two instances of tidyr 's pivot_longer() to lengthen the dataset, but there's some other wrangling that's needed in between.

Remove metadata

First, seasonality_type_1 and seasonality_type_2 do not provide any individual-level information, but rather only metadata specifying the labels that correspond to the different season numbers in the context of the job_number . So, manually make a tibble containing these labels, and delete the metadata from the original dataset:

library(tidyverse)

seasonality_desc <- tribble(
  ~ job_number, ~ season_number, ~ season,
  1,            1,               "high",
  1,            2,               "low",
  2,            1,               "active",
  2,            2,               "inactive"
)

survey2 <-  select(survey, !contains("seasonality_type"))
knitr::kable(survey2)

respondent_name	job_name_1	job_name_2	cst_1_1	rev_1_1	cst_1_2	rev_1_2	cst_2_1	rev_2_1	cst_2_2	rev_2_2
James	farmer	teacher	5e+03	6e+03	2000	3000	100	200	0	0
Alice	livestock rearing	barber	1e+05	2e+05	20000	30000	5000	7000	2000	0

First pivot: job number

The first pivot is by job name and number. Unfortunately, the job number appears in the middle of the cst / rev column names, and this doesn't play well with pivot_longer() . You can switch the job number with the season number manually, but if you're looking for an automated way of doing it, here is one way using regex:

survey3 <- rename_with(
  survey2,
  ~ str_replace(.x, pattern = "^(cst|rev)_([0-9])_([0-9])$", 
                replacement = "\\1_\\3_\\2"), 
  .cols = matches("^(cst|rev)_[0-9]_[0-9]$")
)
knitr::kable(survey3)

respondent_name	job_name_1	job_name_2	cst_1_1	rev_1_1	cst_2_1	rev_2_1	cst_1_2	rev_1_2	cst_2_2	rev_2_2
James	farmer	teacher	5e+03	6e+03	2000	3000	100	200	0	0
Alice	livestock rearing	barber	1e+05	2e+05	20000	30000	5000	7000	2000	0

Now notice a pattern: all column names except respondent_name is of the form <VARIABLE>_<JOB NUMBER> -- exactly the specification we'll provide pivot_longer() (apologies, stack overflow is having a hard time rendering the markdown output for the table):

survey4 <- survey3 %>% 
  pivot_longer(
    !respondent_name, names_to = c(".value", "job_number"), 
    names_pattern = "(.*)_([0-9]+)$", 
    names_transform = list(job_number = as.numeric)
  )
knitr::kable(survey4)

|respondent_name | job_number|job_name          | cst_1| rev_1| cst_2| rev_2|
|:---------------|----------:|:-----------------|-----:|-----:|-----:|-----:|
|James           |          1|farmer            | 5e+03| 6e+03|  2000|  3000|
|James           |          2|teacher           | 1e+02| 2e+02|     0|     0|
|Alice           |          1|livestock rearing | 1e+05| 2e+05| 20000| 30000|
|Alice           |          2|barber            | 5e+03| 7e+03|  2000|     0|

Second pivot: season number

Now for the second pivot, notice that the cst and rev columns are of the form <VARIABLE>_<SEASON NUMBER> -- exactly the specification we'll provide pivot_longer() . This time, it's enough to just specify that an underscore is separating the names (we couldn't last time because there were more than one underscores present in the column names) (apologies again about the table):

survey5 <- survey4 %>%
  pivot_longer(
    matches("cst|rev"), names_to = c(".value", "season_number"),
    names_sep = "_", names_transform = list(season_number = as.numeric)
  )
knitr::kable(survey5)

|respondent_name | job_number|job_name          | season_number|   cst|   rev|
|:---------------|----------:|:-----------------|-------------:|-----:|-----:|
|James           |          1|farmer            |             1| 5e+03| 6e+03|
|James           |          1|farmer            |             2| 2e+03| 3e+03|
|James           |          2|teacher           |             1| 1e+02| 2e+02|
|James           |          2|teacher           |             2| 0e+00| 0e+00|
|Alice           |          1|livestock rearing |             1| 1e+05| 2e+05|
|Alice           |          1|livestock rearing |             2| 2e+04| 3e+04|
|Alice           |          2|barber            |             1| 5e+03| 7e+03|
|Alice           |          2|barber            |             2| 2e+03| 0e+00|

Add in labels and clean

Now we can add in the labels stored in the metadata seasonality_desc we made earlier, reorder the columns as desired, and rename "cst" to "cost" and "rev" to "revenue":

survey6 <- survey5 %>% 
  left_join(seasonality_desc, by = c("job_number", "season_number")) %>% 
  select(
    respondent_name, job_number, job_name, season_number, season, cost = cst, 
    revenue = rev
  )
knitr::kable(survey6)

respondent_name	job_number	job_name	season_number	season	cost	revenue
James	1	farmer	1	high	5e+03	6e+03
James	1	farmer	2	low	2e+03	3e+03
James	2	teacher	1	active	1e+02	2e+02
James	2	teacher	2	inactive	0e+00	0e+00
Alice	1	livestock rearing	1	high	1e+05	2e+05
Alice	1	livestock rearing	2	low	2e+04	3e+04
Alice	2	barber	1	active	5e+03	7e+03
Alice	2	barber	2	inactive	2e+03	0e+00

How to do a nested 2 level reshape in R?

Question

3 answers

solution1
1 2022-02-04 05:36:55

solution2
0 2022-02-04 05:32:58

solution3
0 2022-02-05 07:56:49

The short answer

The long answer

Remove metadata

First pivot: job number

Second pivot: season number

Add in labels and clean

How to do a nested 2 level reshape in R?

Question

3 answers

solution1 1 2022-02-04 05:36:55

solution2 0 2022-02-04 05:32:58

solution3 0 2022-02-05 07:56:49

The short answer

The long answer

Remove metadata

First pivot: job number

Second pivot: season number

Add in labels and clean

solution1
1 2022-02-04 05:36:55

solution2
0 2022-02-04 05:32:58

solution3
0 2022-02-05 07:56:49