Split a Dataset into a Nested List of Dataframes and then Spread Using Tidyr and Purrr

Question

library(ggmosaic)
library(tidyverse)

Below is the sample code

happy2 <- happy %>%
  select(sex, marital, degree, health) %>%
  group_by(sex, marital, degree, health) %>%
  summarise(Count = n())

The following code splits the dataset into a nested list with tables of male and female (sex variable) for each category of the degree variable.

happy2 %>% 
split(.$degree) %>% 
lapply(function(x) split(x, x$sex))

This is where I'm struggling. I would like to reshape the data (for example with tidyr's spread()) so that each category of "marital" becomes a column header, with each column holding the "health" variable and the corresponding "Count". Alternatively, "marital" could be split again. The redundant "sex" and "degree" columns can be dropped.

Since I'm working with a list, I've been trying tidyverse methods. For example, I tried using purrr to drop variables:

happy2 %>% map(~ select(.x, -sex))

I'm thinking that I can also spread using purrr, but I'm having trouble making this work.

To illustrate what I'm looking for, I attached a picture of the possible structure. It doesn't include all categories, and the counts are not correct, since it only shows the structure. I suppose "marital" could also be a third split variable if that's easier. What I'm hoping for is male and female tables for each category of degree, with marital by health and the corresponding count.

Help would be appreciated...

Answer 1

Would the following work? I changed the syntax for split by sex so that I can chain the subsequent commands together:

happy2 %>%
  ungroup() %>%  # drop the grouping so select() can actually remove sex and degree
  split(.$degree) %>%
  lapply(function(x) x %>% split(.$sex) %>%
           lapply(function(x) x %>% select(-sex, -degree) %>%
                    spread(health, Count)))
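
Since the question asks for purrr and for "marital" as the column headers, here is a minimal sketch of the same nesting with map() in place of lapply(), spreading marital instead of health. It runs on a small made-up tibble (toy) rather than the real happy2, and spread_marital is a hypothetical helper name used only for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Toy stand-in for happy2 (made-up counts, for illustration only)
toy <- tibble(
  sex     = c("male", "male", "male", "female", "female", "female"),
  marital = c("married", "married", "divorced", "married", "divorced", "divorced"),
  degree  = rep("bachelor", 6),
  health  = c("good", "poor", "good", "good", "good", "poor"),
  Count   = c(10L, 2L, 4L, 12L, 5L, 1L)
)

# Hypothetical helper: drop the split columns, then turn each marital
# category into its own column holding the counts
spread_marital <- function(d) {
  d %>%
    select(-sex, -degree) %>%
    spread(marital, Count)
}

nested <- toy %>%
  split(.$degree) %>%
  map(function(by_degree) map(split(by_degree, by_degree$sex), spread_marital))

nested$bachelor$male   # one row per health level, one column per marital status
```

Combinations that do not occur in the data (here, divorced men in poor health) come out as NA. With tidyr 1.0.0 or later, pivot_wider(names_from = marital, values_from = Count) should do the same job as spread().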

Edit:

This would give you a separate table for each marital status:

happy2 %>% 
  ungroup() %>%
  split(.$degree) %>% 
  lapply(function(x) x %>% split(.$sex) %>%
           lapply(function(x) x %>% select(-sex, -degree) %>% split(.$marital)))

And if you don't want the first column indicating marital status, the following version drops that:

happy2 %>% 
  ungroup() %>%
  split(.$degree) %>% 
  lapply(function(x) x %>% split(.$sex) %>%
           lapply(function(x) x %>% select(-sex, -degree) %>% split(.$marital) %>%
                    lapply(function(x) x %>% select(-marital))))
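
The repeated lapply() calls above can also be folded into one recursive helper. This is only a sketch in base R, run on a small made-up data frame rather than the real happy2; nest_split is a hypothetical name, not a function from any package.

```r
# Toy stand-in for happy2 (made-up counts, for illustration only)
toy <- data.frame(
  sex     = c("male", "male", "male", "female"),
  marital = c("married", "married", "divorced", "married"),
  degree  = c("bachelor", "bachelor", "bachelor", "bachelor"),
  health  = c("good", "poor", "good", "good"),
  Count   = c(10, 2, 4, 12)
)

# Hypothetical helper: split by the first column in `cols`, drop that column,
# then recurse into each piece with the remaining columns
nest_split <- function(d, cols) {
  if (length(cols) == 0) return(d)
  keep <- setdiff(names(d), cols[1])
  pieces <- split(d[keep], d[[cols[1]]], drop = TRUE)
  lapply(pieces, nest_split, cols = cols[-1])
}

out <- nest_split(toy, c("degree", "sex", "marital"))
out$bachelor$male$married   # data frame with only health and Count
```

The order of the column names passed to nest_split() sets the nesting order, so swapping "sex" and "degree" would give one list per sex with the degrees inside it.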

Answer 2

What about this:

# cleaned up your code a bit
# removed the select (as it does nothing)
# consistent column names (count is lower case like the rest of the variables)
# added spacing
happy2 <- happy %>%
  group_by(sex, marital, degree, health) %>%
  summarise(count=n())

happy2 %>%
  dplyr::ungroup() %>% 
  split(list(.$degree, .$sex, .$marital)) %>% 
  lapply(. %>% select(health, count))
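
One thing to watch with the flat split: split() on a list of variables builds every combination of their values, so combinations that never occur show up as empty tables unless drop = TRUE is passed. A small base R sketch on made-up data (toy is an assumption, not the real happy2):

```r
# Toy stand-in for happy2 (made-up counts, for illustration only)
toy <- data.frame(
  sex     = c("male", "male", "female", "male"),
  marital = c("married", "divorced", "married", "married"),
  degree  = c("bachelor", "bachelor", "bachelor", "high school"),
  health  = c("good", "good", "poor", "good"),
  count   = c(10, 4, 3, 7)
)

groups <- list(toy$degree, toy$sex, toy$marital)

# Default: one entry per possible degree/sex/marital combination,
# including combinations with zero rows
all_combos <- split(toy[c("health", "count")], groups)

# drop = TRUE keeps only the combinations that occur in the data
flat <- split(toy[c("health", "count")], groups, drop = TRUE)

names(flat)                      # names are built as degree.sex.marital
flat[["bachelor.male.married"]]
```

The list names join the three values with a dot, so a single table can be pulled out by name as in the last line.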

Or do you really want the "marital" status as a table heading for the "health" column, as in your picture?

solution1
0 ACCEPTED 2017-08-14 01:40:56

solution2
0 2017-08-14 05:48:56