
R: pass a vector of grouping variables to purrr::map()

Here is code that reads in a remote data set and prepares four summary tables showing counts for each category within the demographic variables of gender, education, ethnicity/race, and geographic region:

suppressMessages(suppressWarnings(library(tidyverse)))

urlRemote_path  <- "https://raw.githubusercontent.com/"
github_path <- "DSHerzberg/WEIGHTING-DATA/master/INPUT-FILES/"
fileName_path   <- "data-input-sim.csv"

census_match_input <- suppressMessages(read_csv(url(
  str_c(urlRemote_path, github_path, fileName_path)
)))

var_order_census_match  <- c("gender", "educ", "ethnic", "region")

census_match_cat_count_gender <- census_match_input %>%
  group_by(gender) %>%
  summarize(n_census = n()) %>%
  rename(demo_cat = gender) %>%
  mutate(demo_var = "gender") %>%
  relocate(demo_var, .before = demo_cat)

census_match_cat_count_educ <- census_match_input %>%
  group_by(educ) %>%
  summarize(n_census = n()) %>%
  rename(demo_cat = educ) %>%
  mutate(demo_var = "educ") %>%
  relocate(demo_var, .before = demo_cat)

census_match_cat_count_ethnic <- census_match_input %>%
  group_by(ethnic) %>%
  summarize(n_census = n()) %>%
  rename(demo_cat = ethnic) %>%
  mutate(demo_var = "ethnic") %>%
  relocate(demo_var, .before = demo_cat)

census_match_cat_count_region <- census_match_input %>%
  group_by(region) %>%
  summarize(n_census = n()) %>%
  rename(demo_cat = region) %>%
  mutate(demo_var = "region") %>%
  relocate(demo_var, .before = demo_cat)

I want to consolidate this code using purrr::map(). My thought was to iterate over the vector of variable names, as in:

census_match_cat_count <- var_order_census_match %>% 
  map(~
        census_match_input %>%
        group_by(!!.x) %>%
        summarize(n_census = n()))

This does not return the desired output; rather, it returns tables that lack separate rows and counts for the categories under each demographic variable.

Furthermore, when I try to expand the mapping function to include the rest of the code, as in:

census_match_cat_count <- var_order_census_match %>%
  map(
    ~
      census_match_input %>%
      group_by(!!.x) %>%
      summarize(n_census = n()) %>%
      rename(demo_cat = !!.x) %>%
      mutate(demo_var = .x) %>%
      relocate(demo_var, .before = demo_cat)
  )

I get back errors suggesting that I'm not using the correct tidyeval procedures.

There are related topics on Stack Overflow, but none seem to address my particular question of how to pass variable names to be used by dplyr::group_by() within purrr::map().

Thanks in advance for any help.

You were almost there, but you need to convert the variable name into a symbol before unquoting it in group_by(). Note that in the code below count() is a shortcut for group_by() + summarise(n = n()).

library(dplyr)
library(purrr)

vars <- c("gender", "educ", "ethnic", "region")

vars %>%
  map(~ census_match_input %>%
         count(!!sym(.x)) %>%
         rename(demo_cat = !!.x) %>%
         mutate(demo_var = .x) %>%
         relocate(demo_var))

[[1]]
# A tibble: 2 x 3
  demo_var demo_cat     n
  <chr>    <chr>    <int>
1 gender   female     524
2 gender   male       476

[[2]]
# A tibble: 4 x 3
  demo_var demo_cat         n
  <chr>    <chr>        <int>
1 educ     BA_plus        311
2 educ     HS_grad        247
3 educ     no_HS          133
4 educ     some_college   309

[[3]]
# A tibble: 5 x 3
  demo_var demo_cat     n
  <chr>    <chr>    <int>
1 ethnic   asian       48
2 ethnic   black      146
3 ethnic   hispanic   252
4 ethnic   other       64
5 ethnic   white      490

[[4]]
# A tibble: 4 x 3
  demo_var demo_cat      n
  <chr>    <chr>     <int>
1 region   midwest     218
2 region   northeast   173
3 region   south       367
4 region   west        242
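
To see why the symbol conversion matters, here is a minimal sketch on a hypothetical three-row tibble (`toy` is an assumption standing in for census_match_input, not the real data). Unquoting the bare string groups by a constant, which is why the original attempt returned tables without separate rows per category.

```r
library(dplyr)

# Hypothetical toy data standing in for census_match_input
toy <- tibble(gender = c("female", "male", "female"))
var <- "gender"

# Unquoting the bare string groups by the constant "gender":
# every row falls into one group, so a single row comes back
by_string <- toy %>%
  group_by(!!var) %>%
  summarize(n_census = n())

# Converting the string to a symbol first groups by the column's values
by_symbol <- toy %>%
  group_by(!!rlang::sym(var)) %>%
  summarize(n_census = n())

nrow(by_string)  # 1
nrow(by_symbol)  # 2
```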

You could reshape the dataset to long format with pivot_longer() and then count:

library(tidyverse)
census_match_input %>% 
    pivot_longer(all_of(var_order_census_match), names_to = "demo_var", values_to = "demo_cat") %>%
    count(demo_var, demo_cat)

    # A tibble: 15 x 3
       demo_var demo_cat         n
       <chr>    <chr>        <int>
     1 educ     BA_plus        311
     2 educ     HS_grad        247
     3 educ     no_HS          133
     4 educ     some_college   309
     5 ethnic   asian           48
     6 ethnic   black          146
     7 ethnic   hispanic       252
     8 ethnic   other           64
     9 ethnic   white          490
    10 gender   female         524
    11 gender   male           476
    12 region   midwest        218
    13 region   northeast      173
    14 region   south          367
    15 region   west           242
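
If the list of tables from map() is preferred but a single table is eventually needed, the list can be stacked afterwards. The following is a sketch on hypothetical toy data (`toy` and its values are assumptions); unlike the pivot_longer() result, the rows keep the order of the input vector rather than being sorted by demo_var.

```r
library(dplyr)
library(purrr)

# Hypothetical toy data standing in for census_match_input
toy <- tibble(
  gender = c("female", "male", "female", "male"),
  region = c("west", "west", "south", "west")
)
vars <- c("gender", "region")

combined <- vars %>%
  map(~ toy %>%
        count(!!rlang::sym(.x)) %>%        # count within the column named by .x
        rename(demo_cat = all_of(.x)) %>%  # rename that column to demo_cat
        mutate(demo_var = .x) %>%          # record which variable was counted
        relocate(demo_var)) %>%
  bind_rows()                              # stack the per-variable tables

combined
```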

You can also do this without unquoting, keeping the column names as character strings.

library(dplyr)

var_order_census_match  <- c("gender", "educ", "ethnic", "region")

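purrr::map(var_order_census_match,
           ~ census_match_input %>%
             # across(all_of()) replaces the superseded group_by_at()
             group_by(across(all_of(.x))) %>%
             summarise(n = n()) %>%
             # all_of() makes it explicit that .x holds a column name
             rename(demo_cat = all_of(.x)) %>%
             mutate(demo_var = .x) %>%
             relocate(demo_var))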


#[[1]]
# A tibble: 2 x 3
#  demo_var demo_cat     n
#  <chr>    <chr>    <int>
#1 gender   female     524
#2 gender   male       476

#[[2]]
# A tibble: 4 x 3
#  demo_var demo_cat         n
#  <chr>    <chr>        <int>
#1 educ     BA_plus        311
#2 educ     HS_grad        247
#3 educ     no_HS          133
#4 educ     some_college   309
#....
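
Another string-based option is the .data pronoun, which looks a column up by name inside count(). This is a sketch only: `count_by()` is a hypothetical helper and `toy` is made-up data, neither of which appears in the original code.

```r
library(dplyr)
library(purrr)

# Hypothetical toy data standing in for census_match_input
toy <- tibble(educ = c("HS_grad", "BA_plus", "HS_grad"))

# Hypothetical helper: count the categories of the column named by `var`
count_by <- function(data, var) {
  data %>%
    # .data[[var]] fetches the column by its string name;
    # naming the expression renames the result column in one step
    count(demo_cat = .data[[var]], name = "n_census") %>%
    mutate(demo_var = var) %>%
    relocate(demo_var)
}

res <- map(c("educ"), ~ count_by(toy, .x))
res[[1]]
```

Wrapping the pipeline in a function keeps the map() call short, and the same helper can be reused with the full var_order_census_match vector.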
