Here is code that reads in a remote data set and prepares four summary tables showing counts for each category within the demographic variables of gender, education, ethnicity/race, and geographic region:
suppressMessages(suppressWarnings(library(tidyverse)))
urlRemote_path <- "https://raw.githubusercontent.com/"
github_path <- "DSHerzberg/WEIGHTING-DATA/master/INPUT-FILES/"
fileName_path <- "data-input-sim.csv"
census_match_input <- suppressMessages(read_csv(url(
  str_c(urlRemote_path, github_path, fileName_path)
)))
var_order_census_match <- c("gender", "educ", "ethnic", "region")
census_match_cat_count_gender <- census_match_input %>%
  group_by(gender) %>%
  summarize(n_census = n()) %>%
  rename(demo_cat = gender) %>%
  mutate(demo_var = "gender") %>%
  relocate(demo_var, .before = demo_cat)
census_match_cat_count_educ <- census_match_input %>%
  group_by(educ) %>%
  summarize(n_census = n()) %>%
  rename(demo_cat = educ) %>%
  mutate(demo_var = "educ") %>%
  relocate(demo_var, .before = demo_cat)
census_match_cat_count_ethnic <- census_match_input %>%
  group_by(ethnic) %>%
  summarize(n_census = n()) %>%
  rename(demo_cat = ethnic) %>%
  mutate(demo_var = "ethnic") %>%
  relocate(demo_var, .before = demo_cat)
census_match_cat_count_region <- census_match_input %>%
  group_by(region) %>%
  summarize(n_census = n()) %>%
  rename(demo_cat = region) %>%
  mutate(demo_var = "region") %>%
  relocate(demo_var, .before = demo_cat)
I want to consolidate this code using purrr::map(). My thought was to iterate over the vector of variable names, as in:
census_match_cat_count <- var_order_census_match %>%
  map(~ census_match_input %>%
        group_by(!!.x) %>%
        summarize(n_census = n()))
This does not return the desired output; rather, it returns tables that lack separate rows and counts for the categories under each demographic variable.
Furthermore, when I try to expand the mapping function to include the rest of the code, as in:
census_match_cat_count <- var_order_census_match %>%
  map(
    ~ census_match_input %>%
      group_by(!!.x) %>%
      summarize(n_census = n()) %>%
      rename(demo_cat = !!.x) %>%
      mutate(demo_var = .x) %>%
      relocate(demo_var, .before = demo_cat)
  )
I get back errors suggesting that I'm not using the correct tidyeval procedures.
There are related topics on Stack Overflow, but none seem to address my particular question of how to pass variable names to be used by dplyr::group_by() within purrr::map().
Thanks in advance for any help.
You were almost there, but you need to convert the variable name into a symbol to use it with group_by(). Note that in the code below, count() is a shortcut for group_by() + summarise(n = n()).
library(dplyr)
library(purrr)
vars <- c("gender", "educ", "ethnic", "region")
vars %>%
  map(~ census_match_input %>%
        count(!!sym(.x)) %>%
        rename(demo_cat = !!.x) %>%
        mutate(demo_var = .x) %>%
        relocate(demo_var))
[[1]]
# A tibble: 2 x 3
demo_var demo_cat n
<chr> <chr> <int>
1 gender female 524
2 gender male 476
[[2]]
# A tibble: 4 x 3
demo_var demo_cat n
<chr> <chr> <int>
1 educ BA_plus 311
2 educ HS_grad 247
3 educ no_HS 133
4 educ some_college 309
[[3]]
# A tibble: 5 x 3
demo_var demo_cat n
<chr> <chr> <int>
1 ethnic asian 48
2 ethnic black 146
3 ethnic hispanic 252
4 ethnic other 64
5 ethnic white 490
[[4]]
# A tibble: 4 x 3
demo_var demo_cat n
<chr> <chr> <int>
1 region midwest 218
2 region northeast 173
3 region south 367
4 region west 242
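To see why the symbol conversion matters, here is a minimal sketch on a small made-up table. The names toy_input, constant_group, and column_group are hypothetical and only stand in for census_match_input and its results. It contrasts injecting the bare string with injecting a symbol built from that string; sym() is called as rlang::sym() here so the sketch does not rely on which package re-exports it.

```r
library(dplyr)

# Hypothetical toy data standing in for census_match_input
toy_input <- tibble(
  gender = c("female", "male", "female", "female"),
  region = c("south", "west", "south", "midwest")
)

# !!"gender" injects a string constant, so every row lands in a single group
constant_group <- toy_input %>%
  group_by(!!"gender") %>%
  summarize(n_census = n())

# !!sym("gender") injects the column symbol, giving one row per category
column_group <- toy_input %>%
  group_by(!!rlang::sym("gender")) %>%
  summarize(n_census = n())

nrow(constant_group)  # 1
nrow(column_group)    # 2
```

The single-row result from the first pipeline matches the symptom described in the question: tables that lack separate rows and counts for each category.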
You could reshape the dataset using pivot_longer() and then count():
library(tidyverse)
census_match_input %>%
  pivot_longer(all_of(var_order_census_match), names_to = "demo_var", values_to = "demo_cat") %>%
  count(demo_var, demo_cat)
# A tibble: 15 x 3
demo_var demo_cat n
<chr> <chr> <int>
1 educ BA_plus 311
2 educ HS_grad 247
3 educ no_HS 133
4 educ some_college 309
5 ethnic asian 48
6 ethnic black 146
7 ethnic hispanic 252
8 ethnic other 64
9 ethnic white 490
10 gender female 524
11 gender male 476
12 region midwest 218
13 region northeast 173
14 region south 367
15 region west 242
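If one table per demographic variable is still wanted, as in the original four-table code, the long result can be split afterwards. The following is a sketch under assumptions: toy_input and toy_vars are hypothetical stand-ins for census_match_input and var_order_census_match, and base split() is just one possible way to break the table apart.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for census_match_input
toy_input <- tibble(
  gender = c("female", "male", "female", "female"),
  region = c("south", "west", "south", "midwest")
)
toy_vars <- c("gender", "region")

# One long table of counts, as in the pivot_longer() approach above
long_counts <- toy_input %>%
  pivot_longer(all_of(toy_vars), names_to = "demo_var", values_to = "demo_cat") %>%
  count(demo_var, demo_cat)

# split() returns a named list with one table per demo_var;
# indexing by toy_vars puts the list in the original variable order
counts_by_var <- split(long_counts, long_counts$demo_var)[toy_vars]
```

Each element can then be pulled out by name, for example counts_by_var$gender.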
You can also do this without non-standard evaluation, keeping column names as character strings:
library(dplyr)
var_order_census_match <- c("gender", "educ", "ethnic", "region")
purrr::map(var_order_census_match,
           ~ census_match_input %>%
             group_by_at(.x) %>%
             summarise(n = n()) %>%
             rename(demo_cat = all_of(.x)) %>%
             mutate(demo_var = .x) %>%
             relocate(demo_var))
#[[1]]
# A tibble: 2 x 3
# demo_var demo_cat n
# <chr> <chr> <int>
#1 gender female 524
#2 gender male 476
#[[2]]
# A tibble: 4 x 3
# demo_var demo_cat n
# <chr> <chr> <int>
#1 educ BA_plus 311
#2 educ HS_grad 247
#3 educ no_HS 133
#4 educ some_college 309
#....
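Another string-based option is the .data pronoun, which looks a column up by its name. This is only a sketch on made-up data: toy_input, toy_vars, and toy_counts are hypothetical names, and binding the results with bind_rows() is an assumption about the desired final shape (a single table) rather than part of the answers above.

```r
library(dplyr)
library(purrr)

# Hypothetical toy data standing in for census_match_input
toy_input <- tibble(
  gender = c("female", "male", "female", "female"),
  region = c("south", "west", "south", "midwest")
)
toy_vars <- c("gender", "region")

toy_counts <- toy_vars %>%
  map(~ toy_input %>%
        # .data[[.x]] fetches the column named by the string in .x;
        # naming the expression renames it to demo_cat in the same step
        count(demo_cat = .data[[.x]], name = "n_census") %>%
        mutate(demo_var = .x, .before = demo_cat)) %>%
  # stack the per-variable tables into a single tibble
  bind_rows()
```

Dropping the final bind_rows() step leaves a list of tables, the same shape the map() calls above return.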