How to summarise (dplyr) user specified variables reactively in flexdashboard/shiny?

Question

I am trying to develop a shiny dashboard app that is able to produce a bar graph for different outcome variables that can be selected by the user. To do so, I need to subset my data reactively to generate aggregate data frames. I am able to have the code below successfully filter my data reactively, but I am running into trouble when I try to use dplyr::summarise() reactively.

Here is my data

dput(head(df))

structure(
    list(
      geoid = c(
        "01001020200",
        "01001020300",
        "01001020700",
        "01001020802",
        "01001021000",
        "01001021100"
      ),
      state = c(
        "Alabama",
        "Alabama",
        "Alabama",
        "Alabama",
        "Alabama",
        "Alabama"
      ),
      county = c(
        "Autauga County",
        "Autauga County",
        "Autauga County",
        "Autauga County",
        "Autauga County",
        "Autauga County"
      ),
      ozzone = structure(
        c(1L, 1L, 2L, 1L, 1L, 1L),
        .Label = c("non.oz", "oz"),
        class = "factor"
      ),
      tract_type = c(
        "LICs",
        "Contiguous",
        "LICs",
        "Contiguous",
        "Contiguous",
        "LICs"
      ),
      investment_score_1_low_10_high = c(4,
                                         6, 9, 10, 5, 6),
      socioeconomic_change_flag_1_yes_blank_no = c(0,
                                                   0, 0, 0, 0, 0),
      fips_county = c("01001", "01001", "01001", "01001",
                      "01001", "01001"),
      total_empl = c(51809L, 51809L, 51809L, 51809L,
                     51809L, 51809L),
      total_payroll = c(338395L, 338395L, 338395L,
                        338395L, 338395L, 338395L),
      total_establishments = c(5090L, 5090L,
                               5090L, 5090L, 5090L, 5090L),
      largest_employer = c(72L, 72L, 72L,
                           72L, 72L, 72L),
      largest_employer_bypayroll = c(44L, 44L, 44L,
                                     44L, 44L, 44L),
      trend_employee_change = c(
        2735.60000000046,
        2735.60000000046,
        2735.60000000046,
        2735.60000000046,
        2735.60000000046,
        2735.60000000046
      ),
      trend_payroll_change = c(
        23074.8000000037,
        23074.8000000037,
        23074.8000000037,
        23074.8000000037,
        23074.8000000037,
        23074.8000000037
      ),
      trend_establishment_change = c(
        53.4000000000084,
        53.4000000000084,
        53.4000000000084,
        53.4000000000084,
        53.4000000000084,
        53.4000000000084
      ),
      damage_cost_weather_total = c(20000, 20000, 20000, 20000,
                                    20000, 20000),
      deaths_weather_total = c(0L, 0L, 0L, 0L, 0L, 0L),
      medianrent = c(537, 633, 525, 680, 409, 303),
      vacancyrate = c(
        0.108200455580866,
        0.113652113652114,
        0.0436681222707424,
        0.0512166859791425,
        0.229962546816479,
        0.21030303030303
      ),
      total_pop = c(503, 827, 900, 2989, 740, 813),
      undertwo_percent = c(
        0.391650099403579,
        0.351874244256348,
        0.397777777777778,
        0.17096018735363,
        0.301351351351351,
        0.263222632226322
      ),
      mobility_rate = c(
        0.133702166897188,
        0.0737753882915173,
        0.196514423076923,
        0.172716680111141,
        0.0641304347826087,
        0.0681084570690769
      ),
      unemploy_rate = c(
        0.0176991150442478,
        0.0273203592814371,
        0.109881724532621,
        0.0127906976744186,
        0.0344982078853047,
        0.0281910728269381
      ),
      median_income = c(41287, 46806, 41250, 64439,
                        46607, 36450),
      renter_percent = c(
        0.337653478854025,
        0.310596310596311,
        0.331877729257642,
        0.268110942458949,
        0.328686327077748,
        0.365986394557823
      ),
      blackaa_percent = c(
        0.5451197053407,
        0.264697193500739,
        0.145906432748538,
        0.152916262243007,
        0.258583690987124,
        0.530922930542341
      ),
      hispanic_percent = c(
        0.0105893186003683,
        0.0803545051698671,
        0.0400584795321637,
        0.0137651107385511,
        0.00822603719599428,
        0.00666032350142721
      ),
      transit_score_mean = c(0, 0, 0, 0, 0, 0),
      life_expectancy = c(75.67, 75.67, 75.67, 75.67, 75.67, 75.67),
      trend_life_expectancy = c(5.1, 5.1, 5.1, 5.1, 5.1, 5.1),
      median_monthly_housing_costs = c(885,
                                       885, 885, 885, 885, 885),
      pestilence_2018 = c(2, 2, 2, 2, 2,
                          2),
      total_pop_county = c(6772, 6772, 6772, 6772, 6772, 6772),
      deaths_weather_pop = c(0, 0, 0, 0, 0, 0),
      cost_weather_pop = c(
        2.95333727111636,
        2.95333727111636,
        2.95333727111636,
        2.95333727111636,
        2.95333727111636,
        2.95333727111636
      ),
      Male_HSgrad = c(75, 68, 211, 189, 97,
                      42),
      Male_SomeCollege = c(28, 18, 51, 111, 74, 38),
      Male_AssocDeg = c(4,
                        6, 0, 63, 0, 21),
      Male_BachDeg = c(7, 9, 0, 11, 0, 9),
      Male_GradDeg = c(0,
                       0, 0, 29, 6, 0),
      MaleEduAboveHS = c(114, 101, 262, 403, 177,
                         110),
      Total_Male18.24 = c(145, 123, 285, 455, 202, 110),
      MaleEduHSAbove_pop = c(
        0.786206896551724,
        0.821138211382114,
        0.919298245614035,
        0.885714285714286,
        0.876237623762376,
        1
      ),
      Female_HSgrad = c(11, 60, 87, 156, 23, 83),
      Female_SomeCollege = c(22,
                             25, 13, 47, 54, 65),
      Female_AssocDeg = c(0, 0, 20, 82, 0,
                          0),
      Female_BachDeg = c(5, 26, 0, 19, 0, 11),
      Female_GradDeg = c(5,
                         16, 0, 0, 0, 0),
      FemaleEduAboveHS = c(43, 127, 120, 304,
                           77, 159),
      Total_Female18.24 = c(53, 127, 192, 581, 92, 198),
      FemaleEduHSAbove_pop = c(
        0.811320754716981,
        1,
        0.625,
        0.523235800344234,
        0.83695652173913,
        0.803030303030303
      )
    ),

    row.names = c(NA,
                  6L),
    class = "data.frame"
  )

Here is my code


# List of potential outcome variables to be plotted
# (names must match columns in df, e.g. MaleEduHSAbove_pop)
variables <- c("total_empl", "total_payroll", "total_establishments",
               "largest_employer", "largest_employer_bypayroll",
               "trend_employee_change", "trend_payroll_change",
               "trend_establishment_change", "damage_cost_weather_total",
               "deaths_weather_total", "medianrent", "vacancyrate",
               "total_pop", "undertwo_percent", "mobility_rate",
               "unemploy_rate", "median_income", "renter_percent",
               "blackaa_percent", "hispanic_percent",
               "median_monthly_housing_costs", "MaleEduHSAbove_pop",
               "FemaleEduHSAbove_pop")



# Define inputs
selectInput('state_name', label = 'Select a state', choices = lookup)

selectInput('DV', label = 'Outcome Measure', choices = variables)

#Filter data based on the State and outcome measure the user would like to investigate.

bar <- reactive({

  st <- df %>%
        filter(state == input$state_name) 

  bp <- st %>%
        group_by(tract_type) %>%
        summarise(Outcome = mean(st[,input$DV]))

  return(bp)
})

bar

UPDATE: Right now, this code successfully filters the data by input$state_name, but there is an issue with the calculation of the means. The result is this:

# A tibble: 2 x 2
  tract_type Outcome
  <chr>        <dbl>
1 Contiguous   468296.
2 LICs         468296.

As you can see, the calculated means are identical. In fact, these values correspond to the grand mean of whichever variable is chosen for input$DV. So the filtered st data is not being split into the two levels of tract_type when the mean is computed.

Answer 1

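I see what you are trying to do. The problem is that input$DV is a string, so in your reactive part st[, input$DV] pulls the whole column out of the filtered data frame, ignoring the grouping, and every group ends up with the same overall mean. What you want to do is summarise one of the columns in df by referring to it by name.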
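To illustrate why the original version returns the same value for every group, here is a minimal sketch. The data frame toy and the variable dv are made up for this example (dv stands in for input$DV); only the pattern mirrors the question.

```r
library(dplyr)

# Toy data (hypothetical values, not the asker's data)
toy <- data.frame(
  tract_type = c("LICs", "Contiguous", "LICs", "Contiguous"),
  medianrent = c(500, 600, 700, 800)
)
dv <- "medianrent"  # stands in for input$DV

# Indexing the outer data frame ignores the grouping:
# each group receives the mean of the whole column.
wrong <- toy %>%
  group_by(tract_type) %>%
  summarise(Outcome = mean(toy[, dv]))

# Referring to the column by its bare name lets dplyr
# evaluate the mean separately within each group.
right <- toy %>%
  group_by(tract_type) %>%
  summarise(Outcome = mean(medianrent))

print(wrong)
print(right)
```

In wrong, both rows carry the overall mean of medianrent, while right has one mean per tract_type.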

In the following example, I specify the summarising variable manually. Note that investment_score_1_low_10_high is not quoted. An unquoted name like this is what R calls a symbol.

st <- df %>%
  filter(state == "Alabama") %>% 
  group_by(tract_type) %>%
  summarise(Outcome = mean(investment_score_1_low_10_high))

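To do the same with the user's selection, convert the string to a symbol with sym() and unquote it with !!. I think this should work: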

bar <- reactive({
  # Create a symbol from string.
  mean_variable <- sym(input$DV)
  bp <- df %>%
        filter(state == input$state_name) %>%
        group_by(tract_type) %>%
        summarise(Outcome = mean(!! mean_variable, na.rm = TRUE))

  return(bp)
})
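The same logic can be checked outside of shiny before wiring it into the dashboard. The sketch below wraps it in a plain function; the function name summarise_by_tract, its arguments, and the toy data are assumptions made for illustration and do not come from the original post.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: same pipeline as the reactive, but with
# the state and the outcome variable passed in as plain strings.
summarise_by_tract <- function(data, state_name, dv) {
  mean_variable <- sym(dv)  # string -> symbol
  data %>%
    filter(state == state_name) %>%
    group_by(tract_type) %>%
    summarise(Outcome = mean(!!mean_variable, na.rm = TRUE))
}

# Toy data (hypothetical values)
toy <- data.frame(
  state      = c("Alabama", "Alabama", "Alabama", "Alaska"),
  tract_type = c("LICs", "Contiguous", "LICs", "LICs"),
  medianrent = c(500, 650, 700, 900),
  total_pop  = c(10, 25, 30, 40)
)

rent <- summarise_by_tract(toy, "Alabama", "medianrent")
pop  <- summarise_by_tract(toy, "Alabama", "total_pop")
print(rent)
print(pop)
```

Inside the dashboard, the reactive would then only need to call this helper with df, input$state_name and input$DV.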

Extra information about the use of !! and what it does can be found here, and even better, with examples, here.
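As an alternative to sym() and !!, dplyr also provides the .data pronoun, which accepts a column name given as a string. This is not part of the original answer; it is a minimal sketch with made-up data showing that both spellings give the same result.

```r
library(dplyr)

# Toy data (hypothetical values)
toy <- data.frame(
  tract_type = c("LICs", "Contiguous", "LICs", "Contiguous"),
  medianrent = c(500, 600, 700, NA)
)
dv <- "medianrent"  # stands in for input$DV

# Look the column up by its string name with the .data pronoun.
by_pronoun <- toy %>%
  group_by(tract_type) %>%
  summarise(Outcome = mean(.data[[dv]], na.rm = TRUE))

# Same calculation with a symbol and !!, as in the answer above.
by_symbol <- toy %>%
  group_by(tract_type) %>%
  summarise(Outcome = mean(!!rlang::sym(dv), na.rm = TRUE))

print(by_pronoun)
```

Which form to use is mostly a matter of taste; .data[[...]] avoids creating the intermediate symbol.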

Answer 2

Solution derived by @dylanvanw

bar <- reactive({
  # Create a symbol from string.
  mean_variable <- sym(input$DV)
  bp <- df %>%
        filter(state == input$state_name) %>%
        group_by(tract_type) %>%
        summarise(Outcome = mean(!! mean_variable, na.rm = TRUE))

  return(bp)
})
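Since the end goal in the question is a bar graph, here is a minimal sketch of how the summarised table might be plotted. It assumes ggplot2 is available, which the original post does not mention; the helper name plot_outcome and the sample values in bp are hypothetical.

```r
library(ggplot2)

# Hypothetical helper: draws one bar per tract type from a
# summary table with columns tract_type and Outcome.
plot_outcome <- function(bp, dv) {
  ggplot(bp, aes(x = tract_type, y = Outcome)) +
    geom_col() +
    labs(x = "Tract type", y = paste("Mean", dv))
}

# Stand-in for the value returned by bar() (made-up numbers)
bp <- data.frame(
  tract_type = c("Contiguous", "LICs"),
  Outcome    = c(650, 600)
)

p <- plot_outcome(bp, "medianrent")
```

In the dashboard this would typically be called as renderPlot({ plot_outcome(bar(), input$DV) }), so the plot redraws whenever either input changes.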

2 answers

solution1
1 ACCEPTED 2019-05-30 20:27:25

solution2
0 2019-06-03 13:56:05
