
ggplot scale_fill_manual: matching values with a LIKE/wildcard pattern

I am looping a function over every course on my campus. For each course it tallies the race of every student and aggregates race against the grade received to build a grade distribution. I also paste the group size onto the race name (e.g., African American (192)) to show how many students are in each racial group, and then plot the result for each course (600+ courses, hence the loop). My problem is assigning scale_fill_manual colors to each racial category, because the category name changes from one iteration to the next: in the first course it might be African American (192) and in the second African American (87), so I can't key the scale_fill_manual values on it. That is, I can't write

scale_fill_manual(values = c("African American" = "violetred1", "Asian" = "orange3"))

because the name of each racial group keeps changing. So my question is: is there a way, much like in SQL, to apply a wildcard to the value? Something like:

scale_fill_manual(values = c("African American*" = "violetred1", "Asian*" = "orange3"))

Or perhaps there is a better way to do this?

Edit: I have columns for race, count, and racecount that look like:

African American, 192, African American (192)

So if there were a way to fill by racecount, so that each legend label reads Race (count), while assigning the scale_fill_manual colors by the race column, where the group names stay the same, that could work. I just don't know how to make that happen.
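What this edit describes can be sketched by mapping fill to the stable Race column and supplying the "Race (count)" text only through the scale's labels argument, which in ggplot2 accepts a function of the breaks. The data frame, palette, counts and the legend_label helper below are made-up illustrations, not the question's real data; this is a sketch, not a drop-in replacement for myplot().

```r
library(ggplot2)

# Made-up data in the shape described above: one row per race/grade pair,
# with the percentage and the number of students in that racial group.
percents <- data.frame(
  Race  = rep(c("African American", "Asian"), each = 2),
  Grade = rep(1:2, times = 2),
  pct   = c(40, 60, 25, 75),
  Count = rep(c(192, 40), each = 2),
  stringsAsFactors = FALSE
)

# Colours are keyed on the stable race names, so they do not change between courses.
race_palette <- c("African American" = "violetred1", "Asian" = "orange3")

# One count per race, named by race (the count repeats across grades).
race_counts <- tapply(percents$Count, percents$Race, max)

# Hypothetical helper: turns each legend break (a race name) into "Race (count)",
# for display only; the colour lookup still uses the plain race name.
legend_label <- function(breaks) paste0(breaks, " (", race_counts[breaks], ")")

p <- ggplot(percents, aes(x = Grade, y = pct, fill = Race)) +
  geom_col(position = "dodge", colour = "black", width = 0.8) +
  scale_fill_manual(values = race_palette, labels = legend_label)

legend_label(c("African American", "Asian"))
# returns c("African American (192)", "Asian (40)")
```

Because the counts live only in the labels, the same race_palette can be reused unchanged for every course in the loop.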

Here is a reproducible example:

library(tidyverse)                # dplyr, ggplot2, tibble, ...
library(extdplyr)                 # pct_routine()
library(pacman)
p_load_gh("trinker/wakefield")    # installs (if needed) and loads wakefield from GitHub
set.seed(10)

# 100 simulated students: ID, race, course ("Treatment"/"Control") and a grade from 1 to 5
df1 <- tibble(
  ID     = wakefield::id(n = 100),
  Race   = wakefield::race(n = 100),
  Course = wakefield::group(n = 100),
  Grade  = sample(1:5, 100, replace = TRUE))

df1

courselist <- list("Treatment", "Control")

myplot <- function(coursegrade) {

  coursegrade <- as.character(coursegrade)
  subject <- df1 %>% filter(Course == coursegrade)

  # percentages by Race and Grade, via extdplyr::pct_routine()
  percents <- pct_routine(subject, Race, Grade)

  # number of students in each racial group for this course
  dat2 <- subject %>%
    group_by(Race) %>%
    summarise(Count = n())
  percents <- inner_join(percents, dat2, by = "Race")

  # legend label such as "Asian (12)"; this text changes from course to course,
  # which is why a fixed scale_fill_manual(values = ...) cannot match it
  percents$Race.Eth <- paste0(percents$Race, " (", percents$Count, ")")
  percents$pct <- percents$pct * 100

  temp_plot <- ggplot(percents, aes(fill = Race.Eth, y = pct, x = Grade)) +
    geom_col(position = "dodge", colour = "black", width = 0.8) +
    ggtitle("Grade Distributions by Race, 2015 - 2018", subtitle = coursegrade) +
    theme(plot.title = element_text(hjust = 0.5),
          plot.subtitle = element_text(hjust = 0.5)) +
    # bars above 70 fall outside these limits and are dropped with a warning
    scale_y_continuous(limits = c(0, 70))

  ggsave(filename = paste0(coursegrade, " - grade distribution.jpg"),
         plot = temp_plot, width = 13, height = 7, units = "in")
  print(temp_plot)
}

lapply(courselist, myplot)

You may be able to avoid this issue by adding the count to the label only where you really need it in your ggplot code. For instance, if you only use it in the title of your graph, keep the label as "African American" throughout (so you can match it to its color) and use labs(title = paste0(my_label, " (", my_count, ")")), where my_label corresponds to "African American" and my_count to the count.
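A minimal sketch of this suggestion, assuming made-up counts and a single-panel plot: fill stays mapped to the plain race name, so the palette always matches, and the counts are pasted into text that sits outside the aesthetic mapping. Here all counts go into the subtitle rather than the title; race_counts and count_text are hypothetical names.

```r
library(ggplot2)

# Made-up percentages per race and grade.
percents <- data.frame(
  Race  = rep(c("African American", "Asian"), each = 2),
  Grade = rep(1:2, times = 2),
  pct   = c(40, 60, 25, 75),
  stringsAsFactors = FALSE
)
race_palette <- c("African American" = "violetred1", "Asian" = "orange3")

# Hypothetical per-course counts, named by race.
race_counts <- c("African American" = 192, "Asian" = 40)

# Count text built once, outside aes(), so it never touches the fill levels.
count_text <- paste0(names(race_counts), " (", race_counts, ")", collapse = ", ")

p <- ggplot(percents, aes(x = Grade, y = pct, fill = Race)) +
  geom_col(position = "dodge", colour = "black", width = 0.8) +
  scale_fill_manual(values = race_palette) +
  labs(title = "Grade Distributions by Race", subtitle = count_text)
```

Here count_text is "African American (192), Asian (40)", while the legend shows only the race names.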

As @user2362777 mentioned, it is best not to do this labeling inside the ggplot code. Consider creating a new column for race, or editing the original one, before passing the data to ggplot.
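One way to act on this, sketched with made-up data: build the Race.Eth label column first, then use the Race and Race.Eth columns side by side to re-key a race-named palette by this iteration's labels. The sample values and the fill_values name are illustrative assumptions.

```r
# Illustrative data with the columns described in the question's edit.
percents <- data.frame(
  Race  = c("African American", "African American", "Asian", "Asian"),
  Count = c(192, 192, 40, 40),
  stringsAsFactors = FALSE
)
# Label column created before plotting, as suggested above.
percents$Race.Eth <- paste0(percents$Race, " (", percents$Count, ")")

race_palette <- c("African American" = "violetred1", "Asian" = "orange3")

# One row per group: the stable Race name next to this iteration's label.
lookup <- unique(percents[, c("Race", "Race.Eth")])

# Re-key the palette by label, going through the Race column
# instead of pattern-matching the label text.
fill_values <- setNames(unname(race_palette[lookup$Race]), lookup$Race.Eth)
fill_values
```

Inside the question's myplot(), the result would be passed as scale_fill_manual(values = fill_values) while aes(fill = Race.Eth) stays as it is.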

Your options include:

There are other posts similar to this on SO: https://stackoverflow.com/search?q=%5Br%5D+partial+string+match+replace
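For the wildcard idea in the question, a partial-match sketch: scale_fill_manual itself matches the names in values exactly against the data's levels, so instead of an SQL-style pattern, strip the trailing " (count)" with a regular expression, look up the base race's colour, and return a vector named by the full labels. colors_for_labels() is a hypothetical helper and the palette is illustrative.

```r
# Stable colours keyed on race name only (colour choices are illustrative).
race_palette <- c("African American" = "violetred1",
                  "Asian"            = "orange3",
                  "White"            = "steelblue")

# Hypothetical helper: strip a trailing " (count)" from each label, look up the
# colour for the remaining race name, and name the result by the full label.
colors_for_labels <- function(labels, palette) {
  race <- sub(" \\([0-9]+\\)$", "", labels)  # "Asian (40)" -> "Asian"
  setNames(unname(palette[race]), labels)
}

colors_for_labels(c("African American (192)", "Asian (40)"), race_palette)
colors_for_labels(c("African American (87)", "White (310)"), race_palette)
```

In myplot() this could be used as scale_fill_manual(values = colors_for_labels(unique(percents$Race.Eth), race_palette)). A race missing from race_palette comes back as NA, so the palette needs an entry for every group.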
