dodge columns in ggplot2

I am trying to create a figure that summarises my data. The data are about the prevalence of drug use, obtained from different practices in different countries. Each practice contributed a different amount of data, and I want to show all of this in the figure.

Here is a subset of the data to work on:

gr<-data.frame(matrix(0,36))
gr$drug<-c("a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b")
gr$practice<-c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r")
gr$country<-c("c1","c1","c1","c1","c1","c1","c1","c1","c1","c1","c2","c2","c2","c2","c2","c2","c3","c3","c1","c1","c1","c1","c1","c1","c1","c1","c1","c1","c2","c2","c2","c2","c2","c2","c3","c3")
gr$prevalence<-c(9.14,5.53,16.74,1.93,8.51,14.96,18.90,11.18,15.00,20.10,24.56,22.29,19.41,20.25,25.01,25.87,29.33,20.76,18.94,24.60,26.51,13.37,23.84,21.82,23.69,20.56,30.53,16.66,28.71,23.83,21.16,24.66,26.42,27.38,32.46,25.34)
gr$prop<-c(0.027,0.023,0.002,0.500,0.011,0.185,0.097,0.067,0.066,0.023,0.433,0.117,0.053,0.199,0.098,0.100,0.594,0.406,0.027,0.023,0.002,0.500,0.011,0.185,0.097,0.067,0.066,0.023,0.433,0.117,0.053,0.199,0.098,0.100,0.594,0.406)
gr$low.CI<-c(8.27,4.80,12.35,1.83,7.22,14.53,18.25,10.56,14.28,18.76,24.25,21.72,18.62,19.83,24.36,25.22,28.80,20.20,17.73,23.15,21.06,13.12,21.79,21.32,22.99,19.76,29.60,15.41,28.39,23.25,20.34,24.20,25.76,26.72,31.92,24.73)
gr$high.CI<-c(10.10,6.37,22.31,2.04,10.00,15.40,19.56,11.83,15.74,21.52,24.87,22.86,20.23,20.68,25.67,26.53,29.86,21.34,20.21,26.10,32.79,13.63,26.02,22.33,24.41,21.39,31.48,17.98,29.04,24.43,22.01,25.12,27.09,28.05,33.01,25.95)

The code I wrote is this:

p<-ggplot(data=gr, aes(x=factor(drug), y=as.numeric(gr$prevalence), ymax=max(high.CI),position="dodge",fill=practice,width=prop))
colour<-c(rep("gray79",10),rep("gray60",6),rep("gray39",2))
p + theme_bw()+
  geom_bar(stat="identity",position = position_dodge(0.9)) +
  labs(x="Drug",y="Prevalence") + 
  geom_errorbar(ymax=gr$high.CI,ymin=gr$low.CI,position=position_dodge(0.9),width=0.25,size=0.25,colour="black",aes(x=factor(drug), y=as.numeric(gr$prevalence), fill=practice)) +
  ggtitle("Drug usage by country and practice") +
  scale_fill_manual(values = colour)+ guides(fill=F)

In the figure I obtain, the bars are all drawn on top of each other, whereas I want them dodged.


I also obtain the following warning:

ymax not defined: adjusting position using y instead
Warning message:
position_dodge requires non-overlapping x intervals

Ideally, the bars would sit next to one another, each with its error bar centred on it, all organised by country.

Also should I be concerned about the warning (which I clearly do not fully understand)?

I hope this makes sense. I hope I am close, but I don't seem to be getting anywhere, so some help would be greatly appreciated.

Thank you

ggplot2's geom_bar() accepts a width aesthetic, but by default it does not line bars of varying width up neatly against one another in a dodged position. The following workaround is based on the solution here:

library(ggplot2)
library(dplyr)

# calculate x-axis position for bars of varying width
gr <- gr %>%
  group_by(drug) %>%
  arrange(practice) %>%
  mutate(pos = 0.5 * (cumsum(prop) + cumsum(c(0, prop[-length(prop)])))) %>%
  ungroup()

x.labels <- gr$practice[gr$drug == "a"]
x.pos <- gr$pos[gr$drug == "a"]

ggplot(gr,
       aes(x = pos, y = prevalence, 
           fill = country, width = prop,
           ymin = low.CI, ymax = high.CI)) +
  geom_col(col = "black") +
  geom_errorbar(size = 0.25, colour = "black") +
  facet_wrap(~drug) +
  scale_fill_manual(values = c("c1" = "gray79",
                               "c2" = "gray60",
                               "c3" = "gray39"),
                    guide = "none") +
  scale_x_continuous(name = "Drug",
                     labels = x.labels,
                     breaks = x.pos) +
  labs(title = "Drug usage by country and practice", y = "Prevalence") +
  theme_classic()
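
The position formula in the mutate() step above may be easier to follow on a toy vector. This is only a sketch in base R: the three widths are made-up values, not taken from the question's data, and the names left and right are introduced here purely for illustration. Each bar's centre is the midpoint between the cumulative width before it and the cumulative width including it, which is what makes neighbouring bars touch without overlapping.

```r
# Hypothetical widths for three bars (illustrative values only)
prop <- c(0.2, 0.5, 0.3)

right <- cumsum(prop)                        # right edge of each bar
left  <- cumsum(c(0, prop[-length(prop)]))   # left edge of each bar
pos   <- 0.5 * (left + right)                # centre = midpoint of the two edges

# pos is 0.10, 0.45, 0.85: each bar starts where the previous one ends
print(data.frame(prop, left, right, pos))
```

Because the widths within each drug sum to 1 in the question's data, the bars in each facet span the x range from 0 to 1.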


You are trying to convey a lot of information here. To contrast drug A and drug B across countries with bar plots while accounting for the proportions, you might use the facet_grid function. Try this:

library(ggplot2)

# one grey per practice, grouped by country: 10 for c1, 6 for c2, 2 for c3
colour <- c(rep("gray79", 10), rep("gray60", 6), rep("gray39", 2))

gr$drug <- paste("Drug", gr$drug)

# aesthetics set once here are inherited by every layer below
p <- ggplot(data = gr, aes(x = factor(practice), y = as.numeric(prevalence),
                           ymax = high.CI, ymin = low.CI,
                           fill = practice, width = prop))

p + theme_bw() + facet_grid(drug ~ country, scales = "free") +
  geom_bar(stat = "identity") +
  labs(x = "Practice", y = "Prevalence") +
  geom_errorbar(position = position_dodge(0.9), width = 0.25, size = 0.25, colour = "black") +
  ggtitle("Drug usage by country and practice") +
  scale_fill_manual(values = colour) + guides(fill = "none")


The bars are too narrow for country c1 and, as you indicated, the one practice is quite influential.

Also, you can specify your aesthetics once in ggplot(aes(...)) rather than resetting them in each layer, and there is no need to include the data frame's name inside aes() in the ggplot call.
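
As a minimal sketch of that last point, the snippet below uses the first three rows of the question's data (drug "a", practices a to c) in a small data frame named toy, a name chosen here only for illustration. The aesthetics are declared once in ggplot(), the columns are referred to by bare name rather than as toy$..., and geom_errorbar() picks up ymin and ymax without repeating them.

```r
library(ggplot2)

# First three rows of the question's data for drug "a"
toy <- data.frame(
  practice   = c("a", "b", "c"),
  prevalence = c(9.14, 5.53, 16.74),
  low.CI     = c(8.27, 4.80, 12.35),
  high.CI    = c(10.10, 6.37, 22.31)
)

# Aesthetics are set once; both layers inherit them
p <- ggplot(toy, aes(x = practice, y = prevalence,
                     ymin = low.CI, ymax = high.CI)) +
  geom_col(fill = "gray79") +
  geom_errorbar(width = 0.25)

print(p)
```

Writing y = toy$prevalence inside aes() would often still draw, but it bypasses ggplot2's data handling and can misalign values once facets or grouping are involved, so bare column names are the safer habit.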
