Overlay normal density curves in R using ggplot

I'm trying to overlay normal density curves on my stacked histograms in R using ggplot. bsa is a numeric measure recorded for two groups, treatment and control.

I have created stacked histograms for the two groups. I get an error with stat_function about the mapping needing to be a list of unevaluated mappings.

Any advice on how to do this would be appreciated.

ggplot(data=bsa, aes(x=bsa)) +geom_histogram(colours(distinct=TRUE)) + facet_grid(group~.) +
  stat_function(dnorm(x, mean(bsa$bsa),sd(bsa$bsa)))+
  ggtitle("Histogram of BSA amounts by group")  

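Using stat_function(...) with facets is tricky. The error arises because the first positional argument of stat_function(...) is mapping, so your dnorm(...) call is being treated as an aesthetic mapping; the function itself has to be passed as fun=dnorm. stat_function(...) also takes an argument args=..., which needs to be a named list of the extra arguments to the function (in your case, mean and sd). The problem is that these cannot appear in aes(...), so they cannot vary by facet automatically and you have to add the curve for each group manually. Here is an example.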

set.seed(1)   # for reproducible example
df <- data.frame(bsa=rnorm(200, mean=rep(c(1,4),each=100)), 
                 group=rep(c("test","control"),each=100))
# calculate mean and sd by group
stats <- aggregate(bsa~group, df, function(x) c(mean=mean(x), sd=sd(x)))
stats <- data.frame(group=stats[,1],stats[,2])

library(ggplot2)
ggplot(df, aes(x=bsa)) +
  # after_stat(density) replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0)
  geom_histogram(aes(y=after_stat(density),fill=group), color="grey30")+
  # one stat_function layer per group; with() supplies that group's mean and sd
  with(stats[stats$group=="control",],
       stat_function(data=df[df$group=="control",],fun=dnorm, args=list(mean=mean, sd=sd)))+
  with(stats[stats$group=="test",],
       stat_function(data=df[df$group=="test",],fun=dnorm, args=list(mean=mean, sd=sd)))+
  facet_grid(group~.)

This is rather ugly, so I usually just calculate the curves outside of ggplot and add them using geom_line(...).

# common x grid spanning the full range of the data
x <- with(df, seq(min(bsa), max(bsa), length.out=100))
# one normal curve per row of stats (i.e. per group)
dfn <- do.call(rbind,lapply(seq_len(nrow(stats)), 
                            function(i) with(stats[i,],data.frame(group, x, y=dnorm(x,mean=mean,sd=sd)))))
ggplot(df, aes(x=bsa)) +
  geom_histogram(aes(y=after_stat(density),fill=group), color="grey30")+
  geom_line(data=dfn, aes(x, y))+
  facet_grid(group~.)

This makes the ggplot code much more readable and produces pretty much the same thing.
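If you have more than two groups, the same idea can be wrapped in a small function. The following is only a sketch: normal_curves() is a hypothetical helper, not part of ggplot2, and it assumes ggplot2 >= 3.3.0 for after_stat(). The bins=30 setting is an arbitrary choice.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): returns one normal curve per group,
# evaluated on a shared x grid spanning the full range of the data.
normal_curves <- function(data, value, group, n = 100) {
  x <- seq(min(data[[value]]), max(data[[value]]), length.out = n)
  groups <- split(data[[value]], data[[group]])
  curves <- lapply(names(groups), function(g) {
    v <- groups[[g]]
    out <- data.frame(x = x, y = dnorm(x, mean = mean(v), sd = sd(v)))
    out[[group]] <- g  # keep the grouping column so facets can match on it
    out
  })
  do.call(rbind, curves)
}

set.seed(1)  # same simulated data as in the example above
df <- data.frame(bsa = rnorm(200, mean = rep(c(1, 4), each = 100)),
                 group = rep(c("test", "control"), each = 100))
dfn <- normal_curves(df, value = "bsa", group = "group")

p <- ggplot(df, aes(x = bsa)) +
  geom_histogram(aes(y = after_stat(density), fill = group),
                 color = "grey30", bins = 30) +
  geom_line(data = dfn, aes(x = x, y = y)) +
  facet_grid(group ~ .)
print(p)
```

Because dfn carries the same group column as df, facet_grid(...) places each curve in the matching panel without any per-group layers.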

Notice that if you wanted to overlay a kernel density estimate, rather than a normal curve, this would be a lot easier:

ggplot(df, aes(x=bsa)) +
  geom_histogram(aes(y=after_stat(density),fill=group), color="grey30")+
  stat_density(geom="line")+
  facet_grid(group~.)
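The reason this is easier is that stat_density(...) estimates the curve from the data in each panel, so faceting takes care of the grouping. If the default curve looks too wiggly, the bandwidth can be tuned with the adjust argument. A minimal sketch, assuming ggplot2 >= 3.3.0; the values adjust=1.5 and bins=30 are arbitrary choices for illustration:

```r
library(ggplot2)

set.seed(1)  # same simulated data as in the example above
df <- data.frame(bsa = rnorm(200, mean = rep(c(1, 4), each = 100)),
                 group = rep(c("test", "control"), each = 100))

p <- ggplot(df, aes(x = bsa)) +
  geom_histogram(aes(y = after_stat(density), fill = group),
                 color = "grey30", bins = 30) +
  # adjust > 1 widens the default bandwidth, giving a smoother curve
  stat_density(geom = "line", adjust = 1.5) +
  facet_grid(group ~ .)
print(p)
```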
