
How can I make geom_boxplot outliers “line up” with jittered geom_points?

How can I make geom_boxplot outliers overlay perfectly with jittered geom_points?

For example, I want the outliers from geom_boxplot to be displayed as "cross hairs" over their actual points from geom_point after jittering.

library(ggplot2)
p <- ggplot(mtcars, aes(factor(cyl), mpg)) + 
  geom_boxplot(outlier.shape=10, outlier.size=8)  +
  geom_point(aes(factor(cyl), mpg, color=mpg),  position="jitter", size=4)
p

[Image: plot]

Any assistance would be greatly appreciated.

I agree with Didzis that a solution that does exactly what you aim for is going to be fairly involved. To literally do what you suggest would require (I think) that you do both the jittering and the outlier calculation outside of ggplot. If you're flexible about how you highlight the outliers, this is a potentially shorter solution:

id_outliers <- function(x){
    q <- quantile(x,c(0.25,0.75))
    iqr <- abs(diff(q))
    ifelse((x < q[1] - 1.5*iqr) | (x > q[2] + 1.5*iqr),'Outlier','NotOutlier')
}

library(plyr)  # provides ddply() and .()

# Flag outliers separately within each cyl group
mtcars <- ddply(mtcars,
                .(cyl),
                transform,
                out = id_outliers(mpg))

p <- ggplot(mtcars, aes(factor(cyl), mpg)) + 
  geom_boxplot(outlier.colour = NA)  + 
  geom_point(aes(colour = mpg,shape = out),position = "jitter")
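
For the literal version mentioned above (doing both the jittering and the outlier calculation outside of ggplot), a minimal sketch might look like the following. The names is_outlier, d, out and x_jit, the seed, and the jitter width of 0.2 are all assumptions made for illustration. The 1.5 * IQR rule with quantile()'s default method appears to reproduce the outliers geom_boxplot reports for this data, but that is worth checking on other data.

```r
library(ggplot2)

# TRUE for values outside the 1.5 * IQR fences (assumed to match geom_boxplot's rule)
is_outlier <- function(x) {
  q <- unname(quantile(x, c(0.25, 0.75)))
  fence <- 1.5 * diff(q)
  x < q[1] - fence | x > q[2] + fence
}

set.seed(42)  # arbitrary seed so the jitter is reproducible

d <- mtcars
# ave() applies is_outlier within each cyl group; it returns 0/1, hence the == 1
d$out <- ave(d$mpg, d$cyl, FUN = is_outlier) == 1

# Jitter x by hand so both point layers share exactly the same coordinates.
# as.integer(factor(cyl)) is the position of each discrete level (1, 2, 3).
d$x_jit <- as.integer(factor(d$cyl)) + runif(nrow(d), -0.2, 0.2)

p <- ggplot(d, aes(factor(cyl), mpg)) +
  geom_boxplot(outlier.shape = NA) +                          # hide built-in outliers
  geom_point(aes(x = x_jit, colour = mpg), size = 4) +         # jittered points
  geom_point(data = d[d$out, ], aes(x = x_jit),
             shape = 10, size = 8)                             # cross hairs on outliers
p
```

Because the jittered positions are stored in a column, the cross hairs land on their points by construction rather than by relying on two layers jittering identically.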

This solution will be quite long. The problem is that with position="jitter" you can't get the exact coordinates of the points, so you need a workaround.

So take your original plot, pass it to ggplot_build() and save the result. The first element of data contains information about the boxplots. We are interested in the columns group and outliers, since outliers shows which values ggplot treats as outliers. Save them as a separate object.

p <- ggplot(mtcars, aes(factor(cyl), mpg)) + 
                geom_boxplot(outlier.shape=10, outlier.size=8)  +
                geom_point(aes(color=mpg),  position="jitter", size=4)
gg<-ggplot_build(p)
gg$data[[1]]
  ymin lower middle upper ymax         outliers notchupper notchlower x PANEL group weight ymin_final
1 21.4 22.80   26.0 30.40 33.9                    29.62055   22.37945 1     1     1      1       21.4
2 17.8 18.65   19.7 21.00 21.4                    21.10338   18.29662 2     1     2      1       17.8
3 13.3 14.40   15.2 16.25 18.7 10.4, 10.4, 19.2   15.98120   14.41880 3     1     3      1       10.4
  ymax_final  xmin  xmax
1       33.9 0.625 1.375
2       21.4 1.625 2.375
3       19.2 2.625 3.375

xx<-gg$data[[1]][c("group","outliers")]
xx
  group         outliers
1     1                 
2     2                 
3     3 10.4, 10.4, 19.2

Now change the group values to 4, 6 and 8 so that they match the cyl values.

xx$group<-c(4,6,8)
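
If you would rather not hard-code 4, 6 and 8, the group index can be mapped back through the factor levels. This is only a sketch: cyl_levels and box are names introduced here, and it uses layer_data(), a ggplot2 helper that returns the same per-layer data frame as the first element of the ggplot_build() data.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot()
box <- layer_data(p, 1)                      # computed boxplot data for layer 1

# group is the index of each level of factor(cyl), so look the levels up by index
cyl_levels <- levels(factor(mtcars$cyl))
xx <- box[c("group", "outliers")]
xx$group <- as.numeric(cyl_levels[xx$group])
xx
```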

Now merge this new data frame with the original mtcars and save the result as a new data frame. Then apply a function to check whether a particular mpg value is listed in outliers for that cyl level. Those values (TRUE and FALSE) are saved in the column out.

mtcars.new<-merge(mtcars,xx,by.x="cyl",by.y="group")
mtcars.new$out<-apply(mtcars.new,1,function(x) x$mpg %in% x$outliers)
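
The row-wise apply() above depends on how a data frame with a list column is converted to a matrix. A sketch of the same check without the merge or the conversion is shown below, using match() and mapply(). Here xx is rebuilt by hand from the values printed earlier, and idx and out are names introduced for illustration.

```r
# Rebuild the lookup table from the values shown above (one list entry per cyl level)
xx <- data.frame(group = c(4, 6, 8))
xx$outliers <- list(numeric(0), numeric(0), c(10.4, 10.4, 19.2))

# For each car, find the row of xx that belongs to its cyl level
idx <- match(mtcars$cyl, xx$group)

# Compare each mpg value with the outlier vector of its own group
out <- mapply(function(value, outs) value %in% outs,
              mtcars$mpg, xx$outliers[idx])

mtcars$mpg[out]   # the mpg values flagged as outliers
```

The resulting logical vector is in the row order of mtcars, so it can be assigned directly as a new column before plotting.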

Use the new data frame to plot the data. Remove the outliers from geom_boxplot(). Use the column out to determine the shape and size of the points, and adjust their appearance with scale_shape_manual() and scale_size_manual().

ggplot(mtcars.new, aes(factor(cyl), mpg)) + 
          geom_boxplot(outlier.shape = NA)  +
          geom_point(aes(color=mpg,shape=out,size=out),  position="jitter")+
          scale_shape_manual(values=c(16,10),guide="none")+
          scale_size_manual(values=c(4,8),guide="none")


