Remove outliers from stat_summary in ggplot2

I have this code to produce a boxplot from my data:

library(ggplot2)

p <- ggplot(meltData, aes(x=variable, y=value)) + 
  geom_boxplot(outlier.colour="red", outlier.shape=1, outlier.size=2) +
  stat_summary(geom="text", fun=quantile,
               aes(label=sprintf("%1.1f", after_stat(y)), color=factor(variable)),
               position=position_nudge(x=0.0), size=3.5, show.legend=FALSE) +
  ggtitle("Species measurements") +
  ggeasy::easy_center_title()
p

and I get this output: [image]

I want the labels on my boxplot to show the upper and lower whisker values as the maximum and minimum, not the outlier values. For example, on the 5th boxplot the maximum label is 72, but that is an outlier; the maximum should be approximately 56.

If I understand your goal correctly, you want boxplots with text labels showing the upper and lower whisker values, with no outliers shown in the plots. If so, I agree with @Death Metal that you might want to filter out the outliers per category.

Since you didn't provide reproducible data, here is a dummy dataset similar to yours.

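library(dplyr)    # %>%, group_by(), filter()
library(tidyr)    # pivot_longer()
library(ggplot2)

# Two iris measurements, each with three artificial outliers appended
dat <- data.frame(var.A = c(iris$Sepal.Length, c(20,21,22)), 
                  var.B = c(iris$Petal.Length, c(20,21,22)))
meltData <- dat %>% pivot_longer(cols = c(var.A, var.B), 
                                 values_to = "value", 
                                 names_to = "variable")

ggplot(meltData, aes(x=variable, y=value)) + geom_boxplot()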

The plot clearly shows the outliers:

[image]
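To check which points are being flagged, the outliers can also be listed numerically. This is a minimal sketch, assuming the dummy data above and base R's `boxplot.stats()`, which applies a 1.5 * IQR rule by default. Its hinges can differ slightly from the quartiles ggplot2 computes, so treat it as a sanity check rather than an exact reproduction of the plot. The names `var_A`, `var_B`, `out_A` and `out_B` are only illustrative.

```r
# Rebuild the two dummy columns with base R only (iris ships with R)
var_A <- c(iris$Sepal.Length, 20, 21, 22)
var_B <- c(iris$Petal.Length, 20, 21, 22)

# boxplot.stats() returns the five boxplot statistics in $stats
# and the points lying beyond the whiskers in $out
out_A <- boxplot.stats(var_A)$out
out_B <- boxplot.stats(var_B)$out

out_A
out_B
```

With this data, only the three appended values should be flagged in each column.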

Here is one way to filter out the outliers before drawing the boxplots:

meltData %>% 
     group_by(variable) %>%
     # keep only the values that boxplot.stats() does not flag as outliers in their group
     filter(!value %in% boxplot.stats(value)$out) %>% 
     ggplot(aes(x = variable, y = value)) + 
     geom_boxplot() + 
     stat_summary(geom = "text", fun = quantile,
                  aes(label = sprintf("%1.1f", after_stat(y)), 
                      color = factor(variable)),
                  position = position_nudge(x = 0.0), 
                  size = 3.5, show.legend = FALSE) +
     ggtitle("Species measurements") +
     ggeasy::easy_center_title()

The result:

[image]
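One caveat with filtering: the boxplot is then recomputed on the reduced data, so the quartiles (and therefore the labels) can shift slightly. A possible alternative is to leave the data untouched and label the whisker ends directly. The sketch below assumes a hypothetical helper, `whisker_stats()`, which is not part of ggplot2; it uses `quantile()` for the quartiles and stops the whiskers at the most extreme observations within 1.5 * IQR of the box. It also relies on `stat_summary()` accepting a function that returns several values per group, as `fun = quantile` in the question already does.

```r
# Hypothetical helper (not part of ggplot2): the five values a boxplot draws,
# with whiskers ending at the most extreme points within coef * IQR of the box
whisker_stats <- function(x, coef = 1.5) {
  x <- x[!is.na(x)]
  q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
  iqr <- q[3] - q[1]
  inside <- x[x >= q[1] - coef * iqr & x <= q[3] + coef * iqr]
  c(min(inside), q, max(inside))
}

# Same dummy data as above, built with base R
meltData <- data.frame(
  variable = rep(c("var.A", "var.B"), each = 153),
  value = c(iris$Sepal.Length, 20, 21, 22, iris$Petal.Length, 20, 21, 22)
)

# Whisker ends and quartiles per variable, computed on the unfiltered data
labels <- tapply(meltData$value, meltData$variable, whisker_stats)
labels

# Draw the plot only if ggplot2 is installed; outliers are hidden, not removed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(meltData, aes(x = variable, y = value)) +
    geom_boxplot(outlier.shape = NA) +
    stat_summary(geom = "text", fun = whisker_stats,
                 aes(label = sprintf("%1.1f", after_stat(y)), color = variable),
                 size = 3.5, show.legend = FALSE) +
    ggtitle("Species measurements") +
    theme(plot.title = element_text(hjust = 0.5))
  print(p)
}
```

Because the outliers stay in the data, the y axis still extends to them; adding `coord_cartesian(ylim = ...)` is one way to zoom in on the boxes.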
