
Remove outliers fully from multiple boxplots made with ggplot2 in R and display the boxplots in expanded format

I have some data here [in a .txt file], which I read into a data frame df:

df <- read.table("data.txt", header = TRUE, sep = "\t")

I remove the negative values in column x of df (since I need only positive values) using the following code:

yp <- subset(df, x>0)

Now I want to plot multiple box plots in the same layer. I first melt the data frame df, and the resulting plot contains several outliers, as shown below.

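library(reshape2)  # melt(); the package is assumed, the question does not name it
library(ggplot2)
library(scales)    # trans_breaks(), trans_format(), math_format()
library(grid)      # unit()

# Melt the data frame df, keeping its first column as the id variable
df_mlt <- melt(df, id = names(df)[1])

# Plot the boxplots on a log10 y-axis, using positive values only
plt_wool <- ggplot(subset(df_mlt, value > 0), aes(x = ID1, y = value)) +
  geom_boxplot(aes(color = factor(ID1))) +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x))) +
  theme_bw() +
  theme(legend.text = element_text(size = 14), legend.title = element_text(size = 14)) +
  theme(axis.text = element_text(size = 20)) +
  theme(axis.title = element_text(size = 20, face = "bold")) +
  labs(x = "x", y = "y", colour = "legend") +
  annotation_logticks(sides = "rl") +
  theme(panel.grid.minor = element_blank()) +
  guides(colour = guide_legend(title.hjust = 0.5)) +  # centre the legend title
  theme(plot.margin = unit(c(0, 1, 0, 0), "mm"))
plt_wool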

[Figure: boxplot with outliers]

Now I need a plot without any outliers. To do this, I first compute the lower and upper whisker bounds using the following code, as suggested here:

sts <- boxplot.stats(yp$x)$stats

To remove the outliers, I add the upper and lower whisker limits as below:

p1 = plt_wool + coord_cartesian(ylim = c(sts*1.05,sts/1.05))

The resulting plot is shown below. The above line of code correctly removes most of the top outliers, but all the bottom outliers still remain. Could someone please suggest how to remove all the outliers completely from this plot? Thanks.

[Figure: plot after applying coord_cartesian(); the bottom outliers remain]

A minimal reproducible example:

library(ggplot2)
p <- ggplot(mtcars, aes(factor(cyl), mpg))
p + geom_boxplot()

Not plotting outliers:

p + geom_boxplot(outlier.shape=NA)
#Warning message:
#Removed 3 rows containing missing values (geom_point).

(I prefer to get this warning, because a year from now, in a long script, it would remind me that I did something special there. If you want to avoid it, use Sven's solution.)
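As a small check of what outlier.shape = NA does, the sketch below compares the statistics ggplot2 computes with and without the argument. It assumes the built-in mtcars data from the example above; the names with_points, without_points, same_stats and n_outliers are made up for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(factor(cyl), mpg))

# Build both versions and pull out the computed boxplot statistics.
with_points    <- layer_data(p + geom_boxplot())
without_points <- layer_data(p + geom_boxplot(outlier.shape = NA))

# The five-number summaries are the same; only the drawing of points differs.
stat_cols <- c("ymin", "lower", "middle", "upper", "ymax")
same_stats <- all.equal(with_points[, stat_cols], without_points[, stat_cols])

# Number of outliers the stat still records per group (they are just not drawn).
n_outliers <- lengths(without_points$outliers)
```

Because the outliers are still part of the data, the y-axis keeps its full range, which is why the boxes can look compressed until the axis limits are narrowed separately.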

Based on suggestions by @Sven Hohenstein, @Roland and @lukeA, I have solved the problem of displaying multiple boxplots in expanded form without outliers.

First, plot the box plots without outliers by using outlier.colour = NA in geom_boxplot():

plt_wool <- ggplot(subset(df_mlt, value > 0), aes(x = ID1, y = value)) +
  geom_boxplot(aes(color = factor(ID1)), outlier.colour = NA) +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x))) +
  theme_bw() +
  theme(legend.text = element_text(size = 14), legend.title = element_text(size = 14)) +
  theme(axis.text = element_text(size = 20)) +
  theme(axis.title = element_text(size = 20, face = "bold")) +
  labs(x = "x", y = "y", colour = "legend") +
  annotation_logticks(sides = "rl") +
  theme(panel.grid.minor = element_blank()) +
  guides(colour = guide_legend(title.hjust = 0.5)) +  # centre the legend title
  theme(plot.margin = unit(c(0, 1, 0, 0), "mm"))

Then compute the lower and upper whiskers using boxplot.stats(), as in the code below. Since I only take positive values into account, I select them using the condition in subset().

yp <- subset(df, x>0)             # Choosing only +ve values in col x
sts <- boxplot.stats(yp$x)$stats  # Compute lower and upper whisker limits
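For reference, here is a minimal sketch of what boxplot.stats() returns, using mtcars$mpg as stand-in data (an assumption; the question uses yp$x). It also shows why the original attempt c(sts*1.05, sts/1.05) does not give a usable pair of limits.

```r
x <- mtcars$mpg   # stand-in data; the question uses yp$x

bs <- boxplot.stats(x)

# bs$stats is a numeric vector of length 5:
#   [1] lower whisker end, [2] lower hinge, [3] median,
#   [4] upper hinge,       [5] upper whisker end
sts <- bs$stats
lower_whisker <- sts[1]
upper_whisker <- sts[5]

# bs$out holds the points lying beyond the whiskers.
outliers <- bs$out

# Scaling the whole vector, as in c(sts * 1.05, sts / 1.05),
# produces 10 numbers rather than a (min, max) pair.
n_values <- length(c(sts * 1.05, sts / 1.05))
```

To get a (min, max) pair, index single elements such as sts[1] and sts[5], or use range(sts).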

Now, to achieve a fully expanded view of the multiple boxplots, it is useful to modify the y-axis limits of the plot inside coord_cartesian(), as below:

p1 = plt_wool + coord_cartesian(ylim = c(sts[2]/2,max(sts)*1.05))

Note: The limits of y should be adjusted according to the specific case. In this case I have chosen half of sts[2] for ymin. Strictly speaking, sts[2] is the lower hinge (first quartile) returned by boxplot.stats(); the lower whisker end is sts[1].

The resulting plot is below.
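When each box has its own whiskers, the limits can also be derived per group rather than from a single column. The following is only a sketch on mtcars (not the data from the question), with illustrative names whiskers, whisker_range and y_limits. The multiplicative padding assumes strictly positive values.

```r
library(ggplot2)

# Whisker ends (elements 1 and 5 of $stats) for each group of mpg by cyl.
whiskers <- tapply(mtcars$mpg, mtcars$cyl,
                   function(v) boxplot.stats(v)$stats[c(1, 5)])

# Overall range covered by any whisker, padded by 5% on each side.
whisker_range <- range(unlist(whiskers))
y_limits <- whisker_range * c(0.95, 1.05)

p_zoomed <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot(outlier.shape = NA) +
  coord_cartesian(ylim = y_limits)
```

Note that boxplot.stats() computes hinges with fivenum(), while ggplot2's boxplot stat uses quantile(), so the whisker ends can differ slightly between the two for small groups.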

You can use the argument outlier.colour = NA to make the outliers invisible:

geom_boxplot(aes(color = factor(ID1)), outlier.colour = NA)

ggplot(df_mlt, aes(x = ID1, y = value)) +
  geom_boxplot(outlier.size = NA) +
  coord_cartesian(ylim = range(boxplot(df_mlt$value, plot = FALSE)$stats) * c(.9, 1.1))
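The answers here set the limits through coord_cartesian() rather than through scale limits, and the difference matters for boxplots. The sketch below, using mtcars and an arbitrary window of 15 to 30 (both assumptions for illustration), compares the medians ggplot2 computes in each case.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot(outlier.shape = NA)

reference <- layer_data(p)

# Zooming: all rows still feed the boxplot statistics.
zoomed <- layer_data(p + coord_cartesian(ylim = c(15, 30)))

# Scale limits: rows with mpg outside [15, 30] are dropped before the
# statistics are computed (ggplot2 warns about the removed rows).
clipped <- suppressWarnings(layer_data(p + ylim(15, 30)))

medians_kept    <- isTRUE(all.equal(zoomed$middle, reference$middle))
medians_shifted <- clipped$middle - reference$middle
```

With coord_cartesian() the boxes stay as they were and only the visible window changes; with ylim() the boxes themselves are recomputed from the remaining data.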

Another way to exclude outliers is to calculate them and then set the y-limits based on what you consider an outlier.

For example, if your upper and lower limits are Q3 + 1.5 IQR and Q1 - 1.5 IQR, then you may use:

upper.limit <- quantile(x)[4] + 1.5*IQR(x)
lower.limit <- quantile(x)[2] - 1.5*IQR(x)

Then put limits on the y-axis range:

p + coord_cartesian(ylim = c(lower.limit, upper.limit))  # p is your existing ggplot object
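Put together, the approach might look like the sketch below. It uses mtcars$mpg as stand-in data and computes the fences on the pooled values, both of which are assumptions for illustration rather than part of the answer.

```r
library(ggplot2)

x <- mtcars$mpg   # stand-in for the plotted y values

# First and third quartiles, then the usual 1.5 * IQR fences.
q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
upper.limit <- q[2] + 1.5 * IQR(x)
lower.limit <- q[1] - 1.5 * IQR(x)

p <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot(outlier.shape = NA) +
  coord_cartesian(ylim = c(lower.limit, upper.limit))
```

These fences are computed once for all groups, so individual boxes may still have whiskers well inside the window. On a log-scaled axis the lower fence can be zero or negative, in which case a positive lower limit has to be chosen by hand.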
