[英]No outliers in ggplot boxplot with facet_wrap
I would like to plot boxplots without outliers with ggplot, giving focus on the boxes and whiskers only 我想用ggplot绘制没有异常值的箱形图,只关注盒子和胡须
For example: 例如:
p1 <- ggplot(diamonds, aes(x=cut, y=price, fill=cut))
p1 + geom_boxplot() + facet_wrap(~clarity, scales="free")
gives facetted boxplots with outliers 给出带有异常值的刻面箱图
I can suppress outliers with outlier.size=NA: 我可以用outlier.size = NA来抑制异常值:
p1 <- ggplot(diamonds, aes(x=cut, y=price, fill=cut))
p1 + geom_boxplot(outlier.size=NA) + facet_wrap(~clarity, scales="free")
which gives 这使
Here, the y-axis scale is the same as in the original plot, just the outliers don't show up. 这里,y轴刻度与原始绘图中的相同,只是异常值不显示。 How can I now modify the scale to "zoom in" on each panel according to the whisker ends?
我现在如何根据晶须结束修改每个面板上的“放大”比例?
I can reset ylim like this 我可以像这样重置ylim
ylim1 = boxplot.stats(diamonds$price)$stats[c(1, 5)]
and then replot 然后重新绘制
p1 + geom_boxplot(outlier.size=NA)
+ facet_wrap(~clarity, scales="free")
+ coord_cartesian(ylim = ylim1*1.05)
but this doesn't work on the facets: 但这不适用于方面:
Is there a way to "facet_wrap" the boxplots.stats function? 有没有办法“facet_wrap”boxplots.stats功能?
Edit: 编辑:
I've tried to calculate the boxplot statistics dynamically, but this doesn't seem to work. 我试图动态计算箱线图统计数据,但这似乎不起作用。
give.stats <- function(x){return(boxplot.stats(x)$stats[c(1,5)])}
p1 + geom_boxplot(outlier.size=NA) +
facet_wrap(~clarity, scales="free") +
coord_cartesian(ylim = give.stats)
> Error in min(x, na.rm = na.rm) : invalid 'type' (list) of argument
Any more ideas would be much appreciated. 任何更多的想法将不胜感激。
Through outlier.size=NA
you make the outliers disappear, this is not an option to ignore the outliers plotting the boxplots. 通过
outlier.size=NA
你可以使异常值消失,这不是忽略绘制outlier.size=NA
的异常值的选项。 So, the plots are generated considering the (invisible) outliers. 因此,绘图是考虑到(不可见的)异常值而生成的。 There seems to be no option for what you want.
似乎没有你想要的选择。 In order to make the boxplots as you need them I would calculate the quantiles myself and generate the boxplots based on these quantiles, like in the following example:
为了根据需要制作箱形图,我会自己计算分位数,并根据这些分位数生成箱形图,如下例所示:
stat<-tapply(diamonds$price,list(diamonds$cut,diamonds$clarity),function(x) boxplot.stats(x))
stats<-unlist(tapply(diamonds$price,list(diamonds$cut,diamonds$clarity),function(x) boxplot.stats(x)$stats))
df<-data.frame(
cut=rep(rep(unlist(dimnames(stat)[1]),each=5),length(unlist(dimnames(stat)[2]))),
clarity=rep(unlist(dimnames(stat)[2]),each=25),
price=unlist(tapply(diamonds$price,list(diamonds$cut,diamonds$clarity),function(x) boxplot.stats(x)$stats)))
ggplot(df,aes(x=cut,y=price,fill=cut))+geom_boxplot()+facet_wrap(~clarity,scales="free")
Which gives (note that the orders in the plot are different now): 给出(注意图中的订单现在不同):
Ok, I figured out a more easy way to do this by commenting out some lines in the original ggplot boxplot function and calling the modified function. 好吧,通过在原始ggplot boxplot函数中注释掉一些行并调用修改后的函数,我想出了一个更简单的方法。
I am not a programmer, no idea if this is a good or robust thing to do, but it seems to work fine for now. 我不是一个程序员,不知道这是一个好的或强大的事情,但它似乎现在工作正常。
This is the modified function I am using: 这是我正在使用的修改功能:
#modified version of geom_boxplot
require(ggplot2)
geom_boxplot_noOutliers <- function (mapping = NULL, data = NULL, stat = "boxplot",
position = "dodge", outlier.colour = NULL,
outlier.shape = NULL, outlier.size = NULL,
notch = FALSE, notchwidth = .5, varwidth = FALSE,
...) {
#outlier_defaults <- ggplot2:::Geom$find('point')$default_aes()
#outlier.colour <- outlier.colour %||% outlier_defaults$colour
#outlier.shape <- outlier.shape %||% outlier_defaults$shape
#outlier.size <- outlier.size %||% outlier_defaults$size
GeomBoxplot_noOutliers$new(mapping = mapping, data = data, stat = stat,
position = position, outlier.colour = outlier.colour,
outlier.shape = outlier.shape, outlier.size = outlier.size, notch = notch,
notchwidth = notchwidth, varwidth = varwidth, ...)
}
GeomBoxplot_noOutliers <- proto(ggplot2:::Geom, {
objname <- "boxplot_noOutliers"
reparameterise <- function(., df, params) {
df$width <- df$width %||%
params$width %||% (resolution(df$x, FALSE) * 0.9)
# if (!is.null(df$outliers)) {
# suppressWarnings({
# out_min <- vapply(df$outliers, min, numeric(1))
# out_max <- vapply(df$outliers, max, numeric(1))
# })
#
# df$ymin_final <- pmin(out_min, df$ymin)
# df$ymax_final <- pmax(out_max, df$ymax)
# }
# if `varwidth` not requested or not available, don't use it
if (is.null(params) || is.null(params$varwidth) || !params$varwidth || is.null(df$relvarwidth)) {
df$xmin <- df$x - df$width / 2
df$xmax <- df$x + df$width / 2
} else {
# make `relvarwidth` relative to the size of the largest group
df$relvarwidth <- df$relvarwidth / max(df$relvarwidth)
df$xmin <- df$x - df$relvarwidth * df$width / 2
df$xmax <- df$x + df$relvarwidth * df$width / 2
}
df$width <- NULL
if (!is.null(df$relvarwidth)) df$relvarwidth <- NULL
df
}
draw <- function(., data, ..., fatten = 2, outlier.colour = NULL, outlier.shape = NULL, outlier.size = 2,
notch = FALSE, notchwidth = .5, varwidth = FALSE) {
common <- data.frame(
colour = data$colour,
size = data$size,
linetype = data$linetype,
fill = alpha(data$fill, data$alpha),
group = data$group,
stringsAsFactors = FALSE
)
whiskers <- data.frame(
x = data$x,
xend = data$x,
y = c(data$upper, data$lower),
yend = c(data$ymax, data$ymin),
alpha = NA,
common)
box <- data.frame(
xmin = data$xmin,
xmax = data$xmax,
ymin = data$lower,
y = data$middle,
ymax = data$upper,
ynotchlower = ifelse(notch, data$notchlower, NA),
ynotchupper = ifelse(notch, data$notchupper, NA),
notchwidth = notchwidth,
alpha = data$alpha,
common)
# if (!is.null(data$outliers) && length(data$outliers[[1]] >= 1)) {
# outliers <- data.frame(
# y = data$outliers[[1]],
# x = data$x[1],
# colour = outlier.colour %||% data$colour[1],
# shape = outlier.shape %||% data$shape[1],
# size = outlier.size %||% data$size[1],
# fill = NA,
# alpha = NA,
# stringsAsFactors = FALSE)
# outliers_grob <- GeomPoint$draw(outliers, ...)
# } else {
outliers_grob <- NULL
# }
ggname(.$my_name(), grobTree(
outliers_grob,
GeomSegment$draw(whiskers, ...),
GeomCrossbar$draw(box, fatten = fatten, ...)
))
}
guide_geom <- function(.) "boxplot_noOutliers"
draw_legend <- function(., data, ...) {
data <- aesdefaults(data, .$default_aes(), list(...))
gp <- with(data, gpar(col=colour, fill=alpha(fill, alpha), lwd=size * .pt, lty = linetype))
gTree(gp = gp, children = gList(
linesGrob(0.5, c(0.1, 0.25)),
linesGrob(0.5, c(0.75, 0.9)),
rectGrob(height=0.5, width=0.75),
linesGrob(c(0.125, 0.875), 0.5)
))
}
default_stat <- function(.) StatBoxplot
default_pos <- function(.) PositionDodge
default_aes <- function(.) aes(weight=1, colour="grey20", fill="white", size=0.5, alpha = NA, shape = 16, linetype = "solid")
required_aes <- c("x", "lower", "upper", "middle", "ymin", "ymax")
})
I saved it as an r file and use source
to load it: 我将其保存为r文件并使用
source
加载它:
library(ggplot2)
library(scales)
#load functions
source("D:/Eigene Dateien/Scripte/R-Scripte/myfunctions/geomBoxplot_noOutliers.r")
Now I can just plot without outliers using geom_boxplot_noOutliers
and everything works fine even with facets :-) 现在我可以使用
geom_boxplot_noOutliers
绘制没有异常值的情节,即使有方面也一切正常:-)
p1 <- ggplot(diamonds, aes(x=cut, y=price, fill=cut))
p1 + geom_boxplot_noOutliers() + facet_wrap(~clarity, scales="free")
It can be done with stat_summary and custom statistic calculation function: 可以使用stat_summary和自定义统计计算功能完成:
calc_boxplot_stat <- function(x) {
coef <- 1.5
n <- sum(!is.na(x))
# calculate quantiles
stats <- quantile(x, probs = c(0.0, 0.25, 0.5, 0.75, 1.0))
names(stats) <- c("ymin", "lower", "middle", "upper", "ymax")
iqr <- diff(stats[c(2, 4)])
# set whiskers
outliers <- x < (stats[2] - coef * iqr) | x > (stats[4] + coef * iqr)
if (any(outliers)) {
stats[c(1, 5)] <- range(c(stats[2:4], x[!outliers]), na.rm = TRUE)
}
return(stats)
}
ggplot(diamonds, aes(x=cut, y=price, fill=cut)) +
stat_summary(fun.data = calc_boxplot_stat, geom="boxplot") +
facet_wrap(~clarity, scales="free")
The stats calculation function is generic, thus no need for data manipulation before plotting. 统计计算功能是通用的,因此在绘图之前不需要数据操作。
It is also possible to set whiskers to 10% and 90% : 也可以将胡须设置为10%和90%:
calc_stat <- function(x) {
coef <- 1.5
n <- sum(!is.na(x))
# calculate quantiles
stats <- quantile(x, probs = c(0.1, 0.25, 0.5, 0.75, 0.9))
names(stats) <- c("ymin", "lower", "middle", "upper", "ymax")
return(stats)
}
ggplot(diamonds, aes(x=cut, y=price, fill=cut)) +
stat_summary(fun.data = calc_stat, geom="boxplot") +
facet_wrap(~clarity, scales="free")
在您的情况下,我认为限制显示范围可能会起作用,因为所有异常值都大于10000。
p1 + geom_boxplot() + ylim(0,10000)
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.