
geom_boxplot gave wrong whiskers

I am making a boxplot using geom_boxplot in ggplot2. However, the whisker lengths look wrong and I don't know why. Here is my data and code:

library(ggplot2)

value <- c(1.3739117, 0.8709891, 3.4510461, 0.8470309, 1.4838725, 0.6942611,
           1.3095816, 3.0444649, 19.2785424, 1.0866242, 0.9376845, 2.2343836,
           20.7975509, 20.3102489, 18.0046679, 1.4197519)
data <- data.frame(value)
ggplot(data, aes(y = value)) +
  stat_boxplot(geom = "errorbar", width = 0.3) +  # whisker end caps
  geom_boxplot(width = 0.5)

The plot looks like this:

[image: the resulting boxplot]

The third quartile overlaps the upper whisker. I did the calculation manually, and the result is as follows:

summary(data)
Min.   : 0.6943  
1st Qu.: 1.0494  
Median : 1.4518  
Mean   : 6.0715  
3rd Qu.: 7.0895  
Max.   :20.7976

According to the geom_boxplot documentation: The upper whisker extends from the hinge to the largest value no further than 1.5 * IQR from the hinge (where IQR is the inter-quartile range, or distance between the first and third quartiles). The lower whisker extends from the hinge to the smallest value at most 1.5 * IQR of the hinge.

The IQR in my case is: 7.0895 - 1.0494 = 6.0401

The lower whisker should be: 0.6943 - 1.5*6.0401 = -8.36585

The upper whisker should be: 7.0895 + 1.5*6.0401 = 16.14965

I understand that a negative lower whisker is meaningless here, so it is replaced by the minimum value. But why is the upper whisker not shown? I could not find an example online that solves this problem. Is there something I misunderstand about the ggplot settings? I would really appreciate your help and suggestions!

From the quoted section:

The upper whisker extends from the hinge to the largest value no further than 1.5 * IQR from the hinge (where IQR is the inter-quartile range, or distance between the first and third quartiles).

By "value" they mean a value from among the original data points. The whisker is not drawn out to hinge + 1.5 * IQR itself; it stops at the most extreme data point inside that limit. If you plot the data, there are no values between the top hinge at 7.09 and the limit at 16.15 (hinge + 1.5 * IQR), so the upper whisker has nothing to extend to, and the four points above 16.15 are drawn as outliers. Had the same quartiles arisen from data with a value lying in that range, the upper whisker would extend to it.

ggplot(data, aes(y = value)) +
  geom_jitter(aes(x = 0.5), width = 0.05) +
  stat_boxplot(geom = "errorbar", width = 0.3, 
               color = "red", size = 1.5) +
  geom_boxplot(width = 0.5, alpha = 0.5) +
  geom_hline(yintercept = c(7.09, 16.15), lty = "dashed")

[image: boxplot with the jittered data points overlaid and dashed lines at 7.09 and 16.15]
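The same check can be done numerically without plotting. The following is a minimal sketch in base R, assuming the quartiles are computed with R's default quantile() method (the values match the summary() output in the question); the variable names are illustrative only. Note that both limits are measured from the hinges, so the lower limit is Q1 - 1.5 * IQR rather than the minimum minus 1.5 * IQR.

```r
# Data from the question
value <- c(1.3739117, 0.8709891, 3.4510461, 0.8470309, 1.4838725, 0.6942611,
           1.3095816, 3.0444649, 19.2785424, 1.0866242, 0.9376845, 2.2343836,
           20.7975509, 20.3102489, 18.0046679, 1.4197519)

# Hinges, assumed here to be the default quartiles (same as summary())
q1  <- quantile(value, 0.25, names = FALSE)
q3  <- quantile(value, 0.75, names = FALSE)
iqr <- q3 - q1

# Limits: 1.5 * IQR beyond each hinge
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Whiskers can only end on actual data points inside the limits
inside         <- value[value >= lower_fence & value <= upper_fence]
lower_whisker  <- min(inside)
largest_inside <- max(inside)

# Data points lying between the upper hinge and the upper limit
between <- value[value > q3 & value <= upper_fence]

print(c(q1 = q1, q3 = q3, lower_fence = lower_fence, upper_fence = upper_fence))
print(largest_inside)   # 3.4510461, which is below the upper hinge
print(length(between))  # 0: no point for the upper whisker to reach
```

Because the largest point inside the upper limit (3.45) is lower than the upper hinge (7.09), there is no visible upper whisker, which is consistent with the end cap sitting on the edge of the box in the plot.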
