
geom_boxplot gave wrong whiskers

I am making a boxplot using geom_boxplot in ggplot2. However, the whisker length is not correct and I don't know why. Here is my data:

library(ggplot2)

value = c(1.3739117, 0.8709891, 3.4510461, 0.8470309, 1.4838725, 0.6942611,
          1.3095816, 3.0444649, 19.2785424, 1.0866242, 0.9376845, 2.2343836,
          20.7975509, 20.3102489, 18.0046679, 1.4197519)
data = data.frame(value)
# The errorbar layer adds horizontal caps at the whisker ends
ggplot(data, aes(y = value)) +
  stat_boxplot(geom = "errorbar", width = 0.3) +
  geom_boxplot(width = 0.5)

This is the plot I get:

[Image: the resulting boxplot]

The third quartile overlaps the upper whisker. I checked the statistics myself, and the results are as follows:

summary(data)
Min.   : 0.6943  
1st Qu.: 1.0494  
Median : 1.4518  
Mean   : 6.0715  
3rd Qu.: 7.0895  
Max.   :20.7976

According to the geom_boxplot documentation: The upper whisker extends from the hinge to the largest value no further than 1.5 * IQR from the hinge (where IQR is the inter-quartile range, or distance between the first and third quartiles). The lower whisker extends from the hinge to the smallest value at most 1.5 * IQR of the hinge.
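The quoted rule can be written as a small base-R function. This is a sketch under stated assumptions: `whisker_ends()` is a hypothetical helper, not a ggplot2 function; it uses R's default (type 7) quantiles, which is what `summary()` reports, and the default `coef = 1.5`. It also assumes, as ggplot2's `stat_boxplot()` appears to do, that a whisker whose most extreme non-outlying data point lies inside the box collapses onto the hinge.

```r
# Hypothetical helper (not part of ggplot2): where the whiskers end under
# the documented rule, with the default coef = 1.5.
whisker_ends <- function(y, coef = 1.5) {
  # Hinges = first and third quartiles (R's default type-7 quantiles)
  hinges <- quantile(y, c(0.25, 0.75), names = FALSE)
  iqr <- hinges[2] - hinges[1]
  # Points beyond hinge -/+ coef * IQR count as outliers
  keep <- y >= hinges[1] - coef * iqr & y <= hinges[2] + coef * iqr
  # Each whisker stops at the most extreme non-outlying data point; if that
  # point is inside the box, the whisker has zero length (ends at the hinge)
  c(lower = min(y[keep], hinges[1]),
    upper = max(y[keep], hinges[2]))
}

value <- c(1.3739117, 0.8709891, 3.4510461, 0.8470309, 1.4838725, 0.6942611,
           1.3095816, 3.0444649, 19.2785424, 1.0866242, 0.9376845, 2.2343836,
           20.7975509, 20.3102489, 18.0046679, 1.4197519)
whisker_ends(value)
```

For this data the lower end is the minimum (about 0.694) and the upper end equals the third quartile (about 7.089), i.e. a zero-length upper whisker.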

The IQR in my case is: 7.0895-1.0494 = 6.0401

The lower whisker should be: 1.0494 - 1.5*6.0401 = -8.01075

The upper whisker should be: 7.0895 + 1.5*6.0401 = 16.14965
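The same arithmetic can be checked in R. Note that both limits are measured from the hinges: the lower one from the first quartile (1.0494), not from the minimum. This sketch uses only base R; `q`, `iqr` and `limits` are illustrative names.

```r
value <- c(1.3739117, 0.8709891, 3.4510461, 0.8470309, 1.4838725, 0.6942611,
           1.3095816, 3.0444649, 19.2785424, 1.0866242, 0.9376845, 2.2343836,
           20.7975509, 20.3102489, 18.0046679, 1.4197519)

q <- quantile(value, c(0.25, 0.75), names = FALSE)  # hinges: ~1.0494, ~7.0895
iqr <- IQR(value)                                   # same as q[2] - q[1]
limits <- c(lower = q[1] - 1.5 * iqr,               # ~ -8.011
            upper = q[2] + 1.5 * iqr)               # ~ 16.150
limits
```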

I understand that a negative lower whisker is meaningless, so here it is replaced by the minimum value. But why is the upper whisker not shown? I am confused and could not find an example online that explains this. Am I misunderstanding something about the ggplot2 settings? I would really appreciate your help and suggestions!

From the quoted section:

The upper whisker extends from the hinge to the largest value no further than 1.5 * IQR from the hinge (where IQR is the inter-quartile range, or distance between the first and third quartiles).

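By "value" they mean one of the original data points. If you plot the data, there are no values between the upper hinge at 7.09 and 16.15 (the hinge + 1.5*IQR). The largest point within that limit is 3.45, which already lies inside the box, so the upper whisker has no data point to extend to and ends at the hinge itself; the four values above 16.15 are drawn as outliers. If one of the data values had fallen in that range, the upper whisker would extend to it.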
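Listing the data around the upper limit makes the gap explicit. A base-R sketch using the question's `value` vector; `q3` and `upper_limit` are illustrative names.

```r
value <- c(1.3739117, 0.8709891, 3.4510461, 0.8470309, 1.4838725, 0.6942611,
           1.3095816, 3.0444649, 19.2785424, 1.0866242, 0.9376845, 2.2343836,
           20.7975509, 20.3102489, 18.0046679, 1.4197519)

q3 <- quantile(value, 0.75, names = FALSE)
upper_limit <- q3 + 1.5 * IQR(value)

# Data points above the upper hinge but not beyond the limit: none
value[value > q3 & value <= upper_limit]
# Largest non-outlying point: 3.4510461, which is below the hinge
max(value[value <= upper_limit])
# Points beyond the limit, drawn as outliers: 18.00, 19.28, 20.31, 20.80
sort(value[value > upper_limit])
```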

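library(ggplot2)

ggplot(data, aes(y = value)) +
  # raw data points beside the box (height = 0: no vertical jitter)
  geom_jitter(aes(x = 0.5), width = 0.05, height = 0) +
  # whisker caps in red (on ggplot2 < 3.4.0, use size instead of linewidth)
  stat_boxplot(geom = "errorbar", width = 0.3,
               color = "red", linewidth = 1.5) +
  geom_boxplot(width = 0.5, alpha = 0.5) +
  # upper hinge (Q3) and Q3 + 1.5 * IQR
  geom_hline(yintercept = c(7.09, 16.15), linetype = "dashed")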

[Image: the boxplot with jittered data points, red whisker caps, and dashed lines at 7.09 and 16.15]
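To confirm what ggplot2 actually computed, the layer's statistics can be read back with `layer_data()`. A sketch, assuming ggplot2 3.3.0 or later (needed for a boxplot with only a `y` aesthetic); `p` and `box` are illustrative names.

```r
library(ggplot2)

value <- c(1.3739117, 0.8709891, 3.4510461, 0.8470309, 1.4838725, 0.6942611,
           1.3095816, 3.0444649, 19.2785424, 1.0866242, 0.9376845, 2.2343836,
           20.7975509, 20.3102489, 18.0046679, 1.4197519)
data <- data.frame(value)
p <- ggplot(data, aes(y = value)) +
  geom_boxplot(width = 0.5)

# Statistics computed by stat_boxplot() for the first (and only) layer
box <- layer_data(p, 1)
# ymin/ymax are the whisker ends; lower/upper are the hinges
box[, c("ymin", "lower", "middle", "upper", "ymax")]
# Outlying values, stored in a list column
box$outliers[[1]]
```

For this data `ymax` should equal `upper`, which is the zero-length upper whisker seen in the plot.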

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 