I want to draw boxplots in R and label the outliers with names. So far I have found this solution (the boxplot.with.outlier.label function sourced below).
The function provides all the functionality I need, but it scrambles the labels. In the following example, it marks the outlier as "u" instead of "o":
library(plyr)
library(TeachingDemos)
source("http://www.r-statistics.com/wp-content/uploads/2011/01/boxplot-with-outlier-label-r.txt") # Load the function
set.seed(1500)
y <- rnorm(20)
x1 <- sample(letters[1:2], 20, TRUE)
lab_y <- sample(letters, 20)
# plot a boxplot with interactions:
boxplot.with.outlier.label(y~x1, lab_y)
Do you know of any solution? The ggplot2 library is super nice, but provides no such functionality (as far as I know). My alternative is to use the text() function and extract the outlier information from the boxplot object. However, with that approach the labels may overlap.
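For reference, a minimal sketch of that text() alternative, using only base R. It assumes the default whisker rule (1.5 times the interquartile range), which is what both boxplot() and boxplot.stats() use unless told otherwise. The names grp and is_out are mine, and nothing here prevents labels from overlapping:

```r
set.seed(1500)
y <- rnorm(20)
x1 <- sample(letters[1:2], 20, TRUE)
lab_y <- sample(letters, 20)

bx <- boxplot(y ~ x1)             # draws the plot and returns the statistics
grp <- as.integer(as.factor(x1))  # box position (1, 2, ...) of every observation

# Flag outliers group by group, with the same rule boxplot() applies internally
is_out <- as.logical(ave(y, x1, FUN = function(v) v %in% boxplot.stats(v)$out))

# Write each outlier's label just to the right of its point (guarded: there may be none)
if (any(is_out)) {
  text(grp[is_out], y[is_out], labels = lab_y[is_out], pos = 4, cex = 0.7)
}
```

Because is_out is computed on the original vectors, the labels stay aligned with their observations regardless of the order of x1.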
Thanks a lot :-)
I took a look at this with debug(boxplot.with.outlier.label), and ... it turns out there's a bug in the function.
The error occurs on line 125, where the data.frame DATA is constructed from x, y and label_name.
By that point x and y have been reordered, while lab_y hasn't been. When the supplied value of x (your x1) isn't itself already in order, you'll get the kind of jumbling you experienced.
As an immediate fix, you can pre-order the x values like this (or do something more elegant):
df <- data.frame(y, x1, lab_y, stringsAsFactors=FALSE)
df <- df[order(df$x1), ]
# Needed since lab_y is not searched for in data (though it probably should be)
lab_y <- df$lab_y
boxplot.with.outlier.label(y~x1, lab_y, data=df)
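To make the mechanism concrete, here is a toy illustration of the mismatch. This is not the function's actual code, just a sketch of what happens when two vectors are reordered and a third is not; all names in it (lab, ord, bad, good) are made up for the example:

```r
x   <- c("b", "a", "b", "a")
y   <- c(10, 1, 20, 2)
lab <- c("p", "q", "r", "s")   # "p" belongs to y = 10, "q" to y = 1, and so on

ord <- order(x)                # permutation that sorts x
x_sorted <- x[ord]
y_sorted <- y[ord]

# Labels left in their original order: every row gets somebody else's label
bad <- data.frame(x = x_sorted, y = y_sorted, label = lab,
                  stringsAsFactors = FALSE)

# Labels reordered with the same permutation: rows stay consistent
good <- data.frame(x = x_sorted, y = y_sorted, label = lab[ord],
                   stringsAsFactors = FALSE)

bad$label[bad$y == 10]    # no longer "p"
good$label[good$y == 10]  # "p", as intended
```

Pre-sorting the data by x1, as in the fix above, makes the internal reordering a no-op, so the unsorted labels happen to line up again.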
Intelligent placement of point labels is a separate issue, discussed here or here. There is no single ideal solution, so you just have to pick one of the approaches there.
So you would overplot the normal boxplot with labels, as follows:
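set.seed(1501)
y <- c(4, 0, 7, -5, rnorm(16))
x1 <- c("a", "a", "b", "b", sample(letters[1:2], 16, TRUE))
lab_y <- sample(letters, 20)
bx <- boxplot(y ~ x1)
# Look up a label for each outlier; if several points share the value, take the first
out_lab <- character(length(bx$out))
for (i in seq_along(bx$out)) {  # seq_along() is safe with zero or one outlier
  out_lab[i] <- lab_y[which(y == bx$out[i])[1]]
}
# Click near an outlier to place its label
identify(bx$group, bx$out, labels = out_lab, cex = 0.7)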
Then, while identify() is running, you just click where you want each label to go, as described here. When finished, just press "STOP". Note that an outlier value can match more than one label (if several observations share that value)! In my solution, I simply picked the first.
PS: I feel ashamed of the for loop, but I don't know how to vectorize it; feel free to post an improvement.
EDIT: inspired by Federico's link, I now see it can be done much more easily! Just these 2 commands:
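One possible vectorization, as a sketch: base R's match() returns the position of the first match for each element, which is the same "pick the first" rule the loop applies. The name out_lab_loop is only there to check the two against each other, and plot = FALSE just computes the statistics without drawing:

```r
set.seed(1501)
y <- c(4, 0, 7, -5, rnorm(16))
x1 <- c("a", "a", "b", "b", sample(letters[1:2], 16, TRUE))
lab_y <- sample(letters, 20)
bx <- boxplot(y ~ x1, plot = FALSE)

# Loop version, kept only for comparison
out_lab_loop <- character(length(bx$out))
for (i in seq_along(bx$out)) {
  out_lab_loop[i] <- lab_y[which(y == bx$out[i])[1]]
}

# Vectorized version: position of the first y equal to each outlier value
out_lab <- lab_y[match(bx$out, y)]

identical(out_lab, out_lab_loop)
```

Like the loop, this still cannot tell apart two observations with exactly the same value.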
boxplot(y~x1)
identify(as.integer(as.factor(x1)), y, labels = lab_y, cex = 0.7)