
Excluding outliers when plotting a Stripchart with ggplot2

I'm trying to create a combination Boxplot/Scatterplot. I'm doing alright with it so far but there's one issue that's really bothering me that I've been unable to figure out. I'm in R and I've installed the ggplot2 package. Here's the code I'm using:

  # xx stands in for my data set, which I imported from Excel
  # with the column labels as the x-axis values
  boxplot(xx, lwd = 1.5, ylab = 'Minutes', xlab = "Epoch")
  stripchart(xx, vertical = TRUE,
             method = "jitter", add = TRUE, pch = 20, col = 'blue')

This gives me a plot that is pretty close to what I want, but the problem is that the outliers are plotted twice. If possible, I'd like the stripchart to exclude them (the highest groups of blue dots) and keep only the ones from the boxplot (black outlined circles), so they stand out as different and don't look so sloppy.

I've tried to change the points in question by passing many different outlier arguments to the stripchart command, unfortunately with no luck. I've tried setting y-limits below their values, tried using outline = FALSE (which completely removes the stripchart), and tried changing the outlier color, outpch, etc. None of these attempts worked. Here's an example with ylim:

> stripchart(xx, vertical = TRUE,
+            method = "jitter", add = TRUE, pch = 20, col = 'blue', ylim = true,
+            ylim (0,20))

Error in ylim(0, 20) : could not find function "ylim"

And here's an example with outlier color:

> stripchart(xx, vertical = TRUE,
+   method = "jitter", add = TRUE, pch = 20, col = 'blue', outcol = "black")

Warning messages:
1: In plot.xy(xy.coords(x, y), type = type, ...) : "outcol" is not a graphical parameter
... (the warning messages continue in the same way)

Are stripcharts capable of outlier exclusion? Or do I simply not know enough about them yet (and R as a whole, for that matter) to effectively write the code?

If this can be done, how should I proceed? I'm totally fine with solutions that don't directly address the outlier issue in terms of the data as long as the visual effect on the plot is the same.

Thank you for your time and any help you can give!

Edit: Here's some of the data to play around with. The top row is the column labels and the data is beneath. Sorry if the formatting is bad. The 29s and 30s in the 9th row of data (10th overall) are examples of points plotted as outliers in my graphs that I would like to keep in the boxplot but not in the scatterplot/stripchart.

1           5           10          15     30    60
7.233333333 8.166666667 9.666666667 7.75   9     7
7.133333333 9.25        9.333333333 9.75   10    11
0.733333333 0.5         0.833333333 1      1     0
1.766666667 1.166666667 1           0.75   1     0
1.75        2.25        2.333333333 2.25   1     1
6.75        7           7.166666667 7.75   6.5   7
1.516666667 1.75        1.333333333 2      2     2
1.533333333 1.5         2           1.25   1.5   2
27.3        28.33333333 29.33333333 30.25  28.5  29
6.35        6           6.333333333 7      6     6
7.083333333 8.333333333 8.833333333 8.75   8     8
8.533333333 10.08333333 10.5        12     10.5  11
7.65        8.416666667 9           10.75  9     12
6.85        7.333333333 8           7.25   6     8
4.433333333 5           5.5         5      6.5   6
8.616666667 10          11.66666667 12.25  13    12
3.633333333 3.75        3.5         3.25   3     2
0.8         0.75        0.833333333 1      1     0
7.283333333 8.583333333 9.666666667 9.75   12    8
7.483333333 8.75        8.333333333 7.75   6.5   7
3.466666667 2.916666667 3.166666667 2.5    2     0
5.483333333 6.416666667 6.833333333 6.75   7     8
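To try the code below on this data, one option is read.table() with its text argument. This is a minimal sketch that assumes the table is copied as plain whitespace-separated text; only the first three data rows are repeated here, so paste the remaining rows the same way. Because the headers are bare numbers, read.table() turns them into syntactic names (X1, X5, ..., X60), the same column names used in the answer's df.

```r
# Paste the table as text; header = TRUE takes column names from the first line.
# Only three data rows are shown here: paste the rest of the table the same way.
xx <- read.table(header = TRUE, text = "
1           5           10          15     30    60
7.233333333 8.166666667 9.666666667 7.75   9     7
7.133333333 9.25        9.333333333 9.75   10    11
0.733333333 0.5         0.833333333 1      1     0
")

# check.names = TRUE (the default) prefixes the numeric headers with "X"
names(xx)   # returns c("X1", "X5", "X10", "X15", "X30", "X60")
```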

There are a few things going on here. If you wanted to stick with the base plotting functions (boxplot() and stripchart()), you could simply tell stripchart to plot only the points that meet some criterion. A common standard for outliers is any point 3 or more standard deviations away from the mean; the code below only checks the upper side, since the points you want to drop are the unusually high ones. Instead of passing your unmodified data set to stripchart, we subset that data set (note the [ ] brackets).

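boxplot(xx, lwd = 1.5, ylab = 'Minutes', xlab = "Epoch")
# mean() and sd() need a numeric vector, so filter each column (epoch) separately
xx.trim <- lapply(xx, function(col) col[col <= mean(col) + sd(col) * 3])
stripchart(xx.trim, vertical = TRUE, method = 'jitter', add = TRUE, pch = 20, col = 'blue')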
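The 3-SD cut-off is not the rule boxplot() itself uses. By default, boxplot() draws as outliers the points lying more than 1.5 times the box length (the distance between the hinges) beyond the box, as computed by boxplot.stats(). If the goal is to drop exactly the points the boxplot already marks, a minimal sketch is to reuse boxplot.stats() on each column. The helper name drop_box_outliers is hypothetical, and the example data frame holds only two columns (epochs 1 and 60) copied from the question so the sketch runs on its own.

```r
# Stand-in for your data: two columns of the question's data (epochs 1 and 60).
xx <- data.frame(
  X1 = c(7.233333333, 7.133333333, 0.733333333, 1.766666667, 1.75, 6.75,
         1.516666667, 1.533333333, 27.3, 6.35, 7.083333333, 8.533333333,
         7.65, 6.85, 4.433333333, 8.616666667, 3.633333333, 0.8,
         7.283333333, 7.483333333, 3.466666667, 5.483333333),
  X60 = c(7, 11, 0, 0, 1, 7, 2, 2, 29, 6, 8, 11, 12, 8, 6, 12, 2, 0, 8, 7, 0, 8)
)

# Hypothetical helper: return v without the values boxplot() would draw as outliers.
# boxplot() calls boxplot.stats() internally; coef = 1.5 matches boxplot()'s default range.
drop_box_outliers <- function(v, coef = 1.5) {
  out <- boxplot.stats(v, coef = coef)$out
  v[!v %in% out]
}

trimmed <- lapply(xx, drop_box_outliers)   # named list, one vector per column

boxplot(xx, lwd = 1.5, ylab = "Minutes", xlab = "Epoch")
stripchart(trimmed, vertical = TRUE, method = "jitter", add = TRUE,
           pch = 20, col = "blue")
```

Both boxplot() and stripchart() place their groups at positions 1, 2, ... by default, so the trimmed list lines up with the boxes, and only the boxplot's own open circles mark the outliers.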

[Image: the resulting base-graphics plot]

Of course, if you really did want to use ggplot2 (and I recommend installing not only that package but the entire tidyverse with install.packages('tidyverse')), you could produce an arguably nicer plot:

[Image: the resulting ggplot2 plot]

The data formatting and commands needed to produce the ggplot version are quite different from the base graphics version, and beyond the scope of this answer. Reproducible code follows.

library(tidyverse)

df <- structure(list(X1 = c(7.233333333, 7.133333333, 0.733333333, 1.766666667, 1.75, 6.75, 1.516666667, 1.533333333, 27.3, 6.35, 7.083333333, 8.533333333, 7.65, 6.85, 4.433333333, 8.616666667, 3.633333333, 0.8, 7.283333333, 7.483333333, 3.466666667, 5.483333333 ), X5 = c(8.166666667, 9.25, 0.5, 1.166666667, 2.25, 7, 1.75, 1.5, 28.33333333, 6, 8.333333333, 10.08333333, 8.416666667, 7.333333333, 5, 10, 3.75, 0.75, 8.583333333, 8.75, 2.916666667, 6.416666667 ), X10 = c(9.666666667, 9.333333333, 0.833333333, 1, 2.333333333, 7.166666667, 1.333333333, 2, 29.33333333, 6.333333333, 8.833333333, 10.5, 9, 8, 5.5, 11.66666667, 3.5, 0.833333333, 9.666666667, 8.333333333, 3.166666667, 6.833333333), X15 = c(7.75, 9.75, 1, 0.75, 2.25, 7.75, 2, 1.25, 30.25, 7, 8.75, 12, 10.75, 7.25, 5, 12.25, 3.25, 1, 9.75, 7.75, 2.5, 6.75), X30 = c(9, 10, 1, 1, 1, 6.5, 2, 1.5, 28.5, 6, 8, 10.5, 9, 6, 6.5, 13, 3, 1, 12, 6.5, 2, 7), X60 = c(7L, 11L, 0L, 0L, 1L, 7L, 2L, 2L, 29L, 6L, 8L, 11L, 12L, 8L, 6L, 12L, 2L, 0L, 8L, 7L, 0L, 8L)), .Names = c("X1", "X5", "X10", "X15", "X30", "X60"), class = "data.frame", row.names = c(NA, -22L))

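# Reshape to long format: one row per (epoch, value) pair
df.long <- pivot_longer(df, everything(), names_to = 'x', values_to = 'value') %>% 
  # Column names are "X1", "X5", ...; strip the "X" and keep the epochs in numeric order
  mutate(x = as.factor(as.numeric(gsub('X', '', x)))) %>% 
  group_by(x) %>% 
  # Flag points more than 3 SDs above their own epoch's mean
  mutate(is.outlier = value > mean(value) + sd(value) * 3) %>% 
  ungroup()

# Boxplot of all points, plus semi-transparent jittered points for the non-outliers only
# (geom_boxplot() still draws its own outlier markers)
plot.df <- ggplot(data = df.long, aes(x = x, y = value, group = x)) +
  geom_boxplot() +
  geom_point(data = filter(df.long, !is.outlier), color = '#0000ff88', position = position_jitter(width = 0.1))
print(plot.df)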
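As in the base version, the is.outlier flag above uses the 3-SD rule, whereas geom_boxplot() marks points lying more than 1.5 times the IQR beyond the quartiles. The sketch below flags points with that rule instead, assuming ggplot2's default coef = 1.5 and that its quartiles match quantile()'s default type, so the jittered points should be exactly the ones the boxplot does not draw as outliers. It again uses only the epoch 1 and 60 columns from the question so it is self-contained.

```r
library(dplyr)
library(ggplot2)

# Long-format data for two epochs from the question (1 and 60).
x1  <- c(7.233333333, 7.133333333, 0.733333333, 1.766666667, 1.75, 6.75,
         1.516666667, 1.533333333, 27.3, 6.35, 7.083333333, 8.533333333,
         7.65, 6.85, 4.433333333, 8.616666667, 3.633333333, 0.8,
         7.283333333, 7.483333333, 3.466666667, 5.483333333)
x60 <- c(7, 11, 0, 0, 1, 7, 2, 2, 29, 6, 8, 11, 12, 8, 6, 12, 2, 0, 8, 7, 0, 8)
long <- data.frame(x = factor(rep(c(1, 60), each = 22)), value = c(x1, x60))

# Flag points outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR] within each epoch
# (assumed to match geom_boxplot()'s default outlier rule).
flagged <- long %>%
  group_by(x) %>%
  mutate(lo = quantile(value, 0.25) - 1.5 * IQR(value),
         hi = quantile(value, 0.75) + 1.5 * IQR(value),
         is.outlier = value < lo | value > hi) %>%
  ungroup()

p <- ggplot(flagged, aes(x = x, y = value)) +
  geom_boxplot() +   # draws its own outlier markers
  geom_point(data = filter(flagged, !is.outlier), color = "#0000ff88",
             position = position_jitter(width = 0.1))
print(p)
```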

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.

 
© 2020-2024 STACKOOM.COM