
Remove outliers from a ggplotly() boxplot

I have the dataframe below:

etf_id<-c("a","b","c","d","e","a","b","c","d","e","a","b","c","d","e")
factor<-c("A","A","A","A","A","B","B","B","B","B","C","C","C","C","C")
normalized<-c(-0.048436801,2.850578601,2.551666490,0.928625186,-0.638111793,
              -0.540615895,-0.501691539,-1.099239823,-0.040736139,-0.192048665,
              0.198915407,-0.092525810,0.214317734,2.550478998,0.024613778)
df<-data.frame(etf_id,factor,normalized)

and I'm trying to remove outliers in two ways. First I try with outlier.color = NA, outlier.size = 0, outlier.shape = NA:

library(ggplot2)
library(plotly)
ggplotly(df %>% 
  ggplot(aes(factor, normalized, color = factor)) +
  geom_boxplot(outlier.color = NA,outlier.size = 0,outlier.shape = NA) +
  coord_cartesian(ylim = quantile(df$normalized, c(0.01, 0.99), na.rm = T)))

A second example, with the diamonds dataset:

p<-ggplotly(diamonds %>% 
  ggplot(aes(cut,price, color = cut)) +
  geom_boxplot(outlier.color = NA,outlier.size = 0,outlier.shape = NA))

Then I try with:

ggplotly(df %>% 
  ggplot(aes(factor, normalized, color = factor)) +
  geom_boxplot(outlier.color = NA,outlier.size = 0,outlier.shape = NA) +
  coord_cartesian(ylim = quantile(boxplot.stats(df$normalized)$stats[c(1, 5)]*1.5, c(0.01, 0.99), na.rm = T)))

but this approach seems to cut off my plot's y limits, and I need a generic solution.

I am not entirely sure what you are trying to do with the second approach. However, for what it's worth, the issue you are facing is rooted in this part of the code: boxplot.stats(df$normalized)$stats[c(1, 5)]*1.5

Specifically, boxplot.stats(df$normalized)$stats returns this vector:

[1] -1.09923982 -0.34687010 -0.04073614  0.57147146  0.92862519

These are the boxplot stats (i.e. lower whisker, lower hinge, median, upper hinge, and upper whisker) for ALL of your data. But because the graph you are drawing splits the data into groups by the factor variable, stats computed by boxplot.stats on all of the data will not give you good boundaries for the individual boxes.
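To illustrate the difference, here is a minimal sketch that computes the whisker ends separately for each level of factor, using only base R. The names whiskers and ylim are introduced here for illustration and are not part of the original code. Note that ggplot2 computes its box statistics slightly differently from boxplot.stats, so treat the result as an approximation of what the plot draws.

```r
# Same data as in the question
df <- data.frame(
  etf_id = rep(c("a", "b", "c", "d", "e"), times = 3),
  factor = rep(c("A", "B", "C"), each = 5),
  normalized = c(-0.048436801, 2.850578601, 2.551666490, 0.928625186, -0.638111793,
                 -0.540615895, -0.501691539, -1.099239823, -0.040736139, -0.192048665,
                 0.198915407, -0.092525810, 0.214317734, 2.550478998, 0.024613778)
)

# Whisker ends (elements 1 and 5 of $stats) computed per group,
# giving a 2 x 3 matrix with one column per level of `factor`
whiskers <- sapply(split(df$normalized, df$factor),
                   function(x) boxplot.stats(x)$stats[c(1, 5)])
rownames(whiskers) <- c("lower", "upper")

# y limits that cover every group's whiskers but not the outliers beyond them
ylim <- range(whiskers)
```

If limits like these suit your plot, they could be passed to coord_cartesian(ylim = ylim) in place of the quantile() call.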

Going back to your original problem of hiding outliers in boxplots: ggplotly does not honor the outlier.shape = NA argument you pass to ggplot. Instead, you should hide the outliers in plotly itself. One solution can be found on plotly's GitHub issue tracker.

We can go under the hood of the ggplotly object and make the outliers invisible. Note, however, that hovering over the invisible outliers will still show the hoverinfo for the outlier measurements.

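library(ggplot2)
library(plotly)

p<-ggplotly(diamonds %>% 
            ggplot(aes(cut,price, color = cut)) +
            geom_boxplot(outlier.color = NA,outlier.size = 0,outlier.shape = NA))

# Loop over the traces in p$x$data (not over the elements of p itself)
# and make each trace's markers, i.e. the outlier points, fully transparent
for(i in seq_along(p$x$data)){
  p$x$data[[i]]$marker$opacity = 0 
}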

p
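The same idea can be wrapped in a small helper. This is only a sketch: hide_box_outliers is a hypothetical name, and it assumes that ggplotly stores each box as a trace of type "box" in p$x$data. It also sets the box trace attribute hoveron to "boxes", which is assumed here to stop the hidden points from responding to hover; check this against your plotly version.

```r
# Hypothetical helper: hide outlier markers on every box trace of a
# ggplotly object and leave all other traces untouched
hide_box_outliers <- function(p) {
  p$x$data <- lapply(p$x$data, function(trace) {
    if (identical(trace$type, "box")) {
      trace$marker$opacity <- 0   # make the outlier points invisible
      trace$hoveron <- "boxes"    # assumption: hover only on the boxes
    }
    trace
  })
  p
}
```

With the plot object from above, this would be called as p <- hide_box_outliers(p) before printing p.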

