简体   繁体   中英

R Plotly jittered boxplot with NAs width

I am plotting the grouped boxplot with jittering with the following function:

plot_boxplot <- function(dat) {
  # taking one of each joine_group to be able to plot it
  allx <- dat %>% 
    mutate(y = median(y, na.rm = TRUE)) %>%
    group_by(joined_group) %>% 
    sample_n(1) %>% 
    ungroup()

  p <- dat %>%
    plotly::plot_ly() %>%
    # plotting all the groups 1:20
    plotly::add_trace(data = allx, 
                      x = ~as.numeric(joined_group),
                      y = ~y,
                      type = "box",
                      hoverinfo = "none",
                      boxpoints = FALSE,
                      color = NULL,
                      opacity = 0,
                      showlegend = FALSE) %>% 
    # plotting the boxes
    plotly::add_trace(data = dat, 
                      x = ~as.numeric(joined_group),
                      y = ~y,
                      color = ~group1,
                      type = "box",
                      hoverinfo = "none",
                      boxpoints = FALSE,
                      showlegend = FALSE) %>% 
    # adding ticktext
    layout(xaxis = list(tickvals = 1:20,
                        ticktext = rep(levels(dat$group1), each = 4)))

  p <- p %>%
    # adding jittering
    add_markers(data = dat,
                x = ~jitter(as.numeric(joined_group), amount = 0.2),
                y = ~y,
                color = ~group1,
                showlegend = FALSE)
  p

}

The problem is that when some of the levels have NA as y variable the width of the jittered boxes changes. Here is an example:

library(plotly)
library(dplyr)
set.seed(123)
dat <- data.frame(group1 = factor(sample(letters[1:5], 100, replace = TRUE)),
                  group2 = factor(sample(LETTERS[21:24], 100, replace = TRUE)),
                  y = runif(100)) %>% 
  dplyr::mutate(joined_group = factor(
    paste0(group1, "-", group2)
  ))

# do the plot with all the levels
p1 <- plot_boxplot(dat)

# now the group1 e is having NAs as y values
dat$y[dat$group1 == "e"] <- NA

# create the plot with missing data
p2 <- plot_boxplot(dat)

# creating the subplot to see that the width has changed:
subplot(p1, p2, nrows = 2)

The problem is that the width of boxes in both plots is different: 在此处输入图片说明

I've realised that the boxes have the same size without jittering so I know that the jittering is "messing" with the width but I don't know how to fix that. 在此处输入图片说明

Does anyone know how to make the width in both jittered plots exactly the same?

I see two separate plot shifts:

  1. due to jittering
  2. due to NAs

First can be solved by declaring new jitter function with fixed seed

fixed_jitter <- function (x, factor = 1, amount = NULL) {
  set.seed(42)
  jitter(x, factor, amount)
}

and using it instead of jitter in add_markers call.

Second problem can be solved by assigning -1 instead of NA and setting

yaxis = list(range = c(0, ~max(1.1 * y)))

as a second parameter to layout .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM