
geom_jitter removes a different number of points due to missing values each time the plot is run

geom_jitter in R's ggplot2 seems to remove a different number of points each time I plot the same data. I suspect this is due to overplotting (stacked points). For example, if I create the data frame once and then run the ggplot command several times, the warning about rows removed due to missing values reports a different count each time (ranging from 0 to 1 or more). Is there a way to ensure a consistent number of removed points (or none)? I tried tinkering with the point size and the jitter width/height, to no avail. Thanks!

library(ggplot2)

d <- data.frame(a = rnorm(n = 100, mean = 0, sd = 1), b = rnorm(n = 100, mean = 0, sd = 1))

ggplot(d, aes(a,b)) +
  geom_point(position=position_jitter(width=0.3, height=.3), size=2) +
  theme(panel.background=element_blank()) +
  scale_x_continuous(limits=c(-3, 3)) +
  scale_y_continuous(limits=c(-3, 3))

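The jitter pushes some points outside the limits you specify, and the scale limits turn out-of-range values into missing values, which are then removed. Because the noise is drawn afresh on every run, the number of removed points varies. Either jitter the data yourself, so the noise does not change between plots, or remove the range constraints.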

set.seed(0)
d <- data.frame(a = rep(-2:2, each=20), b=rnorm(100))

## Specify your own jitter: normal noise with sd 0.1 on a (width) and sd 1 on b (height) in this example
d <- d + rnorm(nrow(d)*2, 0, sd=rep(c(0.1, 1), each=nrow(d)))

## Always 4 rows removed, unless you rejitter
ggplot(d, aes(a, b)) +
  geom_point(size=2) +
  theme(panel.background=element_blank()) +
  scale_x_continuous(limits=c(-3,3)) +
  scale_y_continuous(limits=c(-3,3))
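
To see why the count changes between runs, the jitter can be imitated in base R. This is only a rough sketch: it assumes the jitter is uniform noise in [-width, width] and [-height, height], and count_outside is a hypothetical helper, not part of ggplot2.

```r
set.seed(1)
d <- data.frame(a = rnorm(100), b = rnorm(100))

# Hypothetical helper: jitter the data once (assumed uniform noise) and
# count how many points land outside the limits on either axis.
count_outside <- function(d, width = 0.3, height = 0.3, lim = c(-3, 3)) {
  x <- d$a + runif(nrow(d), -width, width)
  y <- d$b + runif(nrow(d), -height, height)
  sum(x < lim[1] | x > lim[2] | y < lim[1] | y > lim[2])
}

# Repeat the jitter 20 times on the same data and tabulate the counts
counts <- replicate(20, count_outside(d))
table(counts)
```

Points that sit close to a limit fall inside on some draws and outside on others, which is why the reported number of removed rows is not stable.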


Edit

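Actually, it is much simpler: just call set.seed() before running what you already have :) Call it immediately before each plot, since the jitter is drawn when the plot is rendered.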

set.seed(0)
ggplot(d, aes(a,b)) +
  geom_point(position=position_jitter(width=0.3, height=.3), size=2) +
  theme(panel.background=element_blank()) + scale_x_continuous(limits=c(-3, 3)) +
  scale_y_continuous(limits=c(-3, 3))
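
A related variant, not mentioned in the original answer, is the seed argument of position_jitter (available in ggplot2 3.0.0 and later, as far as I know). The sketch below assumes that version; the seed value 42 is arbitrary. It keeps the jitter reproducible without touching the global random number state.

```r
library(ggplot2)

d <- data.frame(a = rnorm(100), b = rnorm(100))

# seed = 42 is an arbitrary choice; any fixed number gives repeatable jitter
p <- ggplot(d, aes(a, b)) +
  geom_point(position = position_jitter(width = 0.3, height = 0.3, seed = 42),
             size = 2) +
  theme(panel.background = element_blank()) +
  scale_x_continuous(limits = c(-3, 3)) +
  scale_y_continuous(limits = c(-3, 3))

# Build the plot twice and compare the jittered coordinates
xy1 <- layer_data(p)[, c("x", "y")]
xy2 <- layer_data(p)[, c("x", "y")]
same_jitter <- identical(xy1, xy2)
same_jitter
```

With a fixed seed the same points are pushed past the limits on every render, so the number of removed rows should stay constant for a given data frame.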

Another option is to not use the limits argument of scale_x_continuous. Instead, use the xlim and ylim arguments of coord_cartesian, which is meant for zooming into a portion of the plot. The limits argument of the x and y axis scales actually subsets the data to be plotted: values outside the limits are dropped. Usually this makes little difference, unless you use statistical summaries that should include data not visible on the plot.

Note: with coord_cartesian you won't get the warnings when data points fall outside the plotted area.

d <- data.frame(a = rnorm(n = 100, mean = 0, sd = 1), 
                b = rnorm(n = 100, mean = 0, sd = 1))


ggplot(d, aes(a,b)) + 
  geom_point(position=position_jitter(width=0.3, height=.3), size=2) + 
  theme(panel.background=element_blank()) + 
  coord_cartesian(xlim=c(-3,3), ylim=c(-3,3))
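
The difference between the two approaches can be checked by inspecting the built layer data. This is a small sketch under my own assumptions: the narrower limits c(-1, 1) are chosen only so that some points are certain to fall outside, and jitter is left out to keep the comparison simple.

```r
library(ggplot2)

set.seed(0)
d <- data.frame(a = rnorm(100), b = rnorm(100))
p <- ggplot(d, aes(a, b)) + geom_point()

# Scale limits: out-of-range x values become NA in the built data
p_scale <- p + scale_x_continuous(limits = c(-1, 1))
# coord_cartesian: the data is untouched, only the view is zoomed
p_coord <- p + coord_cartesian(xlim = c(-1, 1))

n_scale <- sum(is.na(layer_data(p_scale)$x))
n_coord <- sum(is.na(layer_data(p_coord)$x))
c(scale_limits = n_scale, coord_cartesian = n_coord)
```

The NA values produced by the scale limits are what trigger the "removed rows" warning when the plot is drawn; with coord_cartesian there is nothing to remove.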

Another, lesser-known option is to change how scales handle values outside their limits by setting the out-of-bounds (oob) argument.

This is not really my idea; it is very much inspired by user axeman in a very similar thread.

library(ggplot2)
set.seed(0)
d <- data.frame(a = rnorm(n = 100, mean = 0, sd = 1), b = rnorm(n = 100, mean = 0, sd = 1))

ggplot(d, aes(a,b)) + 
  geom_point(position=position_jitter(width=0.3, height=.3), size=2) + 
  theme(panel.background=element_blank()) + 
  scale_x_continuous(limits=c(-3, 3), oob = scales::squish) + 
  scale_y_continuous(limits=c(-3, 3), oob = scales::squish)

Created on 2021-04-27 by the reprex package (v2.0.0)
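
To illustrate what the oob function does, here is a minimal sketch on a plain vector (the input values are made up for the example). scales::censor is, to my knowledge, the default for continuous scales, while scales::squish clamps values to the nearest limit.

```r
x <- c(-4, 0, 5)

# censor: values outside the range become NA (and are later removed)
censored <- scales::censor(x, range = c(-3, 3))
# squish: values outside the range are moved onto the nearest limit
squished <- scales::squish(x, range = c(-3, 3))

censored
squished
```

With squish no points are dropped, but any point jittered past a limit is drawn on the edge of the plot rather than at its jittered position.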
