
geom_jitter removes a different number of points due to missing values each time the plot is run

geom_jitter in R's ggplot2 seems to remove a different number of points each time I plot the data. I suspect this is due to overplotting (stacked points)? For example, if I create the data frame once and then run the ggplot command multiple times, a varying number of points is removed due to missing data (ranging from 0 to 1+). Is there a way to ensure a consistent number of missing points (or none)? I tried tinkering with the size and the jitter width/height, to no avail. Thanks!

library(ggplot2)

d <- data.frame(a = rnorm(n = 100, mean = 0, sd = 1),
                b = rnorm(n = 100, mean = 0, sd = 1))

ggplot(d, aes(a, b)) +
  geom_point(position = position_jitter(width = 0.3, height = 0.3), size = 2) +
  theme(panel.background = element_blank()) +
  scale_x_continuous(limits = c(-3, 3)) +
  scale_y_continuous(limits = c(-3, 3))

The jitter pushes some points outside the ranges you specify, and the noise is recalculated on every run, so the number of points that fall outside the limits changes each time. Either add the jitter to the data yourself, so it does not change between runs, or remove the range constraints.

set.seed(0)
d <- data.frame(a = rep(-2:2, each=20), b=rnorm(100))

## Specify your own jitter: 0.1 in width, 1 in height in this example
d <- d + rnorm(nrow(d)*2, 0, sd=rep(c(0.1, 1), each=nrow(d)))

## Always 4 rows removed, unless you rejitter
ggplot(d, aes(a, b)) +
  geom_point(size=2) +
  theme(panel.background=element_blank()) +
  scale_x_continuous(limits=c(-3,3)) +
  scale_y_continuous(limits=c(-3,3))
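
To see why the count varies between runs, here is a minimal base R sketch that counts how many jittered points land outside the limits. It assumes the jitter is uniform noise within plus or minus the width/height, which is my understanding of what position_jitter does; `count_outside` is a hypothetical helper for illustration, not part of ggplot2.

```r
# Sketch: count points that fall outside the axis limits after jittering.
# runif() stands in for position_jitter(); count_outside() is a made-up helper.
count_outside <- function(x, y, width, height, limits = c(-3, 3)) {
  xj <- x + runif(length(x), -width, width)
  yj <- y + runif(length(y), -height, height)
  sum(xj < limits[1] | xj > limits[2] | yj < limits[1] | yj > limits[2])
}

set.seed(1)
a <- rnorm(100)
b <- rnorm(100)

# Fresh noise on every call, so the count can differ between calls
replicate(5, count_outside(a, b, 0.3, 0.3))

# Fixing the seed right before each call makes the count repeatable
set.seed(42); n1 <- count_outside(a, b, 0.3, 0.3)
set.seed(42); n2 <- count_outside(a, b, 0.3, 0.3)
identical(n1, n2)
```

Points near the edge of the limits are the ones at risk: whether they end up inside or outside depends on the random draw, which is why the warning reports a different number of removed rows each time.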


Edit

Actually, it is much simpler: just call set.seed before running what you have :)

set.seed(0)
ggplot(d, aes(a,b)) +
  geom_point(position=position_jitter(width=0.3, height=.3), size=2) +
  theme(panel.background=element_blank()) + scale_x_continuous(limits=c(-3, 3)) +
  scale_y_continuous(limits=c(-3, 3))

Another option is to not use the limits argument of scale_x_continuous. Instead, use the xlim and ylim arguments of coord_cartesian, which is meant for zooming into a portion of the plot. The limits argument of the x and y axis scales actually subsets the data that is plotted. Usually this makes little difference, unless you are working with statistical summaries that should include data not visible on the plot.

Note: with this approach you won't get a warning when data points fall outside the visible area of the plot.

d <- data.frame(a = rnorm(n = 100, mean = 0, sd = 1), 
                b = rnorm(n = 100, mean = 0, sd = 1))


ggplot(d, aes(a,b)) + 
  geom_point(position=position_jitter(width=0.3, height=.3), size=2) + 
  theme(panel.background=element_blank()) + 
  coord_cartesian(xlim=c(-3,3), ylim=c(-3,3))
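
The difference for statistical summaries can be illustrated with a small base R sketch. It does not call ggplot2; it only mimics the two behaviours by hand (dropping out-of-range values versus keeping them), and the data and limits are made up for illustration.

```r
# Sketch: why subsetting (scale limits) and zooming (coord_cartesian)
# can give different summaries. Illustrative data, not from the question.
set.seed(0)
y <- rnorm(100, mean = 2)   # a fair share of values lies above 3
limits <- c(-3, 3)
inside <- y >= limits[1] & y <= limits[2]

# Scale limits: out-of-range values are dropped before summarising
mean_subset <- mean(y[inside])

# coord_cartesian: every value is kept, only the view is cropped
mean_all <- mean(y)

c(subset = mean_subset, all = mean_all)
```

A summary such as a mean or a smoother computed on the subset is pulled toward the visible range, which is the case where zooming with coord_cartesian is the safer choice.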

Another, lesser-known option is to change the way the scales handle values outside their limits, by setting the out-of-bounds (oob) argument.

This is not really my idea; it is very much inspired by user axeman's answer in a very similar thread.

library(ggplot2)
set.seed(0)
d <- data.frame(a = rnorm(n = 100, mean = 0, sd = 1), b = rnorm(n = 100, mean = 0, sd = 1))

ggplot(d, aes(a,b)) + 
  geom_point(position=position_jitter(width=0.3, height=.3), size=2) + 
  theme(panel.background=element_blank()) + 
  scale_x_continuous(limits=c(-3, 3), oob = scales::squish) + 
  scale_y_continuous(limits=c(-3, 3), oob = scales::squish)

Created on 2021-04-27 by the reprex package (v2.0.0)
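
To make the effect of the oob argument concrete, this short sketch applies the two out-of-bounds functions from the scales package directly to a made-up vector. As far as I know, censor is the default for continuous scales, which is what produces the "removed rows" warning; squish is the replacement used above.

```r
library(scales)

# Illustrative values, two of them outside the limits c(-3, 3)
x <- c(-3.4, -1, 0, 2.5, 3.2)

# censor: out-of-range values become NA (and are later dropped with a warning)
censored <- censor(x, range = c(-3, 3))
censored
# NA -1.0 0.0 2.5 NA

# squish: out-of-range values are moved to the nearest limit
squished <- squish(x, range = c(-3, 3))
squished
# -3.0 -1.0 0.0 2.5 3.0
```

Keep in mind that with squish no points disappear, but the ones that were pushed out of range are drawn exactly on the edge of the plot, so they no longer show their true jittered position.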
