
“facet_grid” and overplot: puzzling behaviour

I am plotting some data using facet_grid(), and I noticed something puzzling.

I should say up front that I am a beginner with ggplot2, so I may have missed something. Anyhow, here it goes.

Assuming the following dataframe:

library(ggplot2)

d1 <- runif(500)
d2 <- runif(500)*10
s1 <- sample(LETTERS[1:2], 500, replace = T, prob=c(0.3, 0.7))
s2 <- sample(letters[3:4], 500, replace = T, prob=c(0.4, 0.6))
df <- data.frame(s1, s2, d1, d2)

which looks like this:

s1 s2 d1        d2
B  c  0.3434944 0.9881925
A  d  0.7847741 9.7759946
A  d  0.3142764 2.3654268
...

I plot the data faceted by the two categorical variables:

ggplot(df, aes(x=df$d1, y=df$d2)) +
    geom_point(col="red", cex=2) +
    facet_grid(s2 ~ s1)

Resulting in the following plot:

Plot 1

I now want to overplot only a subset of the data, and I used the following (here simplified) code:

geom_point(data=df[df$d2 > 7.5,],
aes(x=df$d1[df$d2 > 7.5], y=df$d2[df$d2 > 7.5]),
cex=1, colour=I("black"))

Resulting in the following plot:

Plot 2

Now, having set a threshold, I expected every value above the threshold to be plotted on top of an existing point.

This does not appear to be the case.

In fact, some existing points have no matching thresholded point, and some thresholded points have no matching existing point. What puzzles me most is that, as I understand it, all the points come from the same data frame, so I expect the first layer (the existing points) to contain the second layer. Am I missing something here?

Also, on close inspection, the overplotted points sit at the right 2D position but in the wrong quadrant.

Even more puzzling: if I plot the following subsets:

ggplot(df[df$d2 < 7.5,], aes(x=df$d1[df$d2 < 7.5], y=df$d2[df$d2 < 7.5])) +
    geom_point(col="red", cex=2) +
    facet_grid(s2 ~ s1) +
    geom_point(data=df[df$d2 > 7.5,],
               aes(x=df$d1[df$d2 > 7.5], y=df$d2[df$d2 > 7.5]),
               cex=1, colour=I("black"))

Some of the pre-existing values move from the region "above threshold" to that "below threshold". Can anybody explain such behaviour?

Thanks a lot.

I can't explain exactly why this happens, but I think the subsets built inside the plotting call (the df$... vectors passed to aes()) are not being matched to the facets. If we instead add a TRUE/FALSE column to the data frame, we can control the colour and size of the points within each facet. Does this help?

EDIT: Using hollow points (shape=21) and scale_fill_manual to address the question exactly.

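# Flag the rows above the threshold
df$d <- df$d2 > 7.5

# Map the flag to fill; shape 21 is a hollow point with a separate fill colour
ggplot(data = df, aes(x = d1, y = d2, colour = d, size = d, fill = d)) +
    facet_grid(s1 ~ s2) +
    geom_point(show.legend = FALSE, shape = 21, size = 2, stroke = 1.5, col = "red") +
    scale_fill_manual(values = setNames(c("black", "red"), c(TRUE, FALSE)))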
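
If a separate overplotted layer is preferred, as in the question, here is a minimal sketch of that variant. It assumes the mismatch comes from passing df$... vectors to aes(), which ggplot2 discourages: such vectors are taken as-is rather than looked up in the layer's data, so they may not line up with the facet panels. Here each layer gets its own data frame and aes() uses bare column names. The set.seed() call and the name `above` are not from the original post; they are added only for illustration.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df <- data.frame(
  s1 = sample(LETTERS[1:2], 500, replace = TRUE, prob = c(0.3, 0.7)),
  s2 = sample(letters[3:4], 500, replace = TRUE, prob = c(0.4, 0.6)),
  d1 = runif(500),
  d2 = runif(500) * 10
)

# Subset once; the rows keep their s1/s2 values, so facet_grid() can
# place every point of the second layer in the correct panel.
above <- df[df$d2 > 7.5, ]

p <- ggplot(df, aes(x = d1, y = d2)) +                   # bare column names, no df$
  geom_point(colour = "red", size = 2) +                 # all points
  geom_point(data = above, colour = "black", size = 1) + # thresholded points on top
  facet_grid(s1 ~ s2)

print(p)
```

The second geom_point() inherits the x and y mappings from ggplot(), so only its data argument changes. Each black point should then fall on a red point in the same panel.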

