Why does ggplot switch from a discrete to a continuous legend in this multiple-line plot?

Question

Example df:

 xnom <- seq(0,80,by=20)
 x1 <- xnom+rnorm(5,0,2)
 x2 <- x1*.9
 x3 <- x2*.9
 S1 <- seq(0,1,by=.25)
 S2 <- S1*1.3
 S3 <- S2*1.3
 df <- data.frame(xnom,x1,x2,x3,S1,S2,S3)

I want to make two different plots: one where each response S1, S2, S3 is plotted against the predictor xnom, and another where each response Si is plotted against the corresponding predictor xi. In both cases, I want to plot a line of a different color for each response, with a legend that summarizes the colors of the three responses. To this end, I wrote the following function:

makeplot <- function(df, xvec) {
  library(ggplot2)
  if (length(xvec) == 1) {
    p <- ggplot(data = df, aes_string(x = xvec))
    p <- p + geom_line(aes(y = S1, color = "1")) +
      geom_point(aes(y = S1, color = "1")) +
      geom_line(aes(y = S2, color = "2")) +
      geom_point(aes(y = S2, color = "2")) +
      geom_line(aes(y = S3, color = "3")) +
      geom_point(aes(y = S3, color = "3"))
  } else {
    p <- ggplot(data = df)
    p <- p + geom_line(aes_string(x = xvec[1], y = "S1", color = "1")) +
      geom_point(aes_string(x = xvec[1], y = "S1", color = "1")) +
      geom_line(aes_string(x = xvec[2], y = "S2", color = "2")) +
      geom_point(aes_string(x = xvec[2], y = "S2", color = "2")) +
      geom_line(aes_string(x = xvec[3], y = "S3", color = "3")) +
      geom_point(aes_string(x = xvec[3], y = "S3", color = "3"))
  }
  p <- p + labs(color = "Section")
  print(p)
}

In the single predictor case, it worked fine:

 makeplot(df,"x1")

ggplot makes a discrete scale legend which looks great. However, when I match each response to the corresponding predictor, then for some reason ggplot switches to a continuous scale:

makeplot(df,c("x1","x2","x3"))

This looks ugly: a "Section 2.5" makes no sense in my case. Why is this happening, and how can I avoid it? I suspect it is related to aes_string. However, I need some way to handle variable predictor names in my function, because all this is part of a larger program in which predictor names can change at runtime.

Answer 1

To formalize the suggestions being made by @RichardTelford and @DeltaIV, is there a reason that the following could not be used instead?

Note that the double melt is less than ideal (I know there is a better way, but I am blanking on it at the moment), and that I hard-coded the labels into the column names instead of using xlab, ylab, and setting the name of the key, etc.

library(ggplot2)
library(dplyr)
library(reshape2)

melt(df, id.vars = c("xnom")
     , measure.vars = paste0("S",1:3)
     , variable.name = "Section"
     , value.name = "Response") %>%
  mutate(Section = gsub("^S","",Section)) %>%
  ggplot(aes(x = xnom
             , y = Response
             , col = Section)) +
  geom_point() +
  geom_line()

melt(df, id.vars = c(paste0("x",1:3))
     , measure.vars = paste0("S",1:3)
     , variable.name = "Section"
     , value.name = "Response") %>%
  melt(id.vars = c("Section","Response")
       , measure.vars = c(paste0("x",1:3))
       , value.name = "Predictor Value"
       , variable.name = "Predictor") %>%
  mutate(Section = gsub("^S","",Section)) %>%
  ggplot(aes(x = `Predictor Value`
             , y = Response
             , col = Section)) +
  geom_point() +
  geom_line() +
  facet_wrap(~Predictor)
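
As for why the legend changes: aes_string() parses each string as an R expression, so "S1" becomes the column S1 but "1" becomes the number 1, and a numeric colour gets a continuous scale. In the single-predictor branch, aes(color = "1") keeps "1" as a character constant, which is why that legend is discrete. The smallest change to the original function would be to quote the label inside the string, e.g. color = "'1'". The sketch below is one possible alternative, not the code from the question: it assumes ggplot2 3.0.0 or later (for the .data pronoun and !! inside aes()), and the names makeplot2 and yvec are hypothetical.

```r
library(ggplot2)

# Example data from the question (set.seed added here only for reproducibility).
set.seed(1)
xnom <- seq(0, 80, by = 20)
x1 <- xnom + rnorm(5, 0, 2)
x2 <- x1 * 0.9
x3 <- x2 * 0.9
S1 <- seq(0, 1, by = 0.25)
S2 <- S1 * 1.3
S3 <- S2 * 1.3
df <- data.frame(xnom, x1, x2, x3, S1, S2, S3)

# Hypothetical replacement for makeplot(); the names are illustrative only.
makeplot2 <- function(df, xvec, yvec = c("S1", "S2", "S3")) {
  # Recycle a single predictor name so both cases share one code path.
  xvec <- rep_len(xvec, length(yvec))
  p <- ggplot(df)
  for (i in seq_along(yvec)) {
    # !! injects the current values when aes() is called, so the mapping does
    # not depend on the loop variable later. The colour label is injected as a
    # character constant, which gives a discrete scale.
    m <- aes(x = .data[[!!xvec[i]]], y = .data[[!!yvec[i]]],
             colour = !!as.character(i))
    p <- p + geom_line(m) + geom_point(m)
  }
  # Return the plot instead of printing it, so the caller can modify it.
  p + labs(x = paste(unique(xvec), collapse = ", "), y = "Response",
           colour = "Section")
}

p_single <- makeplot2(df, "xnom")
p_paired <- makeplot2(df, c("x1", "x2", "x3"))
```

Calling print(p_paired) draws each Si against its own xi. Because the column names are passed as strings in xvec, they can still change at runtime.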

solution1
0 2016-07-06 21:20:11
