
How to specify ggplot legend order when you have multiple variables that are not all part of one column?

I'm plotting the same data at different time scales (week, month, quarter, etc.) using ggplot, and as a result I'm pulling the data from different columns. However, I want the legend entries to appear in a specific order.

I know that if all the grouping variables were in one column, I could set it as an ordered factor, as explained here, but my data are spread across multiple columns. I also tried the suggestions here about re-ordering multiple geoms, but they didn't work.

Because my actual dataset is very complex, I've reproduced a smaller version that has only week and month data. The final answer should let me specify an explicit order, not just something like rev(), because in my actual dataset I have 6 columns that need a specific order.

Here's code to reproduce the issue. The first three chunks build the dataset, so only the fourth chunk, which makes the plot, should be relevant to the actual solution. By default, R shows 'Score - Month' first in the legend, so I'd like to see how to make it second.

library(dplyr)
library(ggplot2)
library(lubridate)

#Generates week data -- shouldn't be relevant to troubleshoot
by_week <- tibble(Week = seq(as.Date("2011-01-01"), as.Date("2012-07-01"), by="weeks"),
                Week_score = c(sample(100:200, 79)),
                Month = ymd(format(Week, "%Y-%m-01")))

#Generates month data -- shouldn't be relevant to troubleshoot                
by_month <- tibble(Month = seq(as.Date("2011-01-01"), as.Date("2012-07-01"), by="months"),
                   Month_score = c(sample(150:200, 19)))

#Joins data and removes duplications of month data for easier plotting -- shouldn't be relevant to troubleshoot  
all_time <- by_week %>%
  full_join(by_month) %>%
  mutate(helper = across(c(contains("Month")), ~paste(.))) %>% 
  mutate(across(c(contains("Month")), ~ifelse(duplicated(helper), NA, .)), .keep="unused") %>%
  mutate(Month = as.Date(Month))

#Makes plot - this is where I want the order in the legend to be different
all_time %>%
  ggplot(aes(x = Week)) +
  geom_line(aes(y= Week_score, colour = "Week_score")) +
  geom_line(data=all_time[!is.na(all_time$Month_score),], aes(y = Month_score, colour = "Month_score")) + #This line tells R just to focus on non-missing values for Month_score
  scale_colour_discrete(labels = c("Week_score" = "Score - Week", "Month_score" = "Score - Month"))

Here's what the current legend looks like. I want the order switched, using a solution that scales to more than two options. Thank you!

[Image: plot with the default legend order, 'Score - Month' listed first]

As @stefan rightly mentioned in the comments, you should pass the names used in your labels to the limits argument of scale_colour_discrete, in the order you want them to appear in the legend. You can add more columns yourself in the same way. You can use the following code:

library(dplyr)
library(ggplot2)
library(lubridate)

#Generates week data -- shouldn't be relevant to troubleshoot
by_week <- tibble(Week = seq(as.Date("2011-01-01"), as.Date("2012-07-01"), by="weeks"),
                  Week_score = c(sample(100:200, 79)),
                  Month = ymd(format(Week, "%Y-%m-01")))

#Generates month data -- shouldn't be relevant to troubleshoot                
by_month <- tibble(Month = seq(as.Date("2011-01-01"), as.Date("2012-07-01"), by="months"),
                   Month_score = c(sample(150:200, 19)))

#Joins data and removes duplications of month data for easier plotting -- shouldn't be relevant to troubleshoot  
all_time <- by_week %>%
  full_join(by_month) %>%
  mutate(helper = across(c(contains("Month")), ~paste(.))) %>% 
  mutate(across(c(contains("Month")), ~ifelse(duplicated(helper), NA, .)), .keep="unused") %>%
  mutate(Month = as.Date(Month))

#Makes plot - this is where I want the order in the legend to be different
all_time %>%
  ggplot(aes(x = Week)) +
  geom_line(aes(y= Week_score, colour = "Week_score")) +
  geom_line(data=all_time[!is.na(all_time$Month_score),], aes(y = Month_score, colour = "Month_score")) + #This line tells R just to focus on non-missing values for Month_score
  scale_colour_discrete(labels = c("Week_score" = "Score - Week", "Month_score" = "Score - Month"), limits = c("Week_score", "Month_score"))

Output:

[Image: plot with the reordered legend, 'Score - Week' listed first]

As you can see, the order of the labels has changed.
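To illustrate how the same idea might extend beyond two series, here is a minimal sketch with three columns. The data frame `df`, the `Quarter_score` column, and the `legend_labels` vector are hypothetical and not part of the original question. The sketch assumes a ggplot2 version that accepts a named `labels` vector, as the code above already does. A single named vector holds both the desired order (its names go to `limits`) and the display text (its values go to `labels`).

```r
library(ggplot2)

# Hypothetical data: three series stored in separate columns
set.seed(1)
df <- data.frame(
  x             = 1:10,
  Week_score    = runif(10, 100, 200),
  Month_score   = runif(10, 150, 200),
  Quarter_score = runif(10, 120, 180)
)

# Desired legend order (top to bottom); names are the strings used in
# aes(colour = ...), values are the text shown in the legend
legend_labels <- c(
  Quarter_score = "Score - Quarter",
  Week_score    = "Score - Week",
  Month_score   = "Score - Month"
)

p <- ggplot(df, aes(x = x)) +
  geom_line(aes(y = Week_score, colour = "Week_score")) +
  geom_line(aes(y = Month_score, colour = "Month_score")) +
  geom_line(aes(y = Quarter_score, colour = "Quarter_score")) +
  scale_colour_discrete(
    limits = names(legend_labels),  # controls the legend order
    labels = legend_labels          # controls the legend text
  )

p
```

Reordering `legend_labels` is then the only change needed to reorder the legend. Note that the default palette assigns colours in `limits` order, so changing the order may also change which colour each series gets; if that matters, `scale_colour_manual()` with a named `values` vector is one way to pin the colours.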
