How do you specify ggplot legend order when you have multiple variables that are not in one column?

Question

I'm plotting the same data at different time scales (Week, Month, Quarter, etc.) using ggplot, and as a result, I'm pulling the data from different columns. However, I want the entries in my legend to appear in a specific order.

I know that if all the grouping variables were in one column, I could set it as an ordered factor, as explained here, but my data are spread across multiple columns. I also tried the suggestions here about re-ordering multiple geoms, but they didn't work.

Because my actual dataset is very complex, I've reproduced a smaller version that just has week and month data. The final answer should let me specify an arbitrary order, not just something like rev(), because in my actual dataset I have 6 columns that need a specific order.

Here's code to reproduce the problem. The first 3 chunks build the dataset, so only the 4th chunk, which makes the plot, should be relevant to the actual solution. By default, R shows 'Score - Month' first in the legend, so I'd like to see how I could make it second.

library(dplyr)
library(ggplot2)
library(lubridate)

#Generates week data -- shouldn't be relevant to troubleshoot
by_week <- tibble(Week = seq(as.Date("2011-01-01"), as.Date("2012-07-01"), by="weeks"),
                Week_score = c(sample(100:200, 79)),
                Month = ymd(format(Week, "%Y-%m-01")))

#Generates month data -- shouldn't be relevant to troubleshoot                
by_month <- tibble(Month = seq(as.Date("2011-01-01"), as.Date("2012-07-01"), by="months"),
                   Month_score = c(sample(150:200, 19)))

#Joins data and removes duplications of month data for easier plotting -- shouldn't be relevant to troubleshoot  
all_time <- by_week %>%
  full_join(by_month) %>%
  mutate(helper = across(c(contains("Month")), ~paste(.))) %>% 
  mutate(across(c(contains("Month")), ~ifelse(duplicated(helper), NA, .)), .keep="unused") %>%
  mutate(Month = as.Date(Month))

#Makes plot - this is where I want the order in the legend to be different
all_time %>%
  ggplot(aes(x = Week)) +
  geom_line(aes(y= Week_score, colour = "Week_score")) +
  geom_line(data=all_time[!is.na(all_time$Month_score),], aes(y = Month_score, colour = "Month_score")) + #This line tells R just to focus on non-missing values for Month_score
  scale_colour_discrete(labels = c("Week_score" = "Score - Week", "Month_score" = "Score - Month"))

Here's what the current legend looks like. I want the order switched, with a solution that scales to more than 2 options. Thank you!

Answer 1

As @stefan rightly mentioned in the comments, you should pass the legend keys (the strings you map to colour) in the desired order to the limits argument of scale_colour_discrete. To handle more columns, add their keys to the same vector. You can use the following code:

library(dplyr)
library(ggplot2)
library(lubridate)

#Generates week data -- shouldn't be relevant to troubleshoot
by_week <- tibble(Week = seq(as.Date("2011-01-01"), as.Date("2012-07-01"), by="weeks"),
                  Week_score = c(sample(100:200, 79)),
                  Month = ymd(format(Week, "%Y-%m-01")))

#Generates month data -- shouldn't be relevant to troubleshoot                
by_month <- tibble(Month = seq(as.Date("2011-01-01"), as.Date("2012-07-01"), by="months"),
                   Month_score = c(sample(150:200, 19)))

#Joins data and removes duplications of month data for easier plotting -- shouldn't be relevant to troubleshoot  
all_time <- by_week %>%
  full_join(by_month) %>%
  mutate(helper = across(c(contains("Month")), ~paste(.))) %>% 
  mutate(across(c(contains("Month")), ~ifelse(duplicated(helper), NA, .)), .keep="unused") %>%
  mutate(Month = as.Date(Month))

#Makes plot - this is where I want the order in the legend to be different
all_time %>%
  ggplot(aes(x = Week)) +
  geom_line(aes(y= Week_score, colour = "Week_score")) +
  geom_line(data=all_time[!is.na(all_time$Month_score),], aes(y = Month_score, colour = "Month_score")) + #This line tells R just to focus on non-missing values for Month_score
  scale_colour_discrete(
    labels = c("Week_score" = "Score - Week", "Month_score" = "Score - Month"),
    limits = c("Week_score", "Month_score") # the order of limits sets the legend order
  )

As you can see in the output, the order of the legend labels has changed: 'Score - Week' now appears first.
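
To show how the same idea might scale beyond two columns, here is a minimal sketch. It assumes a made-up data frame (`scores`) with a hypothetical third column `Quarter_score`; neither is part of the original dataset, and `legend_labels` is just an illustrative name. The labels are kept in one named vector, so the legend order is simply the order of that vector, regardless of the order in which the geoms are added.

```r
library(ggplot2)

# Hypothetical data for illustration only: three score columns on one date axis
set.seed(1)
scores <- data.frame(
  Week          = seq(as.Date("2011-01-01"), by = "week", length.out = 20),
  Week_score    = sample(100:200, 20),
  Month_score   = sample(150:200, 20),
  Quarter_score = sample(120:180, 20)  # assumed extra column, not in the question
)

# Names are the colour keys used inside aes(); values are the legend labels.
# The order of this vector is the order wanted in the legend.
legend_labels <- c(
  Week_score    = "Score - Week",
  Month_score   = "Score - Month",
  Quarter_score = "Score - Quarter"
)

# The geoms are deliberately added in a different order than the legend
p <- ggplot(scores, aes(x = Week)) +
  geom_line(aes(y = Quarter_score, colour = "Quarter_score")) +
  geom_line(aes(y = Month_score, colour = "Month_score")) +
  geom_line(aes(y = Week_score, colour = "Week_score")) +
  scale_colour_discrete(labels = legend_labels, limits = names(legend_labels))

p
```

To change the order later, reorder the entries of `legend_labels`; nothing else in the plot code should need to change.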
