Overlaying barplot with line graphs using ggplot2

My question is similar to those posted here and here.

I am creating a graph in ggplot2 that has one bar plot, and I want to overlay multiple line graphs on it. For the purposes of this question, I have reproduced my code for three bar plots: one that includes all years (2007-2015) and two from specific years (2007 and 2015). Ultimately I will be overlaying data from 10 different years. The data used can be found here.

library(dplyr)
library(tidyr)
library(gridExtra)
library(ggplot2)

overallpierc<-data[(data$item=="piercing"),]

overp<-overallpierc %>%
  group_by(age) %>% 
  count(sex) %>% 
  ungroup %>% 
  mutate(age = factor(age)) %>%
  complete(age, sex, fill = list(n = 0)) %>% 
  ggplot(aes(age, n)) + geom_col(aes(fill = sex), position = "dodge") +
    theme_classic() + 
    scale_fill_manual(values=c("#000000", "#CCCCCC"), name = "Sex") + 
    labs(x = "Age", y = "Number of observations") +   
    theme(legend.position=c(0.4,0.8),
    plot.title = element_text(size = 10),
    legend.title=element_text(size=15),
    axis.title=element_text(size=15),
    legend.key.size = unit(1.13, "cm"),
    legend.direction="vertical",
    legend.text=element_text(size=15))

p07<-data[(data$yy=="2007") & (data$item=="piercing"),]
summary(p07)

subp07<-p07 %>%  
  group_by(age) %>% 
  count(sex) %>% 
  ungroup %>% 
  mutate(age = factor(age)) %>%
  complete(age, sex, fill = list(n = 0)) %>% 
  ggplot(aes(age, n)) + geom_col(aes(fill = sex), position = "dodge") +
    theme_classic() + 
    scale_fill_manual(values=c("#000000", "#CCCCCC"), name = "Sex") + 
    labs(x = "Age", y = "Number of observations") +   
    theme(legend.position=c(0.4,0.8),
    plot.title = element_text(size = 10),
    legend.title=element_text(size=15),
    axis.title=element_text(size=15),
    legend.key.size = unit(1.13, "cm"),
    legend.direction="vertical",
    legend.text=element_text(size=15))

p15<-data[(data$yy=="2015") & (data$item=="piercing"),]

subp15<-p15 %>% 
  group_by(age) %>% 
  count(sex) %>% 
  ungroup %>% 
  mutate(age = factor(age)) %>%
  complete(age, sex, fill = list(n = 0)) %>% 
  ggplot(aes(age, n)) + geom_col(aes(fill = sex), position = "dodge") +
    theme_classic() + 
    scale_fill_manual(values=c("#000000", "#CCCCCC"), name = "Sex") + 
    labs(x = "Age", y = "Number of observations") +   
    theme(legend.position=c(0.4,0.8),
    plot.title = element_text(size = 10),
    legend.title=element_text(size=15),
    axis.title=element_text(size=15),
    legend.key.size = unit(1.13, "cm"),
    legend.direction="vertical",
    legend.text=element_text(size=15))

grid.arrange(overp, subp07, subp15)

The code I have posted gives me the following figure.

What I am trying to do is plot the frequencies for females in 2007 and 2015 and for males in 2007 and 2015 on top of the bar plot of total frequencies, with the lines also reflected in the legend. Is there a way to do that in R using ggplot2?

UPDATE: I tried using the geom_smooth and geom_line functions to add the lines to my ggplot, as suggested in the comments and in answers to other users' questions, but I get the following error:

Error: Discrete value supplied to continuous scale

I created a new data frame for a subset that I would like to plot:

df<-data.frame(age=c(15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,40,50,60), val=c(0,5,13,77,70,106,62,51,46,27,46,16,22,16,14,48,21, 3,4))

And then added it to the ggplot code:

overallpierc %>%
  filter(age != "15") %>% 
  group_by(age) %>% 
  count(sex) %>% 
  ungroup %>% 
  mutate(age = factor(age)) %>%
  complete(age, sex, fill = list(n = 0)) %>% 
  ggplot(aes(age, n)) +     
    geom_line(data=df,aes(x=as.numeric(age),y=val),colour="blue") +
    geom_col(aes(fill = sex), position = "dodge") +
    theme_classic() + 
    scale_fill_manual(values=c("#000000", "#CCCCCC"), name = "Sex") + 
    labs(x = "Age", y = "Number of observations") +   
    theme(legend.position=c(0.4,0.8),
    plot.title = element_text(size = 10),
    legend.title=element_text(size=15),
    axis.title=element_text(size=15),
    legend.key.size = unit(1.13, "cm"),
    legend.direction="vertical",
    legend.text=element_text(size=15))

Others have encountered similar issues and used as.numeric to solve the problem. However, age needs to be treated as a factor for the purposes of plotting.

Based on our discussion in the comments, let's try stacked bars and facets. I think it works, but you can decide for yourself.

The stacked bar has the advantage of showing both proportions and total count in the same bar. To compare years, a facet grid places years in rows, so the eye can scan downwards to compare the same age in different years. Note that I kept age as a continuous variable here, rather than a factor.

library(dplyr)
library(ggplot2)
data30g %>% 
  count(yy, sex, age) %>% 
  ggplot(aes(age, n)) + 
    geom_col(aes(fill = sex)) + 
    facet_grid(yy ~ .) + 
    theme_bw() + 
    scale_fill_manual(values = c("#000000", "#cccccc"))


Not bad: I can see straight away, for example, an increase in both total and female counts at age 30 over time. The panels are perhaps a little small and crowded, though.

We can use a facet wrap instead of a grid to make the bars clearer, but at the expense of quick visual comparison across years.

data30g %>% 
  count(yy, sex, age) %>% 
  ggplot(aes(age, n)) + 
    geom_col(aes(fill = sex)) + 
    facet_wrap(~yy, ncol = 2) + 
    theme_bw() + 
    scale_fill_manual(values = c("#000000", "#cccccc"))


One more example, which does not address your question in terms of total counts or bar plots, but I thought it might be of interest. This code generates a "heatmap" style of plot, which is poor for quantitative comparison but can sometimes give a quick visual impression of interesting features. I think it shows, for example, that females aged 20 in 2014 have the highest total count.

data30g %>% 
  count(yy, sex, age) %>% 
  ggplot(aes(factor(age), yy)) + 
    geom_tile(aes(fill = n)) + 
    facet_grid(sex ~ .) + 
    scale_fill_gradient2() + 
    scale_y_reverse(breaks = 2006:2015) + 
    labs(x = "age", y = "Year")


EDIT:

Based on further discussions in the comments, here is one way to plot age as a factor, using bars for the sexes, overlaid with a line for the totals and split by year.

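overallpierc %>% 
  count(yy, sex, age) %>% 
  ggplot() + 
    # dodged bars: one bar per sex at each age
    geom_col(aes(factor(age), n, fill = sex), position = "dodge") +
    # line: sum over both sexes at each age; group = 1 joins points across the discrete x axis
    # (ggplot2 releases before 3.3.0 named this argument fun.y)
    stat_summary(aes(factor(age), n), fun = "sum", geom = "line", group = 1) + 
    facet_grid(yy ~ .)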

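On the "Discrete value supplied to continuous scale" error from the update: my best guess is that it comes from the geom_line layer being added first with a numeric x, so the x scale is set up as continuous before the factor ages from geom_col arrive. Below is a minimal sketch of one way around it, not a drop-in fix. The line values are copied from the `df` in the question; the bar counts in `bar_df` are made-up placeholder data, and the names `line_df`, `bar_df`, `age_levels` and the legend label "Subset" are my own assumptions. Both layers use age as a factor with the same levels, and `group = 1` tells geom_line to connect the points across the discrete axis.

```r
library(ggplot2)

# Line values copied from the `df` in the question's update
line_df <- data.frame(
  age = c(15:30, 40, 50, 60),
  val = c(0, 5, 13, 77, 70, 106, 62, 51, 46, 27, 46, 16, 22, 16, 14, 48, 21, 3, 4)
)

# Hypothetical bar counts (placeholder data, NOT the real survey counts)
set.seed(1)
bar_df <- expand.grid(age = line_df$age, sex = c("Female", "Male"))
bar_df$n <- rpois(nrow(bar_df), lambda = 40)

# Give both data frames the same factor levels so the layers share one discrete x scale
age_levels <- sort(unique(line_df$age))
bar_df$age <- factor(bar_df$age, levels = age_levels)
line_df$age <- factor(line_df$age, levels = age_levels)

p <- ggplot(bar_df, aes(age, n)) +
  geom_col(aes(fill = sex), position = "dodge") +
  # group = 1 joins all points into a single line on the discrete axis;
  # mapping colour to a constant label puts the line in the legend
  geom_line(data = line_df, aes(age, val, colour = "Subset", group = 1)) +
  scale_fill_manual(values = c("#000000", "#CCCCCC"), name = "Sex") +
  scale_colour_manual(values = c(Subset = "blue"), name = NULL) +
  labs(x = "Age", y = "Number of observations") +
  theme_classic()

p
```

To overlay several years, one option would be to stack the per-year summaries into a single long data frame with a year column, then map `colour` and `group` to that column instead of to constants, so that each year gets its own line and legend entry.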
