
How to plot multiple area plots with ggplot2?

I am trying to reproduce a complex data visualisation like the one in the picture below, but with R and ggplot2.

[image]

As observed:

  1. there are 6 different groups ("Africa", "Asia", "Europe", etc.), each labelled above its set of plots;
  2. each continent has one set of 3 area plots;
  3. the x axis appears on only one set, the last row (Oceania);
  4. the legend appears only once, above the plot;
  5. there are two legends above the plot: risk groups and conditions;
  6. as you can see, Africa has population in millions (one chart), risk groups and conditions.

I am trying to achieve the same result with 2 of my datasets. For India, for example, I want one row containing a chart for symptoms and a second chart for comorbidities. The same for the UK and Pakistan. Here are some fake datasets I created:

  1. https://github.com/gabrielburcea/stackoverflow_fake_data/blob/master/fake_symptoms.csv
  2. https://github.com/gabrielburcea/stackoverflow_fake_data/blob/master/fake_comorbidities%202.csv

I have tried creating a small dataset for each country, then building 2 plots, one for symptoms and the other for comorbidities, and then combining them. But this is heavy work, and this approach brings up many other issues. Here is one example:

library(dplyr)    # attaches the %>% pipe
library(viridis)

## keep only the rows for India
india_count_symptoms <- count_symptoms %>%
  dplyr::filter(Country == "India")

india_count_symptoms$symptoms <- as.factor(india_count_symptoms$symptoms)
india_count_symptoms$Count <- as.numeric(india_count_symptoms$Count)

india_sympt_plot <- ggplot2::ggplot(india_count_symptoms, ggplot2::aes(x = age_band, y = Count, group = symptoms, fill = symptoms)) +
  ggplot2::geom_area(position = "fill", color = "white") + 
  ggplot2::scale_x_discrete(limits = c("0-19", "20-39", "40-59","60+"), expand = c(0, 0)) +
  ggplot2::scale_y_continuous(expand = ggplot2::expansion(mult = c(0, 0.1))) + 
  viridis::scale_fill_viridis(discrete = TRUE)

india_sympt_plot  

This is what I got:

[image]

And as you can see:

a. the age bands aren't nicely aligned

b. I end up with a legend for each plot for each country if I take this approach

c. the y axis does not give me the counts; it only goes up to 1, which is not intuitive

d. I would have to do the same for comorbidities and then run into the same problems as in the 3 points above.

Thus, I want to follow an easier approach to get a plot similar to the first picture, meeting points 1 to 5 above, but for my 3 countries and for symptoms and comorbidities. However, my real dataset is bigger, with 5 countries but the same plotting: symptoms and comorbidities.

Is there a better way of achieving this with ggplot2, in RStudio?

This is a good start. I'm not clear on some of your goals, but this answer should get you over the immediate obstacles.

## load the packages used below
library(ggplot2)
library(dplyr)   # provides the %>% pipe

## read in your data
count_symptoms = readr::read_csv("https://github.com/gabrielburcea/stackoverflow_fake_data/raw/master/fake_symptoms.csv")

## as mentioned in comments, removing `position = 'fill'` lets your chart show counts.
## (I'm skipping the unnecessary data conversions)
## And I'm removing the `ggplot2::` to make the code more readable...
## No other changes are made

india_count_symptoms <- count_symptoms %>%
  dplyr::filter(Country == "India")

india_sympt_plot <- ggplot(india_count_symptoms, aes(x = age_band, y = Count, group = symptoms, fill = symptoms)) +
  geom_area(color = "white") + 
  scale_x_discrete(limits = c("0-19", "20-39", "40-59","60+"), expand = c(0, 0)) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) + 
  viridis::scale_fill_viridis(discrete = TRUE)

[image]
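To see why removing position = "fill" brings the counts back, here is a minimal sketch on made-up toy data (the symptom names and counts are invented for illustration and are not taken from the linked file). With position = "fill" the stacked areas are rescaled so that every x position sums to 1, while the default stacking keeps the raw counts.

```r
library(ggplot2)

# Toy data, invented for illustration only
toy <- data.frame(
  age_band = rep(c("0-19", "20-39", "40-59", "60+"), times = 2),
  symptoms = rep(c("cough", "fever"), each = 4),
  Count    = c(10, 30, 20, 5, 5, 10, 20, 15)
)

base <- ggplot(toy, aes(x = age_band, y = Count, group = symptoms, fill = symptoms))

p_fill  <- base + geom_area(position = "fill")  # proportions per age band
p_stack <- base + geom_area()                   # default stacking, raw counts

# Highest point of the stacked areas as computed by ggplot2
fill_top  <- max(layer_data(p_fill)$ymax)
stack_top <- max(layer_data(p_stack)$ymax)
```

Here fill_top should come out as 1 and stack_top as the largest per-band total in the toy data, which matches the y axis behaviour described in point c of the question.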

Now, instead of making individual plots for each country, let's use facets:

## same plot code as above, but we give it the whole data set
## and add the `facet_grid` on
ggplot(count_symptoms, aes(x = age_band, y = Count, group = symptoms, fill = symptoms)) +
  geom_area(color = "white") + 
  scale_x_discrete(limits = c("0-19", "20-39", "40-59","60+"), expand = c(0, 0)) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +  
  viridis::scale_fill_viridis(discrete = TRUE) + 
  facet_grid(Country ~ .)

[image]

Notice we have a single legend. You can re-position it easily through the legend.position theme setting. Probably the next change I'd make is adding the argument labels = scales::comma_format() to your scale_y_continuous. I have no idea what your issue is with the x-axis labels.
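As a rough sketch of those two tweaks, the snippet below moves the legend above the panels and formats the y axis with thousands separators. The data frame is a toy stand-in (its values and country names are assumptions, not the real file), and newer versions of scales offer label_comma() as the successor to comma_format().

```r
library(ggplot2)

# Toy stand-in for count_symptoms; all values are invented
toy <- expand.grid(
  age_band = c("0-19", "20-39", "40-59", "60+"),
  symptoms = c("cough", "fever"),
  Country  = c("India", "Pakistan", "United Kingdom"),
  stringsAsFactors = FALSE
)
toy$Count <- seq(1000, by = 750, length.out = nrow(toy))

p <- ggplot(toy, aes(x = age_band, y = Count, group = symptoms, fill = symptoms)) +
  geom_area(color = "white") +
  scale_x_discrete(expand = c(0, 0)) +
  scale_y_continuous(
    labels = scales::comma_format(),        # e.g. 12000 is shown as 12,000
    expand = expansion(mult = c(0, 0.1))
  ) +
  facet_grid(Country ~ .) +
  theme(legend.position = "top")            # single legend, above the panels

# The labeller can also be tried on its own
example_label <- scales::comma_format()(12000)
```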

For the complete figure, I'd suggest doing one facet_grid plot for each column, and then using the patchwork package to combine them into one image. See how far you can get based on this, and if you continue to have issues, ask a new question focused on the next step.
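One possible way to sketch that suggestion is shown below. It builds one faceted column for symptoms and one for comorbidities and places them side by side with patchwork. Everything about the data is an assumption: both data frames are toy data, the comorbidities column name is a guess at how the second file is laid out, and area_column() is a hypothetical helper written for this example.

```r
library(ggplot2)
library(patchwork)

# Hypothetical helper: builds toy counts for one category column
make_toy <- function(var_name, var_levels) {
  d <- expand.grid(
    age_band = c("0-19", "20-39", "40-59", "60+"),
    level    = var_levels,
    Country  = c("India", "Pakistan", "United Kingdom"),
    stringsAsFactors = FALSE
  )
  d$Count <- seq(10, by = 5, length.out = nrow(d))
  names(d)[names(d) == "level"] <- var_name
  d
}

symptoms_toy <- make_toy("symptoms", c("cough", "fever"))
comorb_toy   <- make_toy("comorbidities", c("asthma", "diabetes"))  # assumed column name

# Hypothetical helper: one column of area plots, one row per country
area_column <- function(data, fill_var, title, palette) {
  ggplot(data, aes(x = age_band, y = Count,
                   group = .data[[fill_var]], fill = .data[[fill_var]])) +
    geom_area(color = "white") +
    scale_x_discrete(expand = c(0, 0)) +
    scale_fill_viridis_d(option = palette) +
    facet_grid(Country ~ .) +
    labs(title = title, fill = title)
}

p_sympt  <- area_column(symptoms_toy, "symptoms", "Symptoms", "D")
p_comorb <- area_column(comorb_toy, "comorbidities", "Comorbidities", "C")

# Side by side, with both legends collected and moved above the plots
combined <- p_sympt + p_comorb +
  plot_layout(guides = "collect") &
  theme(legend.position = "top")
```

Printing combined draws the two columns in one image. The real files would replace the toy data frames once their column names are confirmed.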


 