
Create a data frame for a multiple-line plot with ggplot in R

This question is about arranging data for a line plot. I have been doing this manually and I want to work out a way to do it in R.

I have reviewed this similar post: Arrange dataframe format for ggplot - R

I have a dataset that looks like this:

[Image: dataset]

I want to convert it to a data frame that is divided into the groups (N, A, G) and into age brackets, with the proportion of each group per age_group.

An example of what I am trying to achieve:

[Image: what I want to create for ggplot]

Appreciate your help.

Data:

df1 <- structure(list(
    ID = 1:10,
    Age = c(9L, 16L, 12L, 13L, 29L, 24L, 23L, 24L, 16L, 40L),
    Sex = structure(c(1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L),
                    .Label = c("F", "M"), class = "factor"),
    Age_group = c(1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 4L),
    N = c(1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L),
    A = c(0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L),
    G = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L)),
  class = "data.frame", row.names = c(NA, -10L))

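We can reshape the data to 'long' format with pivot_longer, create an age-bracket grouping variable by applying cut to 'Age', sum 'n' for each bracket and group, and then compute each group's 'proportion' within its bracket. Here df1 is the data frame from the question.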

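library(dplyr)
library(tidyr)

df1 %>%
  # reshape the indicator columns N, A, G into long format
  pivot_longer(cols = N:G, names_to = 'group', values_to = 'n') %>%
  # bin Age into brackets, then count per bracket and group
  group_by(Age_group_new = cut(Age, breaks = c(-Inf, 0, seq(10, 70, by = 10), 100, Inf)), group) %>%
  summarise(n = sum(n), .groups = 'drop') %>%
  # proportion of each group within its age bracket (NaN from 0/0 becomes 0)
  group_by(Age_group_new) %>%
  mutate(proportion = n / sum(n),
         proportion = replace(proportion, is.nan(proportion), 0))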
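To get from the summarised data to the multiple-line plot the question asks for, something like the following sketch may work. It rebuilds the sample data (leaving out the Sex and Age_group columns, which the plot does not use), stores the summary in a data frame called plot_df (a name chosen here only for illustration), and draws one line per group. The axis labels and the use of geom_point are assumptions about the desired look, not something specified in the question.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Sample data from the question (Sex and Age_group omitted: not needed for the plot)
df1 <- data.frame(
  ID  = 1:10,
  Age = c(9L, 16L, 12L, 13L, 29L, 24L, 23L, 24L, 16L, 40L),
  N   = c(1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L),
  A   = c(0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L),
  G   = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L)
)

# Same reshaping and summarising steps as above; plot_df is a hypothetical name
plot_df <- df1 %>%
  pivot_longer(cols = N:G, names_to = "group", values_to = "n") %>%
  group_by(Age_group_new = cut(Age, breaks = c(-Inf, 0, seq(10, 70, by = 10), 100, Inf)),
           group) %>%
  summarise(n = sum(n), .groups = "drop") %>%
  group_by(Age_group_new) %>%
  mutate(proportion = n / sum(n),
         proportion = replace(proportion, is.nan(proportion), 0)) %>%
  ungroup()

# One line per group; group = group is needed because the x axis is a factor
p <- ggplot(plot_df,
            aes(x = Age_group_new, y = proportion, colour = group, group = group)) +
  geom_line() +
  geom_point() +
  labs(x = "Age bracket", y = "Proportion", colour = "Group")

print(p)
```

Because Age_group_new is a factor, ggplot would otherwise treat each bracket as its own group and draw no connecting lines, which is why the group aesthetic is set explicitly.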
