Create a dataframe for a multiple line plot in ggplot (R)
This question is about arranging data for a ggplot line plot. I have been doing this manually in Excel and I want to work out a way to do it in R.
I have reviewed this similar post: Arrange dataframe format for ggplot - R
I have a dataset that looks like this:
I want to convert it to a dataframe that is split by group (N, A, G) and by age bracket, with the proportion of each group per age bracket.
An example of what I am trying to achieve:
Appreciate your help.
Data:
df1 <- structure(list(ID = 1:10,
                      Age = c(9L, 16L, 12L, 13L, 29L, 24L, 23L, 24L, 16L, 40L),
                      Sex = structure(c(1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L),
                                      .Label = c("F", "M"), class = "factor"),
                      Age_group = c(1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 4L),
                      N = c(1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L),
                      A = c(0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L),
                      G = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L)),
                 class = "data.frame", row.names = c(NA, -10L))
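In this sample every row has exactly one 1 among N, A and G, so those columns behave like one-hot group indicators and each person is counted once when they are summed per age bracket. A quick base R check of that assumption is sketched below; it is not part of the original question or answer, and the `group` column name is just an illustrative choice.

```r
# The question's data (Sex and Age_group omitted; they are not needed here)
df1 <- data.frame(
  ID = 1:10,
  N  = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0),
  A  = c(0, 0, 0, 0, 1, 1, 1, 0, 0, 0),
  G  = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1)
)

ind <- df1[, c("N", "A", "G")]

# Assumption being checked: exactly one indicator is set per row
stopifnot(all(rowSums(ind) == 1))

# Collapse the indicators into a single (hypothetical) 'group' label per row
df1$group <- colnames(ind)[max.col(ind, ties.method = "first")]
table(df1$group)
```

If the check fails on the full dataset (a row with no group, or more than one), the proportions computed below would count such rows differently, so it is worth running first.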
We can pivot to 'long' format with pivot_longer, then create a grouping variable by applying cut to 'Age', sum 'n' per age bracket and group, and finally compute the 'proportion' of each group within its age bracket:
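library(dplyr)
library(tidyr)

df1 %>%
  # one row per person and group; 'n' holds the 0/1 indicator value
  pivot_longer(cols = N:G, names_to = 'group', values_to = 'n') %>%
  # bin Age into brackets and count people per bracket and group
  group_by(Age_group_new = cut(Age, breaks = c(-Inf, 0, seq(10, 70, by = 10), 100, Inf)), group) %>%
  summarise(n = sum(n), .groups = "drop") %>%
  # share of each group within its age bracket (0 instead of NaN for an empty bracket)
  group_by(Age_group_new) %>%
  mutate(proportion = n/sum(n),
         proportion = replace(proportion, is.nan(proportion), 0))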
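To get from this long table to the multiple line plot the question asks for, one option is to map the age bracket to x, the proportion to y and the group to colour. The sketch below repeats the same pipeline on the sample data so it runs on its own; the names `prop_df` and `p` and the axis labels are illustrative assumptions, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Sample data from the question (only the columns used below)
df1 <- data.frame(
  ID  = 1:10,
  Age = c(9, 16, 12, 13, 29, 24, 23, 24, 16, 40),
  N   = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0),
  A   = c(0, 0, 0, 0, 1, 1, 1, 0, 0, 0),
  G   = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1)
)

# Long format: one row per age bracket and group, with count and proportion
prop_df <- df1 %>%
  pivot_longer(cols = N:G, names_to = "group", values_to = "n") %>%
  group_by(Age_group_new = cut(Age, breaks = c(-Inf, 0, seq(10, 70, by = 10), 100, Inf)),
           group) %>%
  summarise(n = sum(n), .groups = "drop") %>%
  group_by(Age_group_new) %>%
  mutate(proportion = n / sum(n),
         proportion = replace(proportion, is.nan(proportion), 0)) %>%
  ungroup()

# One line per group; 'group = group' is needed because x is a factor,
# otherwise ggplot would not know which points to connect
p <- ggplot(prop_df, aes(x = Age_group_new, y = proportion,
                         colour = group, group = group)) +
  geom_line() +
  geom_point() +
  labs(x = "Age bracket", y = "Proportion", colour = "Group")

p
```

Only age brackets that actually occur in the data appear on the x axis, because `group_by()` drops unused factor levels by default.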