Using dplyr to average time series groups with individuals of different lengths
Consider dat, created here:
set.seed(123)
ID = factor(letters[seq(6)])
time = c(100, 102, 120, 105, 109, 130)
dat <- data.frame(ID = rep(ID,time), Time = sequence(time))
dat$group <- rep(c("GroupA","GroupB"), c(322,344))
dat$values <- sample(100, nrow(dat), TRUE)
We have time series data for 6 individuals (6 IDs), which belong to 2 groups (GroupA and GroupB). We want to make a line plot that shows the "average" time series of each group (so there will be two lines). Since the individuals all have series of different lengths, we need to do dat %>% group_by(group) and trim off the values that come after the end of the shortest ID within each group. In other words, ID == a is the shortest in GroupA, so the "average" line for GroupA will only be 100 values long on the x-axis; likewise, ID == d is the shortest in GroupB, so the "average" time series of GroupB will be 105 values long on the x-axis (Time). How can we do this (preferably through a dplyr pipe) and send the data to ggplot()?
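The cutoffs described above (100 for GroupA, 105 for GroupB) can be checked directly from the data. The following is a minimal sketch, not part of the original question; the names len, cutoff and cutoffs are made up for illustration, and .groups = "drop" assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Recreate the example data from the question
set.seed(123)
ID <- factor(letters[seq(6)])
time <- c(100, 102, 120, 105, 109, 130)
dat <- data.frame(ID = rep(ID, time), Time = sequence(time))
dat$group <- rep(c("GroupA", "GroupB"), c(322, 344))
dat$values <- sample(100, nrow(dat), TRUE)

# Series length of each individual, then the shortest length per group
# (len, cutoff and cutoffs are illustrative names)
cutoffs <- dat %>%
  count(group, ID, name = "len") %>%
  group_by(group) %>%
  summarise(cutoff = min(len), .groups = "drop")

cutoffs
```

The cutoff column should hold 100 for GroupA and 105 for GroupB, matching the lengths of a and d.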
You could try:
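library(ggplot2)
library(dplyr)

dat %>%
  group_by(ID) %>%
  mutate(maxtime = max(Time)) %>%      # length of each individual's series
  group_by(group) %>%
  mutate(maxtime = min(maxtime)) %>%   # shortest series within each group
  filter(Time <= maxtime) %>%          # drop time points beyond that cutoff
  group_by(group, Time) %>%
  summarize(values = mean(values), .groups = "drop") %>%
  ggplot(aes(Time, values, colour = group)) +
  geom_line()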
We could do
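library(dplyr)

dat %>%
  add_count(group, ID) %>%       # n = number of rows (series length) per ID
  group_by(group) %>%
  mutate(n = min(n)) %>%         # shortest series length within each group
  group_by(group, ID) %>%
  # mean of each ID's first n values (one row per ID, not per time point)
  summarise(values = mean(values[seq_len(first(n))]))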
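Note that the pipeline above returns one mean per ID rather than one per time point, so it cannot be plotted as a line directly. Below is a minimal sketch of how the same add_count() idea might be adapted to produce the per-time-point group average and pass it to ggplot(). This is an adaptation under assumptions, not the answerer's code: the names avg and p are illustrative, and .groups = "drop" assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(ggplot2)

# Recreate the example data from the question
set.seed(123)
ID <- factor(letters[seq(6)])
time <- c(100, 102, 120, 105, 109, 130)
dat <- data.frame(ID = rep(ID, time), Time = sequence(time))
dat$group <- rep(c("GroupA", "GroupB"), c(322, 344))
dat$values <- sample(100, nrow(dat), TRUE)

avg <- dat %>%
  add_count(group, ID) %>%        # n = series length of each ID
  group_by(group) %>%
  filter(Time <= min(n)) %>%      # keep time points every ID in the group has
  group_by(group, Time) %>%
  summarise(values = mean(values), .groups = "drop")

# One line per group, truncated at the group's shortest series
p <- ggplot(avg, aes(Time, values, colour = group)) +
  geom_line()
print(p)
```

Filtering before summarising means every plotted point is an average over all three individuals in the group, so the lines do not change composition partway along the x-axis.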