Group and count observations per year from a dataset containing interval data
I have data on the activity of a number of different writers. The data includes the start.date and end.date of their writing careers:
library("tidyverse")
writing_period_data <- tribble(
~start.date, ~end.date, ~writer, ~topic,
12, 18, "a", sample(letters[10:20],1),
14, 20, "b", sample(letters[10:20],1),
17, 22, "c", sample(letters[10:20],1),
15, 30, "a", sample(letters[10:20],1)
)
I would ultimately like to create a joyplot of this data, which requires me to generate the following data structure:
desired_output <- tribble(
~year, ~count, ~writer,
12, 1, "a",
13, 1, "a",
14, 1, "a",
14, 1, "b",
15, 2, "a",
15, 1, "b",
16, 2, "a",
16, 1, "b",
17, 2, "a",
17, 1, "b",
17, 1, "c",
18, 2, "a",
18, 1, "b",
18, 1, "c",
19, 1, "a",
19, 1, "b",
19, 1, "c",
20, 1, "a",
20, 1, "b",
20, 1, "c",
21, 1, "a",
21, 1, "c",
22, 1, "a",
22, 1, "c",
23, 1, "a",
24, 1, "a"
)
Plotted as a stacked column chart, this shows the distribution of writers across the time period of interest:
desired_output %>%
ggplot(aes(x = year, y = count, fill = writer)) + geom_col()
How can I generate desired_output from writing_period_data?
A solution using tidyverse. dt is the final output.
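library(tidyverse)
dt <- writing_period_data %>%
  # Build the sequence of years covered by each interval (a list column)
  mutate(year = map2(start.date, end.date, `:`)) %>%
  # Expand to one row per year; tidyr >= 1.0.0 warns if the column is not named
  unnest(year) %>%
  # Count the intervals that cover each year, per writer
  count(year, writer) %>%
  select(year, count = n, writer)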
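To make the intermediate steps easier to inspect, the same pipeline can be split into named stages. This is only a sketch under a few assumptions: the random `topic` values from the question are replaced with fixed letters so the result is reproducible, `seq` stands in for `` `:` `` (both yield the same integer sequence here), and the names `with_years` and `long` are illustrative and do not appear in the original answer.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Same intervals as in the question; `topic` is fixed here (an assumption)
# instead of being drawn with sample(), so the output is reproducible.
writing_period_data <- tribble(
  ~start.date, ~end.date, ~writer, ~topic,
  12, 18, "a", "j",
  14, 20, "b", "k",
  17, 22, "c", "l",
  15, 30, "a", "m"
)

# Step 1: add a list column holding every year from start.date to end.date.
with_years <- writing_period_data %>%
  mutate(year = map2(start.date, end.date, seq))

# Step 2: expand the list column to one row per (interval, year) pair.
long <- with_years %>%
  unnest(year)

# Step 3: count how many intervals cover each year for each writer.
dt <- long %>%
  count(year, writer, name = "count") %>%
  select(year, count, writer)

# Inspect a year where two of writer "a"'s intervals overlap.
filter(dt, year == 15)
```

With these intervals, writer "a" has rows through year 30, because the second interval ends there. The desired_output in the question stops at year 24, so it seems to be abbreviated rather than a complete listing.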