Group and count observations per year from a dataset containing interval data

I have data on the activity of a number of different writers. The data include the start.date and end.date of each of their writing periods.

library("tidyverse")
writing_period_data <- tribble(
  ~start.date, ~end.date, ~writer, ~topic,
  12, 18, "a", sample(letters[10:20],1),
  14, 20, "b", sample(letters[10:20],1),
  17, 22, "c", sample(letters[10:20],1),
  15, 30, "a", sample(letters[10:20],1)
)

Ultimately I would like to create a joyplot of this data, which requires me to generate the following data structure:

desired_output <- tribble(
  ~year, ~count, ~writer,
  12, 1, "a",
  13, 1, "a",
  14, 1, "a",
  14, 1, "b",
  15, 2, "a",
  15, 1, "b",
  16, 2, "a",
  16, 1, "b",
  17, 2, "a",
  17, 1, "b",
  17, 1, "c",
  18, 2, "a",
  18, 1, "b",
  18, 1, "c",
  19, 1, "a",
  19, 1, "b",
  19, 1, "c",
  20, 1, "a",
  20, 1, "b",
  20, 1, "c",
  21, 1, "a",
  21, 1, "c",
  22, 1, "a",
  22, 1, "c",
  23, 1, "a",
  24, 1, "a"
)

Plotted as a chart, this shows the distribution of writers across the time period of interest:

desired_output %>%
  ggplot(aes(x = year, y = count, fill = writer)) + geom_col()


How can I go about generating desired_output from writing_period_data?

A solution using the tidyverse. dt is the final output.

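library(tidyverse)

dt <- writing_period_data %>%
  # Build the full run of years covered by each writing period
  mutate(year = map2(start.date, end.date, `:`)) %>%
  # Expand the list-column so that each year gets its own row
  # (naming the column avoids the warning newer tidyr gives for a bare unnest())
  unnest(year) %>%
  # Count how many periods cover each year/writer combination
  count(year, writer) %>%
  select(year, count = n, writer)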
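
To make the two steps easier to follow, here is a minimal sketch of the same approach with the intermediate result kept in its own object. The name expanded is made up for illustration, and the random topic column from the question is left out on the assumption that it plays no part in the counting.

```r
library(tidyverse)

# Same intervals as in the question, without the random `topic` column
writing_period_data <- tribble(
  ~start.date, ~end.date, ~writer,
  12, 18, "a",
  14, 20, "b",
  17, 22, "c",
  15, 30, "a"
)

# Step 1: one row per (writing period, year).
# `:` turns each start/end pair into the run of years it covers.
expanded <- writing_period_data %>%
  mutate(year = map2(start.date, end.date, `:`)) %>%
  unnest(year)

# Step 2: count how many periods cover each year/writer combination.
dt <- expanded %>%
  count(year, writer, name = "count") %>%
  select(year, count, writer)

# Year 15 falls inside both of writer a's periods and inside b's single period
filter(dt, year == 15)
```

For year 15 this gives a count of 2 for writer a and 1 for writer b, matching desired_output. Note that dt runs through year 30, because writer a's second period ends at 30, whereas desired_output as posted in the question stops at year 24.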
