Filling in missing dates with missing values by a specific column in R
I have the following table:
Name Date Quiz Homework
John 11-01-02 40 10
John 11-01-03 47 20
John 11-01-04 41 10
John 11-01-08 35 10
John 11-01-10 43 15
John 11-01-13 40 10
Adam 11-01-05 41 10
Adam 11-01-08 41 15
Adam 11-01-14 49 10
Adam 11-01-19 40 20
Adam 11-01-21 40 10
You can see that there are some time gaps. I would like to fill in those gaps for each name and set the quiz and homework scores for the missing dates to zero. The final outcome I want is the following:
Name Date Quiz Homework
John 11-01-02 40 10
John 11-01-03 47 20
John 11-01-04 41 10
John 11-01-05 0 0
John 11-01-06 0 0
John 11-01-07 0 0
John 11-01-08 35 10
John 11-01-09 0 0
John 11-01-10 43 15
John 11-01-11 0 0
John 11-01-12 0 0
John 11-01-13 40 10
Adam 11-01-05 41 10
Adam 11-01-06 0 0
Adam 11-01-07 0 0
Adam 11-01-08 41 15
Adam 11-01-09 0 0
Adam 11-01-10 0 0
Adam 11-01-11 0 0
Adam 11-01-12 0 0
Adam 11-01-13 0 0
Adam 11-01-14 49 10
Adam 11-01-15 0 0
Adam 11-01-16 0 0
Adam 11-01-17 0 0
Adam 11-01-18 0 0
Adam 11-01-19 40 20
Adam 11-01-20 0 0
Adam 11-01-21 40 10
Is there a fast way of doing it? What I did was the following:
1) Find the minimum and maximum dates for each name.
2) For each name, create a sequence of dates from the minimum to the maximum date found in step 1).
3) Join the table created in step 2) with the original table.
4) Replace the NA values in Quiz and Homework with zero.
But that was rather slow. I was wondering if there's a faster way of doing it.
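For reference, the four steps above can be sketched in base R roughly as follows. The asker's actual code was not posted, so this is only an assumed reconstruction; the names `df`, `grid`, and `filled` are made up for illustration.

```r
# Sample data from the question (two-digit years are parsed with %y)
df <- data.frame(
  Name = c(rep("John", 6), rep("Adam", 5)),
  Date = as.Date(c("11-01-02", "11-01-03", "11-01-04", "11-01-08",
                   "11-01-10", "11-01-13", "11-01-05", "11-01-08",
                   "11-01-14", "11-01-19", "11-01-21"),
                 format = "%y-%m-%d"),
  Quiz = c(40, 47, 41, 35, 43, 40, 41, 41, 49, 40, 40),
  Homework = c(10, 20, 10, 10, 15, 10, 10, 15, 10, 20, 10)
)

# Steps 1 and 2: one full daily sequence per name, from its min to its max date
grid <- do.call(rbind, lapply(split(df, df$Name), function(d) {
  data.frame(Name = d$Name[1],
             Date = seq(min(d$Date), max(d$Date), by = "1 day"))
}))

# Step 3: left join the original rows onto the full grid
filled <- merge(grid, df, by = c("Name", "Date"), all.x = TRUE)

# Step 4: replace the NA scores introduced by the join with zero
filled$Quiz[is.na(filled$Quiz)] <- 0
filled$Homework[is.na(filled$Homework)] <- 0
```

The per-group `lapply` followed by `rbind` is the part that tends to get slow when there are many names.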
A solution using the data.table package, which should be fast:
library(data.table)
DT <- fread("Name Date Quiz Homework
John 11-01-02 40 10
John 11-01-03 47 20
John 11-01-04 41 10
John 11-01-08 35 10
John 11-01-10 43 15
John 11-01-13 40 10
Adam 11-01-05 41 10
Adam 11-01-08 41 15
Adam 11-01-14 49 10
Adam 11-01-19 40 20
Adam 11-01-21 40 10")
DT[, Date := as.Date(Date, "%y-%m-%d")]
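# Build a full daily sequence per Name, then join the original rows onto it
res <- DT[DT[, .(Date = seq(min(Date), max(Date), by = "1 day")), by = .(Name)],
          on = .(Name, Date)]
# Rows that exist only in the sequence have NA scores; set them to 0
res[is.na(Quiz), Quiz := 0L]
res[is.na(Homework), Homework := 0L]
res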
Explanation:
Create the sequence of dates for each name with allDates <- DT[, .(Date=seq(min(Date), max(Date), by="1 day")), by=.(Name)]
Join it with the original data set using DT[allDates, on=.(Name, Date)]
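The same idea can be written out step by step. This is a minimal sketch that assumes a data.table version providing `setnafill` (1.12.4 or later); `res` is a name chosen here for illustration, and the data is rebuilt inline so the block runs on its own.

```r
library(data.table)

DT <- data.table(
  Name = c(rep("John", 6), rep("Adam", 5)),
  Date = as.Date(c("11-01-02", "11-01-03", "11-01-04", "11-01-08",
                   "11-01-10", "11-01-13", "11-01-05", "11-01-08",
                   "11-01-14", "11-01-19", "11-01-21"),
                 format = "%y-%m-%d"),
  Quiz = c(40L, 47L, 41L, 35L, 43L, 40L, 41L, 41L, 49L, 40L, 40L),
  Homework = c(10L, 20L, 10L, 10L, 15L, 10L, 10L, 15L, 10L, 20L, 10L)
)

# Step 1: one row per Name and calendar day between that Name's first and last date
allDates <- DT[, .(Date = seq(min(Date), max(Date), by = "1 day")), by = .(Name)]

# Step 2: join; every row of allDates is kept, unmatched dates get NA scores
res <- DT[allDates, on = .(Name, Date)]

# Step 3: fill the NA scores with 0 by reference (integer fill for integer columns)
setnafill(res, type = "const", fill = 0L, cols = c("Quiz", "Homework"))
```

`setnafill` updates the columns in place, which avoids the extra copies made by `ifelse`.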
A tidyverse solution:
library(dplyr)
library(tidyr)
library(lubridate) # for easier year conversion
df1 <- structure(list(Name = c("John", "John", "John", "John", "John",
"John", "Adam", "Adam", "Adam", "Adam", "Adam"),
Date = c("11-01-02", "11-01-03", "11-01-04",
"11-01-08", "11-01-10", "11-01-13",
"11-01-05", "11-01-08", "11-01-14",
"11-01-19", "11-01-21"),
Quiz = c(40L, 47L, 41L, 35L, 43L, 40L, 41L, 41L, 49L, 40L, 40L),
Homework = c(10L, 20L, 10L, 10L, 15L, 10L,
10L, 15L, 10L, 20L, 10L)),
.Names = c("Name", "Date", "Quiz", "Homework"),
class = "data.frame",
row.names = c(NA, -11L))
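df1 %>%
  # %y parses the two-digit year ("11" -> 2011); %C is the century, not the year
  mutate(Date = as.Date(Date, format = "%y-%m-%d")) %>%
  group_by(Name) %>%
  # add the missing days within each name's own date range, with scores set to 0
  complete(Date = seq(min(Date), max(Date), by = "1 day"),
           fill = list(Quiz = 0, Homework = 0))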
Name Date Quiz Homework
1 Adam 2011-01-05 41 10
2 Adam 2011-01-06 0 0
3 Adam 2011-01-07 0 0
4 Adam 2011-01-08 41 15
5 Adam 2011-01-09 0 0
6 Adam 2011-01-10 0 0
7 Adam 2011-01-11 0 0
8 Adam 2011-01-12 0 0
9 Adam 2011-01-13 0 0
10 Adam 2011-01-14 49 10
11 Adam 2011-01-15 0 0
12 Adam 2011-01-16 0 0
13 Adam 2011-01-17 0 0
14 Adam 2011-01-18 0 0
15 Adam 2011-01-19 40 20
16 Adam 2011-01-20 0 0
17 Adam 2011-01-21 40 10
18 John 2011-01-02 40 10
19 John 2011-01-03 47 20
20 John 2011-01-04 41 10
21 John 2011-01-05 0 0
22 John 2011-01-06 0 0
23 John 2011-01-07 0 0
24 John 2011-01-08 35 10
25 John 2011-01-09 0 0
26 John 2011-01-10 43 15
27 John 2011-01-11 0 0
28 John 2011-01-12 0 0
29 John 2011-01-13 40 10