Expand dataframe with sequential dates based on a column of dates in R
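I would like to expand my data frame based on the "Date" column, so that new rows are added, in chronological order, for the dates missing between the existing ones. My date column is in chronological order, spans 5 years, and contains duplicated dates that I would like to leave as they are. I would like the corresponding "Group" and "Draw" values for the new rows to be NA.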
zz <- "Date Group Draw
1 2006-05-11 bb T
2 2006-05-11 bb F
3 2006-05-14 aa T
4 2006-05-16 aa T
5 2006-05-20 cc F
6 2006-05-20 bb F
7 2006-05-21 aa T"
Data <- read.table(text=zz, header = TRUE)
So I would like my new data frame to look like this:
xx <- "Date Group Draw
1 2006-05-11 bb T
2 2006-05-11 bb F
3 2006-05-12 NA NA
4 2006-05-13 NA NA
5 2006-05-14 aa T
6 2006-05-15 NA NA
7 2006-05-16 aa T
8 2006-05-17 NA NA
9 2006-05-18 NA NA
10 2006-05-19 NA NA
11 2006-05-20 cc F
12 2006-05-20 bb F
13 2006-05-21 aa T"
Output <- read.table(text=xx, header = TRUE)
Any help would be greatly appreciated. I am new to R and have been trying to do this manually.
I think this should work fine:
merge(
x = data.frame(
Date = seq.Date(min(df$Date), max(df$Date), by = "day")
),
y = df,
all.x = TRUE
)
# Date Group Draw
# 1 2006-05-11 bb TRUE
# 2 2006-05-11 bb FALSE
# 3 2006-05-12 <NA> NA
# 4 2006-05-13 <NA> NA
# 5 2006-05-14 aa TRUE
# 6 2006-05-15 <NA> NA
# 7 2006-05-16 aa TRUE
# 8 2006-05-17 <NA> NA
# 9 2006-05-18 <NA> NA
# 10 2006-05-19 <NA> NA
# 11 2006-05-20 cc FALSE
# 12 2006-05-20 bb FALSE
# 13 2006-05-21 aa TRUE
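All this does is create a sequence of dates spanning the range of the actual data, and then perform a left join from that sequence onto the data.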
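To make the two steps easier to follow, here is a minimal base R sketch that spells out the intermediate objects. The names `all_dates`, `missing_dates` and `expanded` are illustrative and do not come from the original answer; the data is the sample from the question with `Date` converted to class `Date`.

```r
# Sample data from the question, with Date converted to class Date
zz <- "Date Group Draw
1 2006-05-11 bb T
2 2006-05-11 bb F
3 2006-05-14 aa T
4 2006-05-16 aa T
5 2006-05-20 cc F
6 2006-05-20 bb F
7 2006-05-21 aa T"
df <- read.table(text = zz, header = TRUE)
df$Date <- as.Date(df$Date)

# Step 1: every calendar day between the first and last observed date
all_dates <- seq(min(df$Date), max(df$Date), by = "day")

# Step 2: the days that have no row in df; these become the NA rows
missing_dates <- all_dates[!all_dates %in% df$Date]

# Step 3: the left join keeps every day of the sequence (all.x = TRUE)
# and fills Group and Draw with NA where df has no matching Date
expanded <- merge(data.frame(Date = all_dates), df, by = "Date", all.x = TRUE)

nrow(expanded)              # original rows plus one row per missing day
sum(is.na(expanded$Group))  # number of rows added by the join
```

Duplicated dates in `df` need no special handling here: a left join returns one output row per matching row in `df`, so both rows for a repeated date are kept.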
And the same idea, using data.table:
dt[dt[,.(Date = seq.Date(min(Date), max(Date), by = "day"))], on = .(Date)]
# Date Group Draw
# 1: 2006-05-11 bb TRUE
# 2: 2006-05-11 bb FALSE
# 3: 2006-05-12 NA NA
# 4: 2006-05-13 NA NA
# 5: 2006-05-14 aa TRUE
# 6: 2006-05-15 NA NA
# 7: 2006-05-16 aa TRUE
# 8: 2006-05-17 NA NA
# 9: 2006-05-18 NA NA
# 10: 2006-05-19 NA NA
# 11: 2006-05-20 cc FALSE
# 12: 2006-05-20 bb FALSE
# 13: 2006-05-21 aa TRUE
Data:
zz <- "Date Group Draw
1 2006-05-11 bb T
2 2006-05-11 bb F
3 2006-05-14 aa T
4 2006-05-16 aa T
5 2006-05-20 cc F
6 2006-05-20 bb F
7 2006-05-21 aa T"
df <- read.table(
text = zz,
header = TRUE
)
df$Date <- as.Date(df$Date)
library(data.table)
dt <- data.table(read.table(text = zz, header = TRUE))[,Date := as.Date(Date)]
Using the data from @nrussell's post, another option is complete from tidyr:
library(tidyr)
complete(df, Date = full_seq(Date, 1))
## A tibble: 13 × 3
# Date Group Draw
# <date> <fctr> <lgl>
#1 2006-05-11 bb TRUE
#2 2006-05-11 bb FALSE
#3 2006-05-12 NA NA
#4 2006-05-13 NA NA
#5 2006-05-14 aa TRUE
#6 2006-05-15 NA NA
#7 2006-05-16 aa TRUE
#8 2006-05-17 NA NA
#9 2006-05-18 NA NA
#10 2006-05-19 NA NA
#11 2006-05-20 cc FALSE
#12 2006-05-20 bb FALSE
#13 2006-05-21 aa TRUE
If I understood your question correctly, here is my rough take:
# Build one date per day over the range of the data, as "YYYY-MM-DD" strings
date <- format(
  seq.Date(
    from = as.Date("2006-05-11"),
    to   = as.Date("2006-05-21"),
    by   = "day"
  ),
  "%Y-%m-%d"
)
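The above generates the vector of dates. You can then left-join your data.table onto date. Note that format() returns character strings, so convert them back with as.Date() (or drop the format() call) if the Date column in your data.table is of class Date.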
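The join that this answer leaves to the reader could look roughly like the sketch below. It assumes the data.table package is installed and reuses the sample data from the question; the names `date_chr`, `all_days` and `expanded` are illustrative and not part of the answer.

```r
library(data.table)

# Sample data from the question as a data.table with a Date-class column
zz <- "Date Group Draw
1 2006-05-11 bb T
2 2006-05-11 bb F
3 2006-05-14 aa T
4 2006-05-16 aa T
5 2006-05-20 cc F
6 2006-05-20 bb F
7 2006-05-21 aa T"
dt <- data.table(read.table(text = zz, header = TRUE))[, Date := as.Date(Date)]

# Character dates, as produced by wrapping the sequence in format()
date_chr <- format(
  seq.Date(as.Date("2006-05-11"), as.Date("2006-05-21"), by = "day"),
  "%Y-%m-%d"
)

# Convert back to Date so the join key has the same type as dt$Date
all_days <- data.table(Date = as.Date(date_chr))

# dt[all_days, on = ...] returns the rows of dt matching each row of
# all_days, with NA in Group and Draw for days that have no match
expanded <- dt[all_days, on = .(Date)]
```

Hard-coding the start and end dates works for this sample, but for five years of data it is probably safer to derive them with `min(dt$Date)` and `max(dt$Date)`.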