R - insert rows for missing days in data frame
I have a data frame like this:
> head(train)
S D Date
1 1 1 2010-02-05
2 1 1 2010-02-12
3 1 1 2010-02-19
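The Date column has only one date per week. For each existing date, I want to insert 6 rows covering the missing days that follow it. So the result would look like this: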
> head(train)
S D Date
1 1 1 2010-02-05
1 1 1 2010-02-06 <- inserted
1 1 1 2010-02-07 <- inserted
1 1 1 2010-02-08 <- inserted
1 1 1 2010-02-09 <- inserted
1 1 1 2010-02-10 <- inserted
1 1 1 2010-02-11 <- inserted
2 1 1 2010-02-12
etc
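This may be a bit overkill, but the idea is to join your data onto a data frame containing every date in the range, and then fill in the gaps: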
library(dplyr)
library(zoo)
train <- data.frame(D = 1:3, S = 4:6, Date = as.Date("2010-02-05") + 7*(1:3))
full.dates <- as.Date(min(train$Date):max(train$Date), origin = "1970-01-01")
db <- data.frame(Date = full.dates)
# Join on Date explicitly; days missing from 'train' get NA in D and S
fixed <- left_join(db, train, by = "Date")
# Fill from top using zoo::na.locf
fixed[ ,c("D", "S")] <- na.locf(fixed[ ,c("D", "S")])
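The same join-then-fill idea can also be sketched in base R without any packages. This is only an illustration, assuming the dates in `train` are sorted in increasing order; the sample data and the names `full_dates`, `idx` and `filled` are made up for the example. Here `findInterval()` is swapped in for the join plus `na.locf()` step: for each day it returns the index of the most recent observed date.

```r
# Sample data mirroring the question (assumed, not the asker's real data)
train <- data.frame(S = 1:3, D = 1,
                    Date = as.Date("2010-02-05") + 7 * (0:2))

# Every calendar day from the first to the last observed date
full_dates <- seq(min(train$Date), max(train$Date), by = "day")

# For each day, the index of the latest row of 'train' dated on or before it
idx <- findInterval(as.numeric(full_dates), as.numeric(train$Date))

# Repeat the matching rows, then overwrite Date with the full sequence
filled <- train[idx, ]
filled$Date <- full_dates
rownames(filled) <- NULL

head(filled, 8)
```

Because the lookup carries the last observation forward directly, no NA values are created and no separate fill step is needed.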
Another way is to use na.locf from the zoo package: create a zoo time series and use the xout argument of na.locf. xout specifies the range of dates to be used for extra-/interpolation.
library(zoo)
# either convert raw data to zoo object
z <- read.zoo(text = "S D Date
1 1 1 2010-02-05
2 1 1 2010-02-12
3 1 1 2010-02-19", index.column = "Date")
# ...or convert your data frame to zoo
z <- zoo(x = train[ , c("S", "D")], order.by = train$Date)
# create a sequence of dates, from first to last date in original data
tt <- seq(from = min(index(z)), to = max(index(z)), by = "day")
# expand time series to 'tt', and replace each NA with the most recent non-NA prior to it
na.locf(z, xout = tt)
# S D
# 2010-02-05 1 1
# 2010-02-06 1 1
# 2010-02-07 1 1
# 2010-02-08 1 1
# 2010-02-09 1 1
# 2010-02-10 1 1
# 2010-02-11 1 1
# 2010-02-12 1 1
# 2010-02-13 1 1
# 2010-02-14 1 1
# 2010-02-15 1 1
# 2010-02-16 1 1
# 2010-02-17 1 1
# 2010-02-18 1 1
# 2010-02-19 1 1
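If you need a plain data frame again rather than a zoo object, one possible way is to rebuild it from the index and the core data. This is a minimal sketch, assuming the zoo package is installed; the sample data and the names `filled_z` and `filled_df` are invented for the illustration.

```r
library(zoo)

# Sample data mirroring the question (assumed)
train <- data.frame(S = 1:3, D = 1,
                    Date = as.Date("2010-02-05") + 7 * (0:2))

# Build the zoo series and expand it to daily resolution as above
z  <- zoo(x = train[, c("S", "D")], order.by = train$Date)
tt <- seq(from = min(index(z)), to = max(index(z)), by = "day")
filled_z <- na.locf(z, xout = tt)

# index() gives the dates, coredata() gives the value columns
filled_df <- data.frame(Date = index(filled_z), coredata(filled_z))

head(filled_df, 8)
```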
As a silly brute-force approach :-),
library(lubridate)
train
# D S date
# 1 1 2 2010-02-05
# 2 1 3 2010-02-12
ttmp<-train[1,]
for(j in 1:6) ttmp<-rbind(ttmp,train[1,])
for(j in 2:7) ttmp[j,3]<-ttmp[j-1,3]+ddays(1)
ttmp
# D S date
# 1 1 2 2010-02-05
# 2 1 2 2010-02-06
# 3 1 2 2010-02-07
# 4 1 2 2010-02-08
# 5 1 2 2010-02-09
# 6 1 2 2010-02-10
# 7 1 2 2010-02-11
# ttmp already starts with train[1,], so it is the expanded block for row 1
newtrain<-ttmp
Then loop over all the initial rows and rbind the resulting blocks together.
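The loop over all rows might look something like the sketch below. It uses plain Date arithmetic instead of lubridate and assumes, as in the question, that every row should expand into exactly 7 consecutive days; the helper name `expand_week` is hypothetical.

```r
# Sample data matching the printed 'train' above (assumed)
train <- data.frame(D = c(1, 1), S = c(2, 3),
                    date = as.Date(c("2010-02-05", "2010-02-12")))

# Hypothetical helper: repeat row i seven times and shift the date by 0..6 days
expand_week <- function(i) {
  block <- train[rep(i, 7), ]
  block$date <- block$date + 0:6
  block
}

# Expand every row, then stack the blocks with a single rbind
newtrain <- do.call(rbind, lapply(seq_len(nrow(train)), expand_week))
rownames(newtrain) <- NULL

newtrain
```

Collecting the blocks in a list and calling rbind once avoids growing the data frame inside a loop, which gets slow for large inputs.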
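You can get the gap in days between consecutive dates (that is, how many rows each original row should expand into) with:
# as.integer() drops the difftime class so the value can be used as a plain count
nMiss <- as.integer(diff(as.Date(train$Date)))
Then you can repeat each row of the data.frame the corresponding number of times: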
longTrain <- train[rep(1:nrow(train), times=c(nMiss, 1)),]
You can generate the date offsets for these rows with:
off <- unlist(lapply(c(nMiss,1)-1, seq, from=0))
longTrain$Date <- as.Date(longTrain$Date)+off
If you want to add extra rows at the end of the data frame, change the constant 1 in c(nMiss, 1) to the appropriate number.
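Putting the steps together, a self-contained version could look like the sketch below. The sample data is assumed, the name `reps` is made up for the illustration, and `sequence()` is swapped in as a shorter way to build the same per-row offsets. Setting the last element of `reps` to 7 pads the final date with its 6 following days, as the question's wording suggests.

```r
# Sample data mirroring the question (assumed)
train <- data.frame(S = 1:3, D = 1,
                    Date = as.Date(c("2010-02-05", "2010-02-12", "2010-02-19")))

# Days between consecutive dates, as plain integers
nMiss <- as.integer(diff(train$Date))

# Rows per original row; the last row gets 7 so it is padded with 6 extra days
reps <- c(nMiss, 7L)

# Repeat each row, then add offsets 0, 1, ..., reps[i] - 1 within each block
longTrain <- train[rep(seq_len(nrow(train)), times = reps), ]
longTrain$Date <- longTrain$Date + (sequence(reps) - 1L)
rownames(longTrain) <- NULL

tail(longTrain, 8)
```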