Fill data gaps with Pad using group returns error

I have time-series data that start and end partway through the calendar year, and most fill functions (such as pad from the padr package) only fill gaps between the start and end dates. However, I need a complete annual record. For example, if my data start on 2016-01-03, the desired result extends the time series back to the beginning of the year, and likewise forward to the end of the year if the end date falls before it. NA would be used to fill the gap.

A solution that works on data with multiple sites would be appreciated, hence the example below.

library(dplyr)
library(padr)
library(lubridate)  # provides floor_date() and round_date(), used in the attempt

# Example dataset

site<-"site_1"
date<-seq(as.Date('2016-01-03'),as.Date('2016-12-09'), by='day')
x <- runif(length(date),min=20,max=40)
df1<-data.frame(site,date,x)
df11<-df1[-c(2,3,4,5,6),]

site<-"site_2"
date<-seq(as.Date('2012-06-01'),as.Date('2012-10-25'), by='day')
x <- runif(length(date),min=30,max=40)
df2<-data.frame(site,date,x)
df22<-df2[-c(2,3,4,5,6),]

df<-rbind(df11,df22)

The attempt below results in the error "start value is larger than the end value for all groups". I think the issue is that it is not grouping.

dfpad<-df%>%   
pad(group ='site',start_val=floor_date(df[1,2],unit="year"),
 end_val=(round_date(df[length(df$date),2], unit="year")-1))
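One possible reading of that error, offered as a sketch and not a confirmed diagnosis: start_val and end_val are each a single value, and here they are taken from the first and last rows of the combined data frame, which belong to different sites and different years. The snippet below hard-codes those two dates (the first row of site_1 and the last row of site_2 in the example data) and reproduces the arithmetic.

```r
library(lubridate)

# Dates copied from the example data: df[1, 2] is the first row (site_1),
# df[length(df$date), 2] is the last row (site_2).
first_date <- as.Date("2016-01-03")
last_date  <- as.Date("2012-10-25")

start_val <- floor_date(first_date, unit = "year")     # 2016-01-01
end_val   <- round_date(last_date, unit = "year") - 1  # 2012-12-31

# The single start value lies after the single end value, so no group
# can satisfy start <= end.
start_val > end_val
```

If this reading is right, the grouping itself is fine; the start and end values need to be computed per site instead of once for the whole data frame.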

Desired outcome

dfgoal<- data.frame(date=seq(as.Date('2016-01-01'),as.Date('2016-01-10'), by='day'),
                x=c(NA,NA,21,NA,NA,NA,NA,NA,20,22))
head(dfgoal,10)

This solution uses a for loop.

Original Data

library(dplyr)
library(padr)
library(lubridate)
library(tidyr)  # provides replace_na(), used in the solution

# Example dataset

site<-"site_1"
date<-seq(as.Date('2016-01-03'),as.Date('2016-12-09'), by='day')
x <- runif(length(date),min=20,max=40)
df1<-data.frame(site,date,x)
df11<-df1[-c(2,3,4,5,6),]

site<-"site_2"
date<-seq(as.Date('2012-06-01'),as.Date('2012-10-25'), by='day')
x <- runif(length(date),min=30,max=40)
df2<-data.frame(site,date,x)
df22<-df2[-c(2,3,4,5,6),]

df<-rbind(df11,df22)

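Solution

sites_a<-as.vector(unique(df$site))

contiga_df<-data.frame()

# Pad each site separately so that each one gets its own calendar-year range
for(i in seq_along(sites_a)){
  
  site1a<-subset(df, site==sites_a[i])
  
  # ceiling_date() is used instead of round_date() so that an end date in the
  # first half of the year still extends forward to 31 December
  siteresult<-site1a%>%
    pad(start_val=floor_date(site1a[1,2],unit="year"), 
        end_val=(ceiling_date(site1a[length(site1a$date),2], unit="year")-1))
  # pad() leaves site as NA on the inserted rows, so fill it back in
  siteresult$site<- replace_na(siteresult$site,sites_a[i])
  contiga_df<-rbind(contiga_df, siteresult)
}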
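As an alternative to the loop, the same per-site extension can be sketched with group_by() and tidyr's complete(), which operates within each group. This is not the approach from the answer above, only a hedged variant: the small df built here is a stand-in for the example data (placeholder x values instead of runif()), and ceiling_date() is an assumption chosen so that the range always runs to 31 December of the last year observed.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Stand-in for the example data: two sites covering parts of different years
df <- rbind(
  data.frame(site = "site_1",
             date = seq(as.Date("2016-01-03"), as.Date("2016-12-09"), by = "day")),
  data.frame(site = "site_2",
             date = seq(as.Date("2012-06-01"), as.Date("2012-10-25"), by = "day"))
)
df$x <- seq_len(nrow(df))      # placeholder values
df <- df[-c(2, 3, 4, 5, 6), ]  # remove a few days from site_1 to create a gap

# For each site, expand date to every day of the calendar year(s) it touches;
# rows that did not exist before get NA in x.
full_years <- df %>%
  group_by(site) %>%
  complete(date = seq(floor_date(min(date), "year"),
                      ceiling_date(max(date), "year") - 1,
                      by = "day")) %>%
  ungroup()
```

Because the site column is the grouping variable, it is already filled on the new rows, so no replace_na() step is needed.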
