
Add missing months for a range of dates in R

Say I have a data.frame as follows, with at most one entry per month:

df <- read.table(text="date,gmsl
2009-01-17,58.4
2009-02-17,59.1
2009-04-16,60.9
2009-06-16,62.3
2009-09-16,64.6
2009-12-16,68.3",sep=",",header=TRUE)

##  > df
##         date gmsl
## 1 2009-01-17 58.4
## 2 2009-02-17 59.1
## 3 2009-04-16 60.9
## 4 2009-06-16 62.3
## 5 2009-09-16 64.6
## 6 2009-12-16 68.3

How can I fill in the missing months, with gmsl set to NaN, for the date range from 2009-01 to 2009-12?

I have already extracted the year and month from the date column with df$Month_Yr <- format(as.Date(df$date), "%Y-%m").

Here's a way to do this with tidyr::complete:

library(dplyr)

df %>%
  mutate(date = as.Date(date), 
         first_date = as.Date(format(date, "%Y-%m-01"))) %>%
  tidyr::complete(first_date = seq(min(first_date), max(first_date), "1 month"))


# A tibble: 12 x 3
#   first_date date        gmsl
#   <date>     <date>     <dbl>
# 1 2009-01-01 2009-01-17  58.4
# 2 2009-02-01 2009-02-17  59.1
# 3 2009-03-01 NA          NA  
# 4 2009-04-01 2009-04-16  60.9
# 5 2009-05-01 NA          NA  
# 6 2009-06-01 2009-06-16  62.3
# 7 2009-07-01 NA          NA  
# 8 2009-08-01 NA          NA  
# 9 2009-09-01 2009-09-16  64.6
#10 2009-10-01 NA          NA  
#11 2009-11-01 NA          NA  
#12 2009-12-01 2009-12-16  68.3

You can then decide which column to keep, either first_date or date, or combine the two.
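As one possible way to combine the two columns, the sketch below keeps the observed date where a row exists and falls back to the first day of the month otherwise. The use of dplyr::coalesce and the name filled are choices made for this illustration, not part of the answer above.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question
df <- data.frame(
  date = c("2009-01-17", "2009-02-17", "2009-04-16",
           "2009-06-16", "2009-09-16", "2009-12-16"),
  gmsl = c(58.4, 59.1, 60.9, 62.3, 64.6, 68.3),
  stringsAsFactors = FALSE
)

filled <- df %>%
  mutate(date = as.Date(date),
         first_date = as.Date(format(date, "%Y-%m-01"))) %>%
  complete(first_date = seq(min(first_date), max(first_date), by = "1 month")) %>%
  # Observed date where available, otherwise the first day of the missing month
  mutate(date = coalesce(date, first_date)) %>%
  select(date, gmsl)
```

Note that seq(min(first_date), max(first_date), ...) only spans the months present in the data. If the data might not start in January or end in December, passing fixed endpoints such as as.Date("2009-01-01") and as.Date("2009-12-01") would cover the whole requested range.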

Data

df <- structure(list(date = structure(1:6, .Label = c("2009-01-17", 
"2009-02-17", "2009-04-16", "2009-06-16", "2009-09-16", "2009-12-16"
), class = "factor"), gmsl = c(58.4, 59.1, 60.9, 62.3, 64.6, 
68.3)), class = "data.frame", row.names = c(NA, -6L))

In base R you could match (using %in%) the substrings of a seq.Date against the year-month part of the existing dates.

# First day of every month in the requested range, as "YYYY-MM-DD" strings
dt.match <- as.character(seq.Date(as.Date("2009-01-01"), as.Date("2009-12-01"), "month"))
# Placeholder rows for months absent from dat; NA_real_ keeps gmsl numeric
fill <- data.frame(date = dt.match[!substr(dt.match, 1, 7) %in% substr(dat$date, 1, 7)],
                   gmsl = NA_real_)
merge(dat, fill, all = TRUE)
#          date gmsl
# 1  2009-01-17 58.4
# 2  2009-02-17 59.1
# 3  2009-03-01   NA
# 4  2009-04-16 60.9
# 5  2009-05-01   NA
# 6  2009-06-16 62.3
# 7  2009-07-01   NA
# 8  2009-08-01   NA
# 9  2009-09-16 64.6
# 10 2009-10-01   NA
# 11 2009-11-01   NA
# 12 2009-12-16 68.3

Data

dat <- structure(list(date = c("2009-01-17", "2009-02-17", "2009-04-16", 
"2009-06-16", "2009-09-16", "2009-12-16"), gmsl = c(58.4, 59.1, 
60.9, 62.3, 64.6, 68.3)), row.names = c(NA, -6L), class = "data.frame")
