Converting Monthly Data to Daily in R

I have a data.frame df that has monthly data:

Date           Value 
2008-01-01      3.5          
2008-02-01      9.5          
2008-03-01      0.1          

I want there to be data for every day in each month (I will assume Value does not change within a month), since I will be merging this into a different table that has daily data.

I want the output to look like this:

Date           Value 
2008-01-02      3.5
2008-01-03      3.5 
2008-01-04      3.5 
2008-01-05      3.5 
2008-01-06      3.5 
2008-01-07      3.5 
2008-01-08      3.5 
2008-01-09      3.5 
2008-01-10      3.5 
2008-01-11      3.5 
2008-01-12      3.5 
2008-01-13      3.5 
2008-01-14      3.5 
2008-01-15      3.5 
2008-01-16      3.5 
2008-01-17      3.5 
2008-01-18      3.5 
2008-01-19      3.5 
2008-01-20      3.5 
2008-01-21      3.5 
2008-01-22      3.5 
2008-01-23      3.5 
2008-01-24      3.5
2008-01-25      3.5 
2008-01-26      3.5 
2008-01-27      3.5 
2008-01-28      3.5 
2008-01-29      3.5 
2008-01-30      3.5  
2008-01-31      3.5        
2008-02-01      9.5           

I have tried to.daily, but my call:

df <- to.daily(df$Date)

returns

Error in to.period(x, "days", name = name, ...) : 'x' contains no data

Not sure if I understood perfectly, but I think something like this may work.

First, I define the monthly data table:

library(data.table)

DT_month=data.table(Date=as.Date(c("2008-01-01","2008-02-01","2008-03-01","2008-05-01","2008-07-01"))
              ,Value=c(3.5,9.5,0.1,5,8))

Then, you have to do the following:

DT_month[,Month:=month(Date)]
DT_month[,Year:=year(Date)]

start_date=min(DT_month$Date)
end_date=max(DT_month$Date)

DT_daily=data.table(Date=seq.Date(start_date,end_date,by="day"))
DT_daily[,Month:=month(Date)]
DT_daily[,Year:=year(Date)]
DT_daily[,Value:=-100]

for( i in unique(DT_daily$Year)){
  for( j in unique(DT_daily$Month)){
    if(length(DT_month[Year==i & Month== j,Value])!=0){
      DT_daily[Year==i & Month== j,Value:=DT_month[Year==i & Month== j,Value]]
    }
  }
}

Basically, the code stores the month and year of each monthly value in separate columns.

Then, it creates a table of daily dates spanning the minimum and maximum dates in your monthly data, and adds the same year and month columns to the daily data.

Finally, it loops over every combination of year and month, filling the daily values with the corresponding monthly value. If there is no data for a given combination of month and year, the value stays at the placeholder -100.

Please let me know if it works.
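The year/month loop above can also be written as a data.table update join. The following is only a sketch under stated assumptions: the helper column YM is a hypothetical name not used in the answer, and unmatched months are left as NA instead of -100.

```r
library(data.table)

# Same monthly table as in the answer above (April and June are missing).
DT_month <- data.table(
  Date  = as.Date(c("2008-01-01", "2008-02-01", "2008-03-01",
                    "2008-05-01", "2008-07-01")),
  Value = c(3.5, 9.5, 0.1, 5, 8)
)

# One row per day between the first and last monthly dates.
DT_daily <- data.table(
  Date = seq(min(DT_month$Date), max(DT_month$Date), by = "day")
)

# YM is a hypothetical helper column: a "YYYY-MM" key shared by both tables.
DT_month[, YM := format(Date, "%Y-%m")]
DT_daily[, YM := format(Date, "%Y-%m")]

# Update join: copy each month's Value onto every day with the same key.
# Days in months without monthly data keep NA.
DT_daily[DT_month, on = "YM", Value := i.Value]
```

Using NA for the gaps instead of a sentinel such as -100 means later calls like mean(Value, na.rm = TRUE) are not silently distorted by placeholder values.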

to.daily can only be applied to xts/zoo objects and can only convert to a LOWER frequency, i.e. from daily to monthly, but not the other way round. One easy way to accomplish what you want is to convert df to an xts object:

library(xts)
df.xts <- xts(df$Value, order.by = as.Date(df$Date))

And merge, like so:

na.locf(merge(df.xts, foo=zoo(NA, order.by=seq(start(df.xts), end(df.xts),
  "day",drop=F)))[, 1])
               df.xts
2018-01-01    3.5
2018-01-02    3.5
2018-01-03    3.5
2018-01-04    3.5
2018-01-05    3.5
2018-01-06    3.5
2018-01-07    3.5
….
2018-01-27    3.5
2018-01-28    3.5
2018-01-29    3.5
2018-01-30    3.5
2018-01-31    3.5
2018-02-01    9.5
2018-02-02    9.5
2018-02-03    9.5
2018-02-04    9.5
2018-02-05    9.5
2018-02-06    9.5
2018-02-07    9.5
2018-02-08    9.5
….
2018-02-27    9.5
2018-02-28    9.5
2018-03-01    0.1

If you want the value to change continuously over the course of a month, use na.spline in place of na.locf.
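To make the difference between the two fill functions concrete, here is a minimal sketch. It assumes the question's 2008 sample values and uses zoo directly instead of xts to keep the example short; the object names are illustrative only.

```r
library(zoo)

# Monthly series built from the question's sample data.
monthly <- zoo(c(3.5, 9.5, 0.1),
               order.by = as.Date(c("2008-01-01", "2008-02-01", "2008-03-01")))

# Zero-width zoo object holding only the daily index; merging it in
# adds the missing days, with NA as their value.
daily_grid <- zoo(, seq(start(monthly), end(monthly), by = "day"))
daily <- merge(monthly, daily_grid)

step   <- na.locf(daily)    # carry the last monthly value forward
smooth <- na.spline(daily)  # cubic spline through the monthly points
```

Here step is constant within each month, while smooth varies from day to day between the monthly observations.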

Maybe not the most efficient approach, but with base R we can do:

df$Date <- as.Date(df$Date)  # seq() needs Date objects, not character strings

do.call("rbind", lapply(seq_len(nrow(df)), function(i)
  data.frame(Date = seq(df$Date[i],
                        (seq(df$Date[i], length = 2, by = "months") - 1)[2],
                        by = "1 days"),
             Value = df$Value[i])))

We basically generate a sequence of dates from each row's date to the last day of that month, which is calculated by

(seq(df$Date[i], length = 2, by = "months") - 1)[2]

and repeat the same value for all the dates, putting them in a data frame.

This gives a list of data frames, which we then combine with rbind via do.call.
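The last-day-of-month expression can be checked in isolation. This is a small sketch using a single date from the question's data; the variable names are chosen only for illustration.

```r
d <- as.Date("2008-02-01")

# Two dates one month apart: the start of this month and of the next.
month_starts <- seq(d, length.out = 2, by = "months")

# Subtracting one day from the start of the next month gives the last
# day of the current month (2008 is a leap year, so this is Feb 29).
last_day <- (month_starts - 1)[2]

# All days of the month, from the first to the last.
days_in_month <- seq(d, last_day, by = "1 days")
```

Because the end date is derived from the start of the following month, the same expression handles 28-, 29-, 30- and 31-day months without a lookup table.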

One option is tidyr::expand, which expands each row into the days between the first and last day of its month. lubridate::floor_date gives the first day of the month, and lubridate::ceiling_date() - days(1) gives the last day of the month.

library(tidyverse)
library(lubridate)

df %>% mutate(Date = ymd(Date)) %>%
group_by(Date) %>%
expand(Date = seq(floor_date(Date, unit = "month"),
       ceiling_date(Date, unit="month")-days(1), by="day"), Value) %>%
as.data.frame()

#          Date Value
# 1  2008-01-01   3.5
# 2  2008-01-02   3.5
# 3  2008-01-03   3.5
# 4  2008-01-04   3.5
# 5  2008-01-05   3.5
#.....so on
# 32 2008-02-01   9.5
# 33 2008-02-02   9.5
# 34 2008-02-03   9.5
# 35 2008-02-04   9.5
# 36 2008-02-05   9.5
#.....so on

# 85 2008-03-25   0.1
# 86 2008-03-26   0.1
# 87 2008-03-27   0.1
# 88 2008-03-28   0.1
# 89 2008-03-29   0.1
# 90 2008-03-30   0.1
# 91 2008-03-31   0.1
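As a quick illustration of the two lubridate helpers used above, the following sketch applies them to a single mid-month date. The date itself is an arbitrary example, not part of the original data.

```r
library(lubridate)

d <- ymd("2008-02-14")

# floor_date() rounds down to the start of the month.
first_day <- floor_date(d, unit = "month")

# ceiling_date() rounds up to the start of the next month;
# subtracting one day gives the last day of the current month.
last_day <- ceiling_date(d, unit = "month") - days(1)

# Daily sequence covering the whole month.
month_days <- seq(first_day, last_day, by = "day")
```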

Data:

df <- read.table(text = 
"Date           Value 
2008-01-01      3.5          
2008-02-01      9.5          
2008-03-01      0.1",
header = TRUE, stringsAsFactors = FALSE)

Another way:

library(lubridate)

d <- read.table(text = "Date           Value 
                2008-01-01      3.5          
                2008-02-01      9.5          
                2008-03-01      0.1",
                stringsAsFactors = FALSE, header = TRUE)

Dates <- seq(from = min(as.Date(d$Date)),
             to = ceiling_date(max(as.Date(d$Date)), "month") - days(1),
             by = "1 days")

# unname() drops the lookup names so data.frame() does not try to use them as row names
data.frame(Date = Dates,
           Value = unname(setNames(d$Value, d$Date)[format(Dates, format = "%Y-%m-01")]))
