[英]How to insert past values in place of missing month values?
I have a dataframe with 2 columns (Date and Values).我有一个包含 2 列(日期和值)的数据框。 I would like the frequency of observations to be monthly.我希望观察的频率是每月一次。 However, I don't have observations for all the months.但是,我没有观察所有月份。 I want to add as a missing observation the value at time t-1.我想将时间 t-1 的值添加为缺失的观察值。
Here an example to make it clearer:这是一个更清楚的例子:
df <- data.frame(Date = c("2000-01-05", "2000-02-03", "2000-03-02", "2000-04-13", "2000-05-11", "2000-06-08", "2000-07-06", "2000-09-14", "2000-10-05", "2000-11-02", "2000-12-14", "2001-02-01", "2001-03-01", "2001-04-11", "2001-05-10", "2001-06-07", "2001-07-05", "2001-08-30", "2001-10-11", "2001-11-08", "2001-12-06", "2002-01-03", "2002-02-07", "2002-03-07", "2002-04-04", "2002-05-02", "2002-06-06", "2002-07-04", "2002-09-12", "2002-10-10"), Fit = c( 1.00000000, -1.00000000, -0.81612680, -0.42769496, 1.00000000, -1.50947974, -0.02154276, -1.47427092, -1.46782501, -1.17309887, -0.70347628, -1.93465483, -3.00667550, -1.55652236, -4.10292471, -1.10159442, -2.64296439, -2.03574462, -1.55986632, -1.73125990, -1.34045640, -2.01864867, -2.51081773, -3.07896217, -3.02724723, -0.76456774, -1.81459657, -2.13093106, -1.91543051, -1.31418467))
Date Fit
1 2000-01-05 1.00000000
2 2000-02-03 -1.00000000
3 2000-03-02 -0.81612680
4 2000-04-13 -0.42769496
5 2000-05-11 1.00000000
6 2000-06-08 -1.50947974
7 2000-07-06 -0.02154276
8 2000-09-14 -1.47427092
9 2000-10-05 -1.46782501
10 2000-11-02 -1.17309887
11 2000-12-14 -0.70347628
12 2001-02-01 -1.93465483
13 2001-03-01 -3.00667550
14 2001-04-11 -1.55652236
15 2001-05-10 -4.10292471
16 2001-06-07 -1.10159442
17 2001-07-05 -2.64296439
18 2001-08-30 -2.03574462
19 2001-10-11 -1.55986632
20 2001-11-08 -1.73125990
21 2001-12-06 -1.34045640
22 2002-01-03 -2.01864867
23 2002-02-07 -2.51081773
24 2002-03-07 -3.07896217
25 2002-04-04 -3.02724723
26 2002-05-02 -0.76456774
27 2002-06-06 -1.81459657
28 2002-07-04 -2.13093106
29 2002-09-12 -1.91543051
30 2002-10-10 -1.31418467
# by running:
lapply(split(df, format(as.Date(df$Date), "%Y")), function(x) month.name[setdiff(seq(12), as.numeric(format(as.Date(x$Date), "%m")))])
# you will be able to see the missing month to get monthly frequency
I want to get this:我想得到这个:
Date Fit
1 2000-01-05 1.00000000
2 2000-02-03 -1.00000000
3 2000-03-02 -0.81612680
4 2000-04-13 -0.42769496
5 2000-05-11 1.00000000
6 2000-06-08 -1.50947974
7 2000-07-06 -0.02154276
8 2000-08-06 -0.02154276
8 2000-09-14 -1.47427092
9 2000-10-05 -1.46782501
10 2000-11-02 -1.17309887
11 2000-12-14 -0.70347628
11 2000-01-15 -0.70347628
12 2001-02-01 -1.93465483
13 2001-03-01 -3.00667550
14 2001-04-11 -1.55652236
15 2001-05-10 -4.10292471
16 2001-06-07 -1.10159442
17 2001-07-05 -2.64296439
18 2001-08-30 -2.03574462
19 2001-09-30 -2.03574462
19 2001-10-11 -1.55986632
20 2001-11-08 -1.73125990
21 2001-12-06 -1.34045640
22 2002-01-03 -2.01864867
23 2002-02-07 -2.51081773
24 2002-03-07 -3.07896217
25 2002-04-04 -3.02724723
26 2002-05-02 -0.76456774
27 2002-06-06 -1.81459657
28 2002-07-04 -2.13093106
28 2002-08-04 -2.13093106
29 2002-09-12 -1.91543051
30 2002-10-10 -1.31418467
As you can see, every missing month has been replaced with the value of the previous month.如您所见,每个缺失的月份都被替换为上个月的值。
Can anyone help me do it?有人可以帮我做吗?
Thanks so much!非常感谢!
df <- data.frame(Date = c("2000-01-05", "2000-02-03", "2000-03-02", "2000-04-13", "2000-05-11", "2000-06-08", "2000-07-06", "2000-09-14", "2000-10-05", "2000-11-02", "2000-12-14", "2001-02-01", "2001-03-01", "2001-04-11", "2001-05-10", "2001-06-07", "2001-07-05", "2001-08-30", "2001-10-11", "2001-11-08", "2001-12-06", "2002-01-03", "2002-02-07", "2002-03-07", "2002-04-04", "2002-05-02", "2002-06-06", "2002-07-04", "2002-09-12", "2002-10-10"), Fit = c( 1.00000000, -1.00000000, -0.81612680, -0.42769496, 1.00000000, -1.50947974, -0.02154276, -1.47427092, -1.46782501, -1.17309887, -0.70347628, -1.93465483, -3.00667550, -1.55652236, -4.10292471, -1.10159442, -2.64296439, -2.03574462, -1.55986632, -1.73125990, -1.34045640, -2.01864867, -2.51081773, -3.07896217, -3.02724723, -0.76456774, -1.81459657, -2.13093106, -1.91543051, -1.31418467))
Suggested solution using dplyr
and tidyr
:使用dplyr
和tidyr
建议解决方案:
library(dplyr)
library(tidyr)
df <- df %>%
mutate(Date = as.Date(as.character(Date, "%Y-%m-%d")),
Month = format(Date, "%m"),
Year = format(Date, "%Y")) %>%
complete(Month = formatC(1:12, 1, flag=0), nesting(Year)) %>%
mutate(Date = if_else(is.na(Date), as.Date(paste(Year, Month, "1", sep="-"), "%Y-%m-%d"), Date))%>%
arrange(Date) %>%
select(Date, Fit) %>%
mutate(Fit = if_else(is.na(Fit), lag(Fit), Fit)) %>%
mutate(Fit = if_else(is.na(Fit), lag(Fit), Fit)) # Do this twice as a 'hackish' solution for the last value
Returns:返回:
Date Fit
1 2000-01-05 1.00000000
2 2000-02-03 -1.00000000
3 2000-03-02 -0.81612680
4 2000-04-13 -0.42769496
5 2000-05-11 1.00000000
6 2000-06-08 -1.50947974
7 2000-07-06 -0.02154276
8 2000-08-01 -0.02154276
9 2000-09-14 -1.47427092
10 2000-10-05 -1.46782501
11 2000-11-02 -1.17309887
12 2000-12-14 -0.70347628
13 2001-01-01 -0.70347628
14 2001-02-01 -1.93465483
15 2001-03-01 -3.00667550
16 2001-04-11 -1.55652236
17 2001-05-10 -4.10292471
18 2001-06-07 -1.10159442
19 2001-07-05 -2.64296439
20 2001-08-30 -2.03574462
21 2001-09-01 -2.03574462
22 2001-10-11 -1.55986632
23 2001-11-08 -1.73125990
24 2001-12-06 -1.34045640
25 2002-01-03 -2.01864867
26 2002-02-07 -2.51081773
27 2002-03-07 -3.07896217
28 2002-04-04 -3.02724723
29 2002-05-02 -0.76456774
30 2002-06-06 -1.81459657
31 2002-07-04 -2.13093106
32 2002-08-01 -2.13093106
33 2002-09-12 -1.91543051
34 2002-10-10 -1.31418467
35 2002-11-01 -1.31418467
36 2002-12-01 -1.31418467
I first formatted your data.frame, then created a helper df and joined this by a character string id.我首先格式化了你的 data.frame,然后创建了一个 helper df 并通过一个字符串 id 加入了它。
library(dplyr)
# Format to date and create an ID
df <- df %>%
mutate(Date = as.Date(Date, format = "%Y-%m-%d")) %>%
mutate(id = substr(as.character(Date),1,7 ))
# Create a sequence from you min and max dates in your original df.
# Also, add an ID column for the join.
df_helper <-
data.frame(Date= seq(min(as.Date(df$Date)), max(as.Date(df$Date)), by = "month")) %>%
mutate(id = substr(as.character(Date),1,7 ))
# Perform the join and fill
new_df <- df_helper %>% left_join(df, by ="id") %>%
select(Date.x, id, Fit) %>%
rename( Date = Date.x) %>%
fill(Fit)
Here's a solution in base R style using the lubridate
package:这是使用lubridate
包的基本 R 风格的lubridate
方案:
library(lubridate)
df$Date <- as.POSIXct(as.character(df$Date))
added <- df[which(month(df$Date[-nrow(df)]) != month(df$Date[-1]) - 1), ]
added$Date <- added$Date + months(1)
df <- rbind(df, added)
df <- df[order(df$Date),]
row.names(df) <- seq(nrow(df))
result:结果:
df
#> Date Fit
#> 1 2000-01-05 1.00000000
#> 2 2000-02-03 -1.00000000
#> 3 2000-03-02 -0.81612680
#> 4 2000-04-13 -0.42769496
#> 5 2000-05-11 1.00000000
#> 6 2000-06-08 -1.50947974
#> 7 2000-07-06 -0.02154276
#> 8 2000-08-06 -0.02154276
#> 9 2000-09-14 -1.47427092
#> 10 2000-10-05 -1.46782501
#> 11 2000-11-02 -1.17309887
#> 12 2000-12-14 -0.70347628
#> 13 2001-01-14 -0.70347628
#> 14 2001-02-01 -1.93465483
#> 15 2001-03-01 -3.00667550
#> 16 2001-04-11 -1.55652236
#> 17 2001-05-10 -4.10292471
#> 18 2001-06-07 -1.10159442
#> 19 2001-07-05 -2.64296439
#> 20 2001-08-30 -2.03574462
#> 21 2001-09-30 -2.03574462
#> 22 2001-10-11 -1.55986632
#> 23 2001-11-08 -1.73125990
#> 24 2001-12-06 -1.34045640
#> 25 2002-01-03 -2.01864867
#> 26 2002-01-06 -1.34045640
#> 27 2002-02-07 -2.51081773
#> 28 2002-03-07 -3.07896217
#> 29 2002-04-04 -3.02724723
#> 30 2002-05-02 -0.76456774
#> 31 2002-06-06 -1.81459657
#> 32 2002-07-04 -2.13093106
#> 33 2002-08-04 -2.13093106
#> 34 2002-09-12 -1.91543051
#> 35 2002-10-10 -1.31418467
Created on 2020-02-24 by the reprex package (v0.3.0)由reprex 包(v0.3.0) 于 2020 年 2 月 24 日创建
Another dplyr
+ tidyr
+ lubridate
solution:另一个dplyr
+ tidyr
+ lubridate
解决方案:
library(dplyr)
library(tidyr)
library(lubridate)
df %>%
mutate(Date = as.Date(Date),
month = floor_date(Date, "month")) %>%
right_join(tibble(month = seq.Date(min(.$month), max(.$month), by = "month"))) %>%
mutate(day = day(Date)) %>%
fill(Fit, day) %>%
mutate(Date = month %m+% days(day-1)) %>%
select(Date, Fit)
Result:结果:
Date Fit
1 2000-01-05 1.00000000
2 2000-02-03 -1.00000000
3 2000-03-02 -0.81612680
4 2000-04-13 -0.42769496
5 2000-05-11 1.00000000
6 2000-06-08 -1.50947974
7 2000-07-06 -0.02154276
8 2000-08-06 -0.02154276
9 2000-09-14 -1.47427092
10 2000-10-05 -1.46782501
11 2000-11-02 -1.17309887
12 2000-12-14 -0.70347628
13 2001-01-14 -0.70347628
14 2001-02-01 -1.93465483
15 2001-03-01 -3.00667550
16 2001-04-11 -1.55652236
17 2001-05-10 -4.10292471
18 2001-06-07 -1.10159442
19 2001-07-05 -2.64296439
20 2001-08-30 -2.03574462
21 2001-09-30 -2.03574462
22 2001-10-11 -1.55986632
23 2001-11-08 -1.73125990
24 2001-12-06 -1.34045640
25 2002-01-03 -2.01864867
26 2002-02-07 -2.51081773
27 2002-03-07 -3.07896217
28 2002-04-04 -3.02724723
29 2002-05-02 -0.76456774
30 2002-06-06 -1.81459657
31 2002-07-04 -2.13093106
32 2002-08-04 -2.13093106
33 2002-09-12 -1.91543051
34 2002-10-10 -1.31418467
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.