I have multiple years of daily data for various variables. Some variables have random missing days. I would like to merge the datasets by matching date, so that all the variables for a given date are collated into one row. There are many years of data and many variables, so this cannot be done manually. The example below has 3 such datasets and the collated result I am trying to achieve. Each DF has missing dates, which should appear as NA in the collated dataframe.
DF1: Date Var 1 Var 2
01.01.2001 57685 574849
02.01.2001 57342 577890
04.01.2001 44332 574849
05.01.2001 57321 574849
.......... .... ......
DF2: Date Var A Var B
01.01.2001 abnns jjkall
03.01.2001 bbaas abnns
04.01.2001 jjkall 574849
05.01.2001 57321 jjkall
.......... .... ......
DF3: Date Var K9 Var M8
02.01.2001 ab221 jjk112
03.01.2001 bb445 ab345
04.01.2001 jjk567 574rtg9
05.01.2001 573fda jjk243
.......... .... ......
COLLATED:
Date Var 1 Var 2 Var A Var B Var K9 Var M8
01.01.2001 57685 574849 abnns jjkall NA NA
02.01.2001 57342 577890 NA NA ab221 jjk112
03.01.2001 NA NA bbaas abnns bb445 ab345
04.01.2001 44332 574849 jjkall 574849 jjk567 574rtg9
05.01.2001 57321 574849 57321 jjkall 573fda jjk243
.......... .... ...... ...... ...... ...... ......
Here's some sample code that should do the trick. The idea is to put the dataframes into a list (as the comment above suggested) and use purrr's reduce() to join them all together with full_join(), which keeps every date that appears in any of the dataframes and fills the gaps with NA. You don't need a loop or manual merges.
library(lubridate)
library(purrr)
library(dplyr)
## create some random dataframes
all_dates <- seq(ymd('2020-01-01'), ymd('2020-01-31'), 'days')
## tibble() replaces the deprecated data_frame()
df1 <- tibble(date = sample(all_dates, 10), v1 = rnorm(10))
df2 <- tibble(date = sample(all_dates, 10), v2 = rnorm(10))
df3 <- tibble(date = sample(all_dates, 10), v3 = rnorm(10))
## combine them into a list
df_list <- list(df1, df2, df3)
## use reduce() to full-join the list into one dataframe, sorted by date
df_all <- purrr::reduce(df_list, full_join, by = 'date') %>%
  arrange(date)
df_all
# A tibble: 22 x 4
date v1 v2 v3
<date> <dbl> <dbl> <dbl>
1 2020-01-01 0.780 1.53 0.406
2 2020-01-02 0.476 NA NA
3 2020-01-03 NA -0.555 NA
4 2020-01-06 NA -0.972 NA
5 2020-01-07 -1.06 NA NA
6 2020-01-08 NA NA 0.806
7 2020-01-09 NA NA -0.0956
8 2020-01-10 -1.28 NA NA
9 2020-01-11 NA -0.315 1.35
10 2020-01-12 0.505 0.325 NA
# ... with 12 more rows
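To tie this back to the data in the question, here is a minimal sketch of the same approach on dataframes shaped like DF1 to DF3. The column names (Var1, VarA, and so on, written without spaces) and the assumption that the dates arrive as day.month.year strings are mine, not something the question confirms. Converting the strings with dmy() before joining matters because sorting the raw text would order rows by day of the month rather than chronologically.

```r
library(dplyr)
library(purrr)
library(lubridate)

# Hypothetical stand-ins for DF1..DF3; dates assumed to be dd.mm.yyyy strings
DF1 <- tibble(Date = c("01.01.2001", "02.01.2001", "04.01.2001", "05.01.2001"),
              Var1 = c(57685, 57342, 44332, 57321),
              Var2 = c(574849, 577890, 574849, 574849))
DF2 <- tibble(Date = c("01.01.2001", "03.01.2001", "04.01.2001", "05.01.2001"),
              VarA = c("abnns", "bbaas", "jjkall", "57321"),
              VarB = c("jjkall", "abnns", "574849", "jjkall"))
DF3 <- tibble(Date = c("02.01.2001", "03.01.2001", "04.01.2001", "05.01.2001"),
              VarK9 = c("ab221", "bb445", "jjk567", "573fda"),
              VarM8 = c("jjk112", "ab345", "574rtg9", "jjk243"))

collated <- list(DF1, DF2, DF3) %>%
  map(~ mutate(.x, Date = dmy(Date))) %>%  # parse day.month.year text into Date
  reduce(full_join, by = "Date") %>%       # keep every date seen in any dataframe
  arrange(Date)                            # chronological order

collated
```

With real data you would build the list from however the datasets are loaded; the join itself does not change with the number of dataframes or columns, as long as the value columns have distinct names.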