
In R, I have various datasets based on dates (rows). I would like to merge the rows by date and combine the data

I have multiple years of daily data for several variables, and some variables are missing random days. I would like to merge the datasets by matching date, so that all the variables for a given date end up in one row. There are many years of data and many variables, so this cannot be done manually. The example below shows three such datasets and the collated result I am trying to achieve. Each DF is missing some dates, and those gaps should appear as NA in the collated data frame.

DF1: Date         Var 1       Var 2
     01.01.2001   57685       574849
     02.01.2001   57342       577890
     04.01.2001   44332       574849
     05.01.2001   57321       574849
     ..........   ....        ......

DF2: Date         Var A       Var B
     01.01.2001   abnns       jjkall
     03.01.2001   bbaas       abnns       
     04.01.2001   jjkall      574849
     05.01.2001   57321       jjkall
     ..........   ....        ......

DF3: Date         Var K9      Var M8
     02.01.2001   ab221       jjk112
     03.01.2001   bb445       ab345      
     04.01.2001   jjk567      574rtg9
     05.01.2001   573fda      jjk243
     ..........   ....        ......

COLLATED:
Date       Var 1  Var 2   Var A   Var B   Var K9  Var M8
01.01.2001 57685  574849  abnns   jjkall  NA      NA
02.01.2001 57342  577890  NA      NA      ab221   jjk112
03.01.2001 NA     NA      bbaas   abnns   bb445   ab345
04.01.2001 44332  574849  jjkall  574849  jjk567  574rtg9
05.01.2001 57321  574849  57321   jjkall  573fda  jjk243
.......... .....  ......  ......  ......  ......  .......
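The collation described above is a full outer join on the date column, repeated across every table. Below is a minimal base-R sketch that uses only the first rows shown. The column names Var1, VarA, VarK9 and so on are assumptions, because the real headers contain spaces. Reduce() folds the list pairwise, and merge(..., all = TRUE) keeps dates that are missing from either side:

```r
# First rows of the three example tables, typed in by hand.
# Column names are assumptions: the original headers contain spaces.
df1 <- data.frame(
  Date = c("01.01.2001", "02.01.2001", "04.01.2001", "05.01.2001"),
  Var1 = c(57685, 57342, 44332, 57321),
  Var2 = c(574849, 577890, 574849, 574849)
)
df2 <- data.frame(
  Date = c("01.01.2001", "03.01.2001", "04.01.2001", "05.01.2001"),
  VarA = c("abnns", "bbaas", "jjkall", "57321"),
  VarB = c("jjkall", "abnns", "574849", "jjkall")
)
df3 <- data.frame(
  Date  = c("02.01.2001", "03.01.2001", "04.01.2001", "05.01.2001"),
  VarK9 = c("ab221", "bb445", "jjk567", "573fda"),
  VarM8 = c("jjk112", "ab345", "574rtg9", "jjk243")
)

# Parse the dd.mm.yyyy strings into Date values so rows sort chronologically.
df_list <- lapply(list(df1, df2, df3), function(d) {
  d$Date <- as.Date(d$Date, format = "%d.%m.%Y")
  d
})

# Fold the list with a full outer join: all = TRUE keeps every date that
# appears in either table and fills the absent columns with NA.
collated <- Reduce(function(x, y) merge(x, y, by = "Date", all = TRUE), df_list)
collated
```

By default merge() sorts on the key column, so once the dates are parsed the result comes out in chronological order with the same layout as the COLLATED table.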

Here is some sample code that should do the trick. The idea is to put the data frames into a list (as the comment above suggested) and use purrr::reduce() to apply dplyr's full_join() to them pairwise. A full join keeps every date that appears in any of the tables and fills the missing values with NA, so there is no need for a loop or for merging them by hand.

library(lubridate)
library(purrr)
library(dplyr)

## create some random dataframes (tibble() replaces the deprecated data_frame())
all_dates <- seq(ymd('2020-01-01'), ymd('2020-01-31'), 'days')
df1 <- tibble(date = sample(all_dates, 10), v1 = rnorm(10))
df2 <- tibble(date = sample(all_dates, 10), v2 = rnorm(10))
df3 <- tibble(date = sample(all_dates, 10), v3 = rnorm(10))

## combine them into a list
df_list <- list(df1, df2, df3)

## fold the list with full_join() on 'date' into one dataframe, then sort by date
df_all <- purrr::reduce(df_list, full_join, by = 'date') %>% 
  arrange(date)

df_all  # values differ between runs because the input data are random

# A tibble: 22 x 4
   date           v1     v2      v3
   <date>      <dbl>  <dbl>   <dbl>
 1 2020-01-01  0.780  1.53   0.406 
 2 2020-01-02  0.476 NA     NA     
 3 2020-01-03 NA     -0.555 NA     
 4 2020-01-06 NA     -0.972 NA     
 5 2020-01-07 -1.06  NA     NA     
 6 2020-01-08 NA     NA      0.806 
 7 2020-01-09 NA     NA     -0.0956
 8 2020-01-10 -1.28  NA     NA     
 9 2020-01-11 NA     -0.315  1.35  
10 2020-01-12  0.505  0.325 NA     
# ... with 12 more rows
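The arrange(date) step above sorts correctly because the example builds its dates as Date values. The question's dates are dd.mm.yyyy strings, so they should be converted before joining. The sketch below uses two made-up tables, df_a and df_b, purely for illustration. It parses the dates with lubridate::dmy() before running the same reduce()/full_join() chain, and compares the result with sorting the raw strings:

```r
library(lubridate)
library(purrr)
library(dplyr)

# Hypothetical tables in the question's dd.mm.yyyy format; one date is in
# February, which exposes the string-sorting problem.
df_a <- tibble(Date = c("02.01.2001", "01.02.2001"), v1 = c(1, 2))
df_b <- tibble(Date = c("01.02.2001", "03.01.2001"), v2 = c("x", "y"))

# Sorting the raw strings compares them character by character, day first,
# so 1 February lands before 2 January.
by_string <- reduce(list(df_a, df_b), full_join, by = "Date") %>%
  arrange(Date)

# Converting with dmy() first gives a chronological order.
by_date <- list(df_a, df_b) %>%
  map(~ mutate(.x, Date = dmy(Date))) %>%
  reduce(full_join, by = "Date") %>%
  arrange(Date)

by_string
by_date
```

In by_string the rows come out as 01.02.2001, 02.01.2001, 03.01.2001. In by_date they follow the calendar, with 1 February last.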
