
How to merge data frames on multiple columns in R?

I have made a data frame of dates with the following code:

dates <- as.character(seq(as.Date('2015-01-01'),as.Date('2019-06-30'),by = "1 
day"))
dates <- as.data.frame(dates)
dates$date_general_ledger <- ymd(dates$date_general_ledger)

The data frame looks like this:

  date_general_ledger
1          2015-01-01
2          2015-01-02
3          2015-01-03
4          2015-01-04
5          2015-01-05
6          2015-01-06

Then I have a data frame consisting of dates, account_id and amount:

# A tibble: 6 x 3
  account_id date_general_ledger     amount
       <int> <chr>                    <dbl>
1     A      2015-01-01                 110
2     A      2015-01-03                 200
3     B      2015-01-02                  50

I am trying the code below to merge them:

dates %>%
  left_join(df3, by = "account_id")

It does not return a row with "NA" for 2015-01-02 against account "A", because account_id is not being considered in the join.

The main issue is that the join is invalid. When you join two data.frames, the column you join by must be present in both. The dates data.frame does not contain the account_id column, so the join as written will not work. The only column they have in common is date_general_ledger, so you could join by that.

dates %>%
  left_join(df3, by="date_general_ledger") %>%
  as_tibble()
# A tibble: 1,642 x 3
   date_general_ledger account_id amount
   <date>              <chr>       <dbl>
 1 2015-01-01          A             110
 2 2015-01-02          B              50
 3 2015-01-03          A             200
 4 2015-01-04          NA             NA
 5 2015-01-05          NA             NA
 6 2015-01-06          NA             NA
 7 2015-01-07          NA             NA
 8 2015-01-08          NA             NA
 9 2015-01-09          NA             NA
10 2015-01-10          NA             NA
# ... with 1,632 more rows

df3 %>%
  full_join(dates, by="date_general_ledger") %>%
  as_tibble()
# A tibble: 1,642 x 3
   account_id date_general_ledger amount
   <chr>      <date>               <dbl>
 1 A          2015-01-01             110
 2 A          2015-01-03             200
 3 B          2015-01-02              50
 4 NA         2015-01-04              NA
 5 NA         2015-01-05              NA
 6 NA         2015-01-06              NA
 7 NA         2015-01-07              NA
 8 NA         2015-01-08              NA
 9 NA         2015-01-09              NA
10 NA         2015-01-10              NA
# ... with 1,632 more rows

Is either of these what you are looking for? If not, your dates data.frame needs another column to join on, such as account_id.
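
If the goal is one row per account per date, with NA where an account has no entry, one option is to build that extra column first: cross every account_id with every date, then join on both columns. Below is a minimal sketch in base R, assuming the abbreviated df3 and only the first six dates; the names grid and result are illustrative and not from the question.

```r
# Abbreviated inputs (first six dates only, to keep the result small)
df3 <- data.frame(
  account_id = c("A", "A", "B"),
  date_general_ledger = as.Date(c("2015-01-01", "2015-01-03", "2015-01-02")),
  amount = c(110, 200, 50)
)
dates <- data.frame(
  date_general_ledger = seq(as.Date("2015-01-01"), as.Date("2015-01-06"), by = "day")
)

# Pair every account with every date (illustrative helper table)
grid <- expand.grid(
  account_id = unique(df3$account_id),
  date_general_ledger = dates$date_general_ledger,
  stringsAsFactors = FALSE
)

# Left join on BOTH key columns; account/date pairs with no entry get NA in amount
result <- merge(grid, df3,
                by = c("account_id", "date_general_ledger"),
                all.x = TRUE)
result <- result[order(result$account_id, result$date_general_ledger), ]
print(result)
```

With dplyr, the equivalent join would be left_join(grid, df3, by = c("account_id", "date_general_ledger")).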

There are also some minor issues with the preparation of your dates data.frame.

# This code throws an error as written because the string "1 day" is split across two lines.
# Write it on a single line, or simply use by = "day".
dates <- as.character(seq(as.Date('2015-01-01'), as.Date('2019-06-30'), by = "1 
day"))

# This line also fails: as.data.frame(dates) names its only column "dates", so date_general_ledger does not exist in the data.frame yet.
dates$date_general_ledger <- ymd(dates$date_general_ledger)

The following code works to set up dates (ymd() comes from the lubridate package):

dates <- as.character(seq(as.Date('2015-01-01'),as.Date('2019-06-30'),by = "day"))
dates <- data.frame(date_general_ledger = ymd(dates))
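
The round trip through as.character() and ymd() is not strictly needed, since seq() on Date endpoints already returns a Date vector. Here is a sketch of the same setup using base R only (no lubridate), offered as an alternative rather than a correction:

```r
# seq() on Date endpoints returns Dates directly; by = "day" and by = "1 day" are both accepted
dates <- data.frame(
  date_general_ledger = seq(as.Date("2015-01-01"), as.Date("2019-06-30"), by = "day")
)

str(dates)   # a single Date column named date_general_ledger
nrow(dates)  # number of days in the range, endpoints included
```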

Here is code to set up the abbreviated df3:

# account_id matches the rows shown in the question: A, A, B
df3 <- tibble(account_id = c("A", "A", "B"), 
              date_general_ledger = ymd(c("2015-01-01", "2015-01-03", "2015-01-02")),
              amount = c(110, 200, 50))

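We can use complete() from tidyr, which adds a row for every combination of account_id and date and fills the missing amounts with 0: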

result <- tidyr::complete(df3, account_id, 
                          date_general_ledger = dates$date_general_ledger, 
                          fill = list(amount = 0))

# A tibble: 12 x 3
#   account_id date_general_ledger amount
#   <chr>      <chr>                <dbl>
# 1 A          2015-01-01             110
# 2 A          2015-01-02               0
# 3 A          2015-01-03             200
# 4 A          2015-01-04               0
# 5 A          2015-01-05               0
# 6 A          2015-01-06               0
# 7 B          2015-01-01               0
# 8 B          2015-01-02              50
# 9 B          2015-01-03               0
#10 B          2015-01-04               0
#11 B          2015-01-05               0
#12 B          2015-01-06               0

data

df3 <- structure(list(account_id = c("A", "A", "B"), 
date_general_ledger = c("2015-01-01","2015-01-03", "2015-01-02"), 
amount = c(110L, 200L, 50L)), class = "data.frame", row.names = c(NA,-3L))

dates <- structure(list(date_general_ledger = c("2015-01-01", "2015-01-02", 
"2015-01-03", "2015-01-04", "2015-01-05", "2015-01-06")), 
class = "data.frame", row.names = c(NA, -6L))
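
Note that this data block stores date_general_ledger as character. If the calendar of dates you join against holds real Date values, it is probably safest to convert the key column so both sides have the same type. A small sketch, assuming ISO-formatted strings as above:

```r
df3 <- data.frame(
  account_id = c("A", "A", "B"),
  date_general_ledger = c("2015-01-01", "2015-01-03", "2015-01-02"),
  amount = c(110L, 200L, 50L)
)

# Convert the character key to Date so it matches a Date-based calendar table
df3$date_general_ledger <- as.Date(df3$date_general_ledger)

class(df3$date_general_ledger)
```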
