
R data.table merge based on EffDT

I'm having trouble working out how to merge a transactional data table with a domain (lookup) data table. I'd like to merge the distributions data table below with the departments data table, so that each transaction shows the department name that was in effect on the payment date, i.e. the row with the latest EffDT on or before PaymentDT. After the merge I'd like to end up with a data table like this:

PayeeName  Department     PaymentDT   Amount
Bob        Modified Name  2016-01-01  5
Tracy      Payables       2015-01-01  34
Tom        Postal         2015-01-01  87

Here is some sample data in a format similar to what I am working with.

library(data.table)
dtDistributions <- data.table(PayeeName = c("Bob", "Tracy", "Tom"),
                          Department = factor(c("H229000", "H135000", "H047800")),
                          Amount = c(5, 34, 87),
                          PaymentDT = as.Date(c("2016-01-01", "2015-01-01", "2015-01-01")))

dtDepartments <- data.table(Department = factor(c("H229000", "H229000", "H229000", "H135000", "H047800")),
                        EffDT = as.Date(c("2019-01-01", "2012-01-01", "1901-01-01", "1901-01-01", "1901-01-01")),
                        Descr = c("Final Name","Modified Name","Original Name","Payables","Postal"))

I was able to find a solution, but it doesn't use data.table functionality, so I hesitate to call it an answer to my original question. I'm sharing it in case it helps anybody else. I ended up using the sqldf package, which lets you run SQL against existing data frames. I'm more than willing to accept a data.table answer: my actual dataset is quite large, and I imagine a data.table approach would be a lot faster than my sqldf implementation.

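library(sqldf)
# For each payment, keep the department row whose EffDT is the latest one
# on or before the payment date (correlated subquery on MAX(EffDT)).
joinString <- "SELECT A.PayeeName, B.Descr, A.PaymentDT, A.Amount
            FROM dtDistributions A, dtDepartments B
            WHERE A.Department = B.Department
            AND B.EffDT = (SELECT MAX(ED.EffDT)
                            FROM dtDepartments ED
                            WHERE B.Department = ED.Department
                            AND ED.EffDT <= A.PaymentDT)"

# sqldf returns a data.frame; convert it back to a data.table
finalDT <- data.table(sqldf(joinString))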
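
For comparison, the same "latest EffDT on or before PaymentDT" lookup can be sketched as a data.table rolling join. This is a minimal sketch rather than a tested answer for the full dataset: it assumes that `roll = TRUE` (carry the last earlier observation forward) matches the intended effective-date semantics, and that every payment has at least one department row dated on or before it (otherwise Descr would come back as NA). The names `rolled` and `finalRolled` are made up for illustration.

```r
library(data.table)

dtDistributions <- data.table(
  PayeeName  = c("Bob", "Tracy", "Tom"),
  Department = factor(c("H229000", "H135000", "H047800")),
  Amount     = c(5, 34, 87),
  PaymentDT  = as.Date(c("2016-01-01", "2015-01-01", "2015-01-01"))
)

dtDepartments <- data.table(
  Department = factor(c("H229000", "H229000", "H229000", "H135000", "H047800")),
  EffDT      = as.Date(c("2019-01-01", "2012-01-01", "1901-01-01", "1901-01-01", "1901-01-01")),
  Descr      = c("Final Name", "Modified Name", "Original Name", "Payables", "Postal")
)

# Match exactly on Department, then roll on the last join column:
# each PaymentDT picks up the department row with the latest EffDT <= PaymentDT.
rolled <- dtDepartments[dtDistributions,
                        on = .(Department, EffDT = PaymentDT),
                        roll = TRUE]

# In the join result the EffDT column carries the PaymentDT values from
# dtDistributions, so rename it before selecting the final layout.
setnames(rolled, "EffDT", "PaymentDT")
finalRolled <- rolled[, .(PayeeName, Department = Descr, PaymentDT, Amount)]
print(finalRolled)
```

With the sample data above, Bob's 2016-01-01 payment should pick up "Modified Name" (effective 2012-01-01), since the 2019-01-01 row is not yet in effect. The rows come back in the order of dtDistributions.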

I slightly modified your sample data, since three department names were tied to the same department number. I first changed it as follows:

dtDistributions <- data.table(PayeeName = c("Bob", "Tracy", "Tom"),
                              Department = factor(c("H229000", "H135000", "H047800")),
                              Amount = c(5, 34, 87),
                              PaymentDT = as.Date(c("2016-01-01", "2015-01-01", "2015-01-01")))

dtDepartments <- data.table(Department = factor(c("H229000", "H229001", "H229002", "H135000", "H047800")),
                            EffDT = as.Date(c("2019-01-01", "2012-01-01", "1901-01-01", "1901-01-01", "1901-01-01")),
                            Descr = c("Final Name","Modified Name","Original Name","Payables","Postal"))

I then proceeded to use the following command to merge the data tables:

dtDistributions <- merge(dtDistributions, dtDepartments[, .(Department, Descr)], by = "Department")

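You can always rename and reorder the columns afterwards. Note that this merge matches on Department only and does not use EffDT, so it relies on each department code mapping to exactly one name.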
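
The renaming and reordering step mentioned above might look like the following sketch. It rebuilds the modified sample data so that it runs on its own, and uses `setnames()` and `setcolorder()` from data.table. The name `merged` is made up for illustration, and the target column order is assumed from the layout requested in the question.

```r
library(data.table)

dtDistributions <- data.table(
  PayeeName  = c("Bob", "Tracy", "Tom"),
  Department = factor(c("H229000", "H135000", "H047800")),
  Amount     = c(5, 34, 87),
  PaymentDT  = as.Date(c("2016-01-01", "2015-01-01", "2015-01-01"))
)

# Modified lookup from this answer: one name per department code
dtDepartments <- data.table(
  Department = factor(c("H229000", "H229001", "H229002", "H135000", "H047800")),
  Descr      = c("Final Name", "Modified Name", "Original Name", "Payables", "Postal")
)

# Plain equi-join on the department code
merged <- merge(dtDistributions, dtDepartments, by = "Department")

# Drop the code, promote the description to "Department", then reorder
merged[, Department := NULL]
setnames(merged, "Descr", "Department")
setcolorder(merged, c("PayeeName", "Department", "PaymentDT", "Amount"))
print(merged)
```

With the modified codes, H229000 maps only to "Final Name", so Bob's row shows that name rather than the "Modified Name" in the question's expected output. The merge also sorts the rows by department code.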
