I'm having trouble figuring out how to get started merging a transactional data table with a domain data table. I'd like to merge the distributions data table below with the department data table so that each transaction shows the department name that was in effect when the transaction occurred. What I'd ultimately like to end up with after the merge is a data table like this:
PayeeName  Department     PaymentDT   Amount
Bob        Modified Name  2016-01-01  5
Tracy      Payables       2015-01-01  34
Tom        Postal         2015-01-01  87
Here is some sample data that is similar to the format I am working with.
library(data.table)
dtDistributions <- data.table(PayeeName = c("Bob", "Tracy", "Tom"),
Department = factor(c("H229000", "H135000", "H047800")),
Amount = c(5, 34, 87),
PaymentDT = as.Date(c("2016-01-01", "2015-01-01", "2015-01-01")))
dtDepartments <- data.table(Department = factor(c("H229000", "H229000", "H229000", "H135000", "H047800")),
EffDT = as.Date(c("2019-01-01", "2012-01-01", "1901-01-01", "1901-01-01", "1901-01-01")),
Descr = c("Final Name","Modified Name","Original Name","Payables","Postal"))
I was able to find a solution, but it doesn't use data.table functionality, so I hesitate to call this an answer to my original question. I wanted to share it in case it helps anybody else. I ended up using the sqldf package, which lets you run SQL against existing data frames. I'm more than willing to accept a data.table answer, as my actual dataset is quite large and I imagine data.table would be a lot faster than my sqldf implementation.
library(sqldf)
joinString <- "SELECT A.PayeeName, B.Descr, A.PaymentDT, A.Amount
FROM dtDistributions A, dtDepartments B
WHERE A.Department = B.Department
AND B.EffDT = (SELECT MAX(ED.EffDT)
FROM dtDepartments ED
WHERE B.Department = ED.Department
AND ED.EffDT <= A.PaymentDT)"
finalDT <- data.table(sqldf(joinString))
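For what it's worth, the "latest EffDT on or before PaymentDT" logic in the SQL above looks like what data.table calls a rolling join. Below is a minimal sketch on the sample data from the question, assuming `roll = TRUE` (carry the last earlier row forward) is the behaviour wanted; I have not timed it against sqldf on a large dataset.

```r
library(data.table)

# Sample data copied from the question
dtDistributions <- data.table(
  PayeeName  = c("Bob", "Tracy", "Tom"),
  Department = factor(c("H229000", "H135000", "H047800")),
  Amount     = c(5, 34, 87),
  PaymentDT  = as.Date(c("2016-01-01", "2015-01-01", "2015-01-01"))
)
dtDepartments <- data.table(
  Department = factor(c("H229000", "H229000", "H229000", "H135000", "H047800")),
  EffDT      = as.Date(c("2019-01-01", "2012-01-01", "1901-01-01", "1901-01-01", "1901-01-01")),
  Descr      = c("Final Name", "Modified Name", "Original Name", "Payables", "Postal")
)

# Rolling join: match each distribution row on Department, then pick the
# department row with the latest EffDT that is <= PaymentDT.
# The rolled column (EffDT = PaymentDT) must be the last one listed in `on`.
finalDT <- dtDepartments[
  dtDistributions,
  on   = .(Department, EffDT = PaymentDT),
  roll = TRUE,
  .(PayeeName, Department = Descr, PaymentDT = i.PaymentDT, Amount)
]

print(finalDT)
```

A payment dated before a department's earliest EffDT has nothing to roll forward from, so its Department comes back as NA rather than being dropped.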
I slightly modified your sample data, since three department names were tied to the same department number. I first changed it as follows:
dtDistributions <- data.table(PayeeName = c("Bob", "Tracy", "Tom"),
Department = factor(c("H229000", "H135000", "H047800")),
Amount = c(5, 34, 87),
PaymentDT = as.Date(c("2016-01-01", "2015-01-01", "2015-01-01")))
dtDepartments <- data.table(Department = factor(c("H229000", "H229001", "H229002", "H135000", "H047800")),
EffDT = as.Date(c("2019-01-01", "2012-01-01", "1901-01-01", "1901-01-01", "1901-01-01")),
Descr = c("Final Name","Modified Name","Original Name","Payables","Postal"))
I then proceeded to use the following command to merge the data tables:
dtDistributions = merge(dtDistributions, dtDepartments[, .(Department, Descr)])
You can always rename and reorder the columns later, but this should do the trick for the merge.
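In case it helps, here is a rough sketch of the rename and reorder step using `setnames()` and `setcolorder()`, run on the modified sample data from this answer. The name `merged` is just for illustration. Note that this plain merge matches on Department only and ignores EffDT, so with the modified data Bob picks up "Final Name".

```r
library(data.table)

# Modified sample data from this answer (one name per department code)
dtDistributions <- data.table(
  PayeeName  = c("Bob", "Tracy", "Tom"),
  Department = factor(c("H229000", "H135000", "H047800")),
  Amount     = c(5, 34, 87),
  PaymentDT  = as.Date(c("2016-01-01", "2015-01-01", "2015-01-01"))
)
dtDepartments <- data.table(
  Department = factor(c("H229000", "H229001", "H229002", "H135000", "H047800")),
  EffDT      = as.Date(c("2019-01-01", "2012-01-01", "1901-01-01", "1901-01-01", "1901-01-01")),
  Descr      = c("Final Name", "Modified Name", "Original Name", "Payables", "Postal")
)

# Plain equi-join on the department code
merged <- merge(dtDistributions, dtDepartments[, .(Department, Descr)],
                by = "Department")

# Drop the code, expose the description under the name "Department",
# then put the columns in the order shown in the question
merged[, Department := NULL]
setnames(merged, "Descr", "Department")
setcolorder(merged, c("PayeeName", "Department", "PaymentDT", "Amount"))

print(merged)
```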