
R Data.Table Merge Based on Effdt

I'm having trouble figuring out how to get started merging a transactional data table with a domain data table. I'd like to merge the distributions data table below with the department data table so that I know the name the department had when the transaction occurred, i.e. the department row with the latest EffDT on or before the payment date. After the merge, I'd like to end up with a data table like this:

PayeeName  Department     PaymentDT   Amount
Bob        Modified Name  2016-01-01  5
Tracy      Payables       2015-01-01  34
Tom        Postal         2015-01-01  87

Here is some sample data in a format similar to what I am working with.

library(data.table)
dtDistributions <- data.table(PayeeName = c("Bob", "Tracy", "Tom"),
                          Department = factor(c("H229000", "H135000", "H047800")),
                          Amount = c(5, 34, 87),
                          PaymentDT = as.Date(c("2016-01-01", "2015-01-01", "2015-01-01")))

dtDepartments <- data.table(Department = factor(c("H229000", "H229000", "H229000", "H135000", "H047800")),
                        EffDT = as.Date(c("2019-01-01", "2012-01-01", "1901-01-01", "1901-01-01", "1901-01-01")),
                        Descr = c("Final Name","Modified Name","Original Name","Payables","Postal"))

I was able to find a solution, but it doesn't use data.table functionality, so I hesitate to call it an answer to my original question. I wanted to share it in case it helps anybody else. I ended up using the sqldf library, which lets you write SQL against existing data frames. I'm more than willing to accept a data.table answer, as my actual dataset is quite large and I imagine data.table would be a lot faster than my sqldf implementation.

library(sqldf)
joinString <- "SELECT A.PayeeName, B.Descr, A.PaymentDT, A.Amount
            FROM dtDistributions A, dtDepartments B
            WHERE A.Department = B.Department
            AND B.EffDT = (SELECT MAX(ED.EffDT)
                            FROM dtDepartments ED
                            WHERE B.Department = ED.Department
                            AND ED.EffDT <= A.PaymentDT)"

finalDT <- data.table(sqldf(joinString))
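
For comparison, the same "latest EffDT on or before PaymentDT" lookup can probably be expressed as a data.table rolling join. The following is a minimal sketch on the sample data from the question, not a benchmarked solution; the names `joined` and `rolledDT` are made up for illustration, and it assumes both date columns are plain `Date` vectors as in the sample.

```r
library(data.table)

# Sample data copied from the question
dtDistributions <- data.table(
  PayeeName  = c("Bob", "Tracy", "Tom"),
  Department = factor(c("H229000", "H135000", "H047800")),
  Amount     = c(5, 34, 87),
  PaymentDT  = as.Date(c("2016-01-01", "2015-01-01", "2015-01-01"))
)
dtDepartments <- data.table(
  Department = factor(c("H229000", "H229000", "H229000", "H135000", "H047800")),
  EffDT      = as.Date(c("2019-01-01", "2012-01-01", "1901-01-01", "1901-01-01", "1901-01-01")),
  Descr      = c("Final Name", "Modified Name", "Original Name", "Payables", "Postal")
)

# Rolling join: for each payment, match on Department exactly and roll the
# most recent EffDT that is on or before PaymentDT forward to the payment date.
joined <- dtDepartments[dtDistributions,
                        on = .(Department, EffDT = PaymentDT),
                        roll = TRUE]

# In the joined table, the EffDT column holds the payment date taken from
# dtDistributions, so it is relabelled as PaymentDT here.
rolledDT <- joined[, .(PayeeName, Department = Descr, PaymentDT = EffDT, Amount)]
print(rolledDT)
```

On the sample data this should pick "Modified Name" for Bob, since 2012-01-01 is the latest effective date not after his 2016-01-01 payment. A payment dated before a department's earliest EffDT would come back with NA in the Department column, which may or may not be what you want.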

I slightly modified your code, since three department names were tied to the same department number. I first changed the sample data as follows:

dtDistributions <- data.table(PayeeName = c("Bob", "Tracy", "Tom"),
                              Department = factor(c("H229000", "H135000", "H047800")),
                              Amount = c(5, 34, 87),
                              PaymentDT = as.Date(c("2016-01-01", "2015-01-01", "2015-01-01")))

dtDepartments <- data.table(Department = factor(c("H229000", "H229001", "H229002", "H135000", "H047800")),
                            EffDT = as.Date(c("2019-01-01", "2012-01-01", "1901-01-01", "1901-01-01", "1901-01-01")),
                            Descr = c("Final Name","Modified Name","Original Name","Payables","Postal"))

I then used the following command to merge the data tables:

dtDistributions = merge(dtDistributions, dtDepartments[, .(Department, Descr)])

You can always rename and reorder the columns afterwards, but this should do the trick for the merge itself.
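
The renaming and reordering step could look roughly like the sketch below, which uses data.table's `setnames()` and `setcolorder()` on the merged result. The variable name `merged` is an illustrative assumption, and the target column layout is taken from the table in the question.

```r
library(data.table)

# Sample data as edited in this answer (one name per department number)
dtDistributions <- data.table(
  PayeeName  = c("Bob", "Tracy", "Tom"),
  Department = factor(c("H229000", "H135000", "H047800")),
  Amount     = c(5, 34, 87),
  PaymentDT  = as.Date(c("2016-01-01", "2015-01-01", "2015-01-01"))
)
dtDepartments <- data.table(
  Department = factor(c("H229000", "H229001", "H229002", "H135000", "H047800")),
  EffDT      = as.Date(c("2019-01-01", "2012-01-01", "1901-01-01", "1901-01-01", "1901-01-01")),
  Descr      = c("Final Name", "Modified Name", "Original Name", "Payables", "Postal")
)

# Plain merge on the shared Department column, as in the answer
merged <- merge(dtDistributions, dtDepartments[, .(Department, Descr)])

# Drop the department code, use the description as the Department column,
# and put the columns in the order shown in the question.
merged[, Department := NULL]
setnames(merged, "Descr", "Department")
setcolorder(merged, c("PayeeName", "Department", "PaymentDT", "Amount"))
print(merged)
```

Note that this merge matches on the department code alone and never looks at EffDT, so with the edited table Bob's row gets "Final Name" rather than the "Modified Name" shown in the question's target table.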

