
Compare values of a column with multiple values in another column by group in R

I have data on individuals' savings accounts, and I observe the amount of each agreement as well as its opening and closing dates. Here is the savings data of one consumer:

amount <- c(1004, 1004, 1240, 1039, 1240, 1039, 1039, 1240, 1040, 1040)  
opening <- as.Date(c('2012-11-19', '2013-05-20', '2014-06-13', '2015-05-26',
    '2015-06-13', '2015-11-26', '2016-05-26', '2016-06-13', '2016-11-26',
    '2017-05-26'))  
closing <- as.Date(c('2013-05-20', '2013-11-20', '2015-06-13', '2015-11-26',
    '2016-06-13', '2016-05-26', '2016-11-26', '2017-06-13', '2017-05-26',
    '2017-07-10'))

dt <- data.frame(amount, opening, closing) 
   amount    opening    closing
     1004 2012-11-19 2013-05-20
     1004 2013-05-20 2013-11-20
     1240 2014-06-13 2015-06-13
     1039 2015-05-26 2015-11-26
     1240 2015-06-13 2016-06-13
     1039 2015-11-26 2016-05-26
     1039 2016-05-26 2016-11-26
     1240 2016-06-13 2017-06-13
     1040 2016-11-26 2017-05-26
     1040 2017-05-26 2017-07-10

My task is the following: I want to identify all the accounts that have been rolled over. In other words, I want to track all the savings amounts through time and see whether the consumer closed an account and reopened it on the same day (automatic renewal of a savings account). For example, on 2015-05-26 the consumer opened an account of $1039, rolled it over on 2015-11-26 and again on 2016-05-26, then on 2016-11-26 ($1040), and finally on 2017-05-26 ($1040).

I can identify those accounts with ifelse(dt$opening %in% dt$closing, 1, 0), but this is obviously not enough. I am not sure how to proceed or what the usual methodology is in such cases (I wonder if replicating the entire data set would be a good start).

The final goal is to find out whether someone has added to the savings amount or reduced it when rolling over the account.

I hope this is clear enough. Any help is very much appreciated!

You can use a self-join to identify rows whose closing date equals the opening date of another row with the same amount value. In the output below, these are the rows with a non-missing rollover_opening. To answer the question you are actually asking, the data would need to contain more information.

library(data.table)
setDT(dt)

dt[dt, on = .(amount, closing = opening), rollover_opening := i.opening]

dt
#     amount    opening    closing rollover_opening
#  1:   1004 2012-11-19 2013-05-20       2013-05-20
#  2:   1004 2013-05-20 2013-11-20             <NA>
#  3:   1240 2014-06-13 2015-06-13       2015-06-13
#  4:   1039 2015-05-26 2015-11-26       2015-11-26
#  5:   1240 2015-06-13 2016-06-13       2016-06-13
#  6:   1039 2015-11-26 2016-05-26       2016-05-26
#  7:   1039 2016-05-26 2016-11-26             <NA>
#  8:   1240 2016-06-13 2017-06-13             <NA>
#  9:   1040 2016-11-26 2017-05-26       2017-05-26
# 10:   1040 2017-05-26 2017-07-10             <NA>
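
Because the join above also requires equal amounts, it cannot see a rollover where the amount changed, such as the 1039 to 1040 renewal mentioned in the question. A minimal sketch of how one might check for such cases, assuming the same example data; the column names same_amount and any_amount and the object missed are made up for this illustration:

```r
library(data.table)

# Rebuild the example data from the question
dt <- data.table(
  amount  = c(1004, 1004, 1240, 1039, 1240, 1039, 1039, 1240, 1040, 1040),
  opening = as.Date(c('2012-11-19', '2013-05-20', '2014-06-13', '2015-05-26',
                      '2015-06-13', '2015-11-26', '2016-05-26', '2016-06-13',
                      '2016-11-26', '2017-05-26')),
  closing = as.Date(c('2013-05-20', '2013-11-20', '2015-06-13', '2015-11-26',
                      '2016-06-13', '2016-05-26', '2016-11-26', '2017-06-13',
                      '2017-05-26', '2017-07-10'))
)

# Flag rows whose closing date matches an opening date with the same amount
dt[dt, on = .(amount, closing = opening), same_amount := TRUE]

# Flag rows whose closing date matches any opening date, whatever the amount
dt[dt, on = .(closing = opening), any_amount := TRUE]

# Rollovers found by date alone but missed when the amount must also match
missed <- dt[!is.na(any_amount) & is.na(same_amount)]
missed
```

With this data, the only row in missed should be the 1039 account that closed on 2016-11-26, which is why a join on the dates alone may be closer to what the question needs.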

Another option is to join on the dates only and bring in the amount of the account that was opened on the closing day:

dt[dt, on = .(closing = opening), rollover_amount := i.amount][]
#     amount    opening    closing rollover_amount
#  1:   1004 2012-11-19 2013-05-20            1004
#  2:   1004 2013-05-20 2013-11-20              NA
#  3:   1240 2014-06-13 2015-06-13            1240
#  4:   1039 2015-05-26 2015-11-26            1039
#  5:   1240 2015-06-13 2016-06-13            1240
#  6:   1039 2015-11-26 2016-05-26            1039
#  7:   1039 2016-05-26 2016-11-26            1040
#  8:   1240 2016-06-13 2017-06-13              NA
#  9:   1040 2016-11-26 2017-05-26            1040
# 10:   1040 2017-05-26 2017-07-10              NA
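
Building on the date-only match, one possible way to follow each account through time is to give every chain of renewals its own id and then compare consecutive amounts within a chain. This is only a sketch under assumptions that are not stated in the question: for a single consumer, no two accounts close on the same day, and every account closes after it opens. The names prev, chain_id, chain, change and chains are hypothetical. With several consumers, the matching would have to be done per consumer, for example on a consumer id column that the example data does not contain.

```r
library(data.table)

# Rebuild the example data from the question
dt <- data.table(
  amount  = c(1004, 1004, 1240, 1039, 1240, 1039, 1039, 1240, 1040, 1040),
  opening = as.Date(c('2012-11-19', '2013-05-20', '2014-06-13', '2015-05-26',
                      '2015-06-13', '2015-11-26', '2016-05-26', '2016-06-13',
                      '2016-11-26', '2017-05-26')),
  closing = as.Date(c('2013-05-20', '2013-11-20', '2015-06-13', '2015-11-26',
                      '2016-06-13', '2016-05-26', '2016-11-26', '2017-06-13',
                      '2017-05-26', '2017-07-10'))
)

# Sort by opening date so that a predecessor always comes before its renewal
setorder(dt, opening)

# For each row, index of the row that closed on this row's opening day (NA if none)
# Assumption: closing dates are unique for this consumer
prev <- match(dt$opening, dt$closing)

# Walk through the rows in date order: a row without a predecessor starts a
# new chain, otherwise it inherits the chain id of its predecessor
chain_id <- integer(nrow(dt))
next_id <- 0L
for (i in seq_len(nrow(dt))) {
  if (is.na(prev[i])) {
    next_id <- next_id + 1L
    chain_id[i] <- next_id
  } else {
    chain_id[i] <- chain_id[prev[i]]
  }
}
dt[, chain := chain_id]

# Change in amount relative to the previous account in the same chain:
# positive means money was added at rollover, negative means it was reduced
dt[, change := amount - shift(amount), by = chain]

# One line per chain: lifetime, number of renewals, and net change in amount
chains <- dt[, .(first_opening = min(opening),
                 last_closing  = max(closing),
                 renewals      = .N - 1L,
                 net_change    = last(amount) - first(amount)),
             by = chain]
chains
```

On the example data this should produce three chains; the one that starts on 2015-05-26 is renewed four times and gains 1 at the 2016-11-26 rollover, while the other two keep their amounts.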
