
R data frame: date range operations - subset the rows that fall within a certain date range

If you see my profile, all my questions are on dataframes and here is another!

I have a data frame that is the result of a merge between Debit and Credit transactions:

>head(allTxns)
   Cust_no    CreditDate   Credit     DebitDate   Debit
 1  12345     2014-10-01    200      2014-10-03    400
 2  12345     2014-10-01    200      2014-10-04    150
 3  12345     2014-10-01    200      2014-10-15    800     
 4  33344     2014-10-03    500      2014-10-04    50
 5  33344     2014-10-03    500      2014-10-05    504
 6  33344     2014-10-03    500      2014-10-06    332
 7  33344     2014-10-03    500      2014-10-08    56
 8  66554     2014-10-10    660      2014-10-04    150     
 9  66554     2014-10-10    660      2014-10-05    800
10  66554     2014-10-10    660      2014-10-11    400
11  66554     2014-10-10    660      2014-10-12    150
12  66554     2014-10-10    660      2014-10-13    800

My aim is to get the rows where the DebitDate falls within 5 days after the CreditDate. I tried to subset the data by building the date range with the : operator:

FiveDays <- allTxns$CreditDate+5 #Results in a vector which has date + 5 days

allTxns <- cbind(allTxns[1:2],FiveDays,allTxns[4:6]) #Adding the vector as a column of dataframe

newDf <- allTxns[allTxns$DebitDate %in% allTxns$CreditDate:allTxns$FiveDays]

With the above code, I get the following warnings, and only the first element of each vector is used:

Warning messages:
  1: In mer32$DepositDate:mer32$FiveDays2 :
      numerical expression has 3994 elements: only the first used
  2: In mer32$DepositDate:mer32$FiveDays2 :
      numerical expression has 3994 elements: only the first used

As a result, the range condition is applied correctly only for the first Cust_no (12345) and not for the other rows. How do I make sure that the range condition is applied to ALL the rows?

Incorrect Output

 >head(newDf)
 row.names  Cust_no    CreditDate   Credit     DebitDate   Debit
     1      12345     2014-10-01    200      2014-10-03    400
     2      12345     2014-10-01    200      2014-10-04    150
     4      33344     2014-10-03    500      2014-10-04    50
     5      33344     2014-10-03    500      2014-10-05    504
     6      33344     2014-10-03    500      2014-10-06    332
     7      33344     2014-10-03    500      2014-10-08    56
     8      66554     2014-10-10    660      2014-10-04    150     
     9      66554     2014-10-10    660      2014-10-05    800
    10      66554     2014-10-10    660      2014-10-11    400
    11      66554     2014-10-10    660      2014-10-12    150
    12      66554     2014-10-10    660      2014-10-13    800

Correct Output

 >head(newDf)
 row.names  Cust_no    CreditDate   Credit     DebitDate   Debit
     1      12345     2014-10-01    200      2014-10-03    400
     2      12345     2014-10-01    200      2014-10-04    150
     4      33344     2014-10-03    500      2014-10-04    50
     5      33344     2014-10-03    500      2014-10-05    504
     6      33344     2014-10-03    500      2014-10-06    332
     7      33344     2014-10-03    500      2014-10-08    56         
    10      66554     2014-10-10    660      2014-10-11    400
    11      66554     2014-10-10    660      2014-10-12    150
    12      66554     2014-10-10    660      2014-10-13    800

Try

 allTxns[with(allTxns , CreditDate < DebitDate & DebitDate <=FiveDays),]
 #    Cust_no CreditDate   FiveDays Credit  DebitDate Debit
 #1    12345 2014-10-01 2014-10-06    200 2014-10-03   400
 #2    12345 2014-10-01 2014-10-06    200 2014-10-04   150
 #4    33344 2014-10-03 2014-10-08    500 2014-10-04    50
 #5    33344 2014-10-03 2014-10-08    500 2014-10-05   504
 #6    33344 2014-10-03 2014-10-08    500 2014-10-06   332
 #7    33344 2014-10-03 2014-10-08    500 2014-10-08    56
 #10   66554 2014-10-10 2014-10-15    660 2014-10-11   400
 #11   66554 2014-10-10 2014-10-15    660 2014-10-12   150
 #12   66554 2014-10-10 2014-10-15    660 2014-10-13   800
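The reason this works where the : operator did not is that : builds a single sequence from two scalar endpoints, whereas the comparison operators < and <= are vectorized and return one TRUE/FALSE per row. A minimal, self-contained sketch of the approach follows. It assumes both date columns are of class Date and rebuilds the sample data by hand from the values in the question; here FiveDays is simply appended as the last column rather than inserted with cbind.

```r
# Rebuild the sample data from the question (values copied from the post).
allTxns <- data.frame(
  Cust_no    = rep(c(12345, 33344, 66554), times = c(3, 4, 5)),
  CreditDate = as.Date(rep(c("2014-10-01", "2014-10-03", "2014-10-10"),
                           times = c(3, 4, 5))),
  Credit     = rep(c(200, 500, 660), times = c(3, 4, 5)),
  DebitDate  = as.Date(c("2014-10-03", "2014-10-04", "2014-10-15",
                         "2014-10-04", "2014-10-05", "2014-10-06", "2014-10-08",
                         "2014-10-04", "2014-10-05", "2014-10-11", "2014-10-12",
                         "2014-10-13")),
  Debit      = c(400, 150, 800, 50, 504, 332, 56, 150, 800, 400, 150, 800)
)

# Upper bound of the window, one value per row (Date + number of days).
allTxns$FiveDays <- allTxns$CreditDate + 5

# Element-wise comparison: yields a logical vector with one entry per row.
keep <- with(allTxns, CreditDate < DebitDate & DebitDate <= FiveDays)

# Note the trailing comma: the logical vector selects rows, not columns.
newDf <- allTxns[keep, ]
```

Rows 3, 8 and 9 are dropped because their DebitDate falls outside the window for their own CreditDate.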

This old question already has an accepted answer. However, I noticed that both the question and the answer can be streamlined, as it is not necessary to create the additional FiveDays column:

allTxns[with(allTxns, CreditDate <= DebitDate & DebitDate <= CreditDate + 5L), ]
   Cust_no CreditDate Credit  DebitDate Debit
1    12345 2014-10-01    200 2014-10-03   400
2    12345 2014-10-01    200 2014-10-04   150
4    33344 2014-10-03    500 2014-10-04    50
5    33344 2014-10-03    500 2014-10-05   504
6    33344 2014-10-03    500 2014-10-06   332
7    33344 2014-10-03    500 2014-10-08    56
10   66554 2014-10-10    660 2014-10-11   400
11   66554 2014-10-10    660 2014-10-12   150
12   66554 2014-10-10    660 2014-10-13   800
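Both answers rely on the two columns being of class Date. If the dates arrive as text instead, they would need to be converted first. The sketch below illustrates that on a small hypothetical data frame (txns, days_apart and within5 are names made up for this example, and the yyyy-mm-dd format is an assumption); it also expresses the same window as a difference in days rather than as two date comparisons.

```r
# Hypothetical input in which the dates are stored as text, not as Date.
txns <- data.frame(
  Cust_no    = c(12345, 12345, 66554, 66554),
  CreditDate = c("2014-10-01", "2014-10-01", "2014-10-10", "2014-10-10"),
  DebitDate  = c("2014-10-03", "2014-10-15", "2014-10-04", "2014-10-11"),
  stringsAsFactors = FALSE
)

# Convert once so that date arithmetic and comparisons work on real dates.
txns$CreditDate <- as.Date(txns$CreditDate, format = "%Y-%m-%d")
txns$DebitDate  <- as.Date(txns$DebitDate,  format = "%Y-%m-%d")

# Days from credit to debit, one value per row (negative if debit came first).
days_apart <- as.numeric(difftime(txns$DebitDate, txns$CreditDate,
                                  units = "days"))

# Keep debits made on the credit date or up to 5 days after it.
within5 <- txns[days_apart >= 0 & days_apart <= 5, ]
```

Whether the lower bound should be >= 0 or > 0 depends on whether a debit made on the same day as the credit should count; the sample data contains no such row, so both give the same result there.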
