If you see my profile, all my questions are on dataframes and here is another!
I have a data frame that is the result of merging debit and credit transactions:
>head(allTxns)
Cust_no CreditDate Credit DebitDate Debit
1 12345 2014-10-01 200 2014-10-03 400
2 12345 2014-10-01 200 2014-10-04 150
3 12345 2014-10-01 200 2014-10-15 800
4 33344 2014-10-03 500 2014-10-04 50
5 33344 2014-10-03 500 2014-10-05 504
6 33344 2014-10-03 500 2014-10-06 332
7 33344 2014-10-03 500 2014-10-08 56
8 66554 2014-10-10 660 2014-10-04 150
9 66554 2014-10-10 660 2014-10-05 800
10 66554 2014-10-10 660 2014-10-11 400
11 66554 2014-10-10 660 2014-10-12 150
12 66554 2014-10-10 660 2014-10-13 800
My aim is to get the rows where the DebitDate lies within 5 days of the CreditDate, so I tried to subset the data by building the date range with the : operator:
FiveDays <- allTxns$CreditDate+5 #Results in a vector which has date + 5 days
allTxns <- cbind(allTxns[1:2], FiveDays, allTxns[3:5]) # Add the vector as a column of the data frame (columns 3 to 5 are Credit, DebitDate, Debit)
newDf <- allTxns[allTxns$DebitDate %in% allTxns$CreditDate:allTxns$FiveDays]
The code above produces the following warnings, which say that only the first element is being used:
Warning messages:
1: In mer32$DepositDate:mer32$FiveDays2 :
numerical expression has 3994 elements: only the first used
2: In mer32$DepositDate:mer32$FiveDays2 :
numerical expression has 3994 elements: only the first used
As a result, the range condition is applied correctly only to the first Cust_no (12345) and not to the other rows. How do I make sure that the range condition is applied to all the rows?
Incorrect Output
>head(newDf)
row.names Cust_no CreditDate Credit DebitDate Debit
1 12345 2014-10-01 200 2014-10-03 400
2 12345 2014-10-01 200 2014-10-04 150
4 33344 2014-10-03 500 2014-10-04 50
5 33344 2014-10-03 500 2014-10-05 504
6 33344 2014-10-03 500 2014-10-06 332
7 33344 2014-10-03 500 2014-10-08 56
8 66554 2014-10-10 660 2014-10-04 150
9 66554 2014-10-10 660 2014-10-05 800
10 66554 2014-10-10 660 2014-10-11 400
11 66554 2014-10-10 660 2014-10-12 150
12 66554 2014-10-10 660 2014-10-13 800
Correct Output
>head(newDf)
row.names Cust_no CreditDate Credit DebitDate Debit
1 12345 2014-10-01 200 2014-10-03 400
2 12345 2014-10-01 200 2014-10-04 150
4 33344 2014-10-03 500 2014-10-04 50
5 33344 2014-10-03 500 2014-10-05 504
6 33344 2014-10-03 500 2014-10-06 332
7 33344 2014-10-03 500 2014-10-08 56
10 66554 2014-10-10 660 2014-10-11 400
11 66554 2014-10-10 660 2014-10-12 150
12 66554 2014-10-10 660 2014-10-13 800
Try
allTxns[with(allTxns , CreditDate < DebitDate & DebitDate <=FiveDays),]
# Cust_no CreditDate FiveDays Credit DebitDate Debit
#1 12345 2014-10-01 2014-10-06 200 2014-10-03 400
#2 12345 2014-10-01 2014-10-06 200 2014-10-04 150
#4 33344 2014-10-03 2014-10-08 500 2014-10-04 50
#5 33344 2014-10-03 2014-10-08 500 2014-10-05 504
#6 33344 2014-10-03 2014-10-08 500 2014-10-06 332
#7 33344 2014-10-03 2014-10-08 500 2014-10-08 56
#10 66554 2014-10-10 2014-10-15 660 2014-10-11 400
#11 66554 2014-10-10 2014-10-15 660 2014-10-12 150
#12 66554 2014-10-10 2014-10-15 660 2014-10-13 800
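For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample data from the question and applies the same condition. It assumes both date columns are stored as Date (not character); the names keep and newDf are only illustrative. The reason this works where : did not is that < and <= compare the columns element by element, whereas a:b builds a single sequence from the first element of each argument, which is what the warning is reporting.

```r
# Rebuild the sample data from the question; dates are assumed to be of class Date
n <- c(3, 4, 5)  # number of rows per customer
allTxns <- data.frame(
  Cust_no    = rep(c(12345, 33344, 66554), times = n),
  CreditDate = as.Date(rep(c("2014-10-01", "2014-10-03", "2014-10-10"), times = n)),
  Credit     = rep(c(200, 500, 660), times = n),
  DebitDate  = as.Date(c("2014-10-03", "2014-10-04", "2014-10-15",
                         "2014-10-04", "2014-10-05", "2014-10-06", "2014-10-08",
                         "2014-10-04", "2014-10-05", "2014-10-11", "2014-10-12",
                         "2014-10-13")),
  Debit      = c(400, 150, 800, 50, 504, 332, 56, 150, 800, 400, 150, 800)
)

# Adding a number to a Date vector shifts every element by that many days
allTxns$FiveDays <- allTxns$CreditDate + 5

# The comparisons are vectorised: one TRUE/FALSE per row
keep <- with(allTxns, CreditDate < DebitDate & DebitDate <= FiveDays)

# The trailing comma matters: it selects rows and keeps all columns
newDf <- allTxns[keep, ]
```

Here keep is a logical vector with one entry per row, so every row is tested against its own CreditDate and FiveDays rather than against a single shared range.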
This old question already has an accepted answer. However, I noticed that both the question and the answer can be streamlined, as it is not necessary to create the additional FiveDays column:
allTxns[with(allTxns, CreditDate <= DebitDate & DebitDate <= CreditDate + 5L), ]
Cust_no CreditDate Credit DebitDate Debit
1 12345 2014-10-01 200 2014-10-03 400
2 12345 2014-10-01 200 2014-10-04 150
4 33344 2014-10-03 500 2014-10-04 50
5 33344 2014-10-03 500 2014-10-05 504
6 33344 2014-10-03 500 2014-10-06 332
7 33344 2014-10-03 500 2014-10-08 56
10 66554 2014-10-10 660 2014-10-11 400
11 66554 2014-10-10 660 2014-10-12 150
12 66554 2014-10-10 660 2014-10-13 800
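If it is useful to see how many days separate each debit from its credit, a variant is to compute the gap explicitly and filter on it. This is only a sketch on a small hypothetical data set (txns, DaysAfter and within5 are made-up names, not from the question), and it again assumes the date columns are of class Date.

```r
# Small hypothetical data set, not the asker's real data
txns <- data.frame(
  Cust_no    = c(1, 1, 2, 2),
  CreditDate = as.Date(c("2014-10-01", "2014-10-01", "2014-10-10", "2014-10-10")),
  DebitDate  = as.Date(c("2014-10-03", "2014-10-15", "2014-10-04", "2014-10-12"))
)

# difftime() returns the gap between the two dates; as.numeric() turns it into plain days
txns$DaysAfter <- as.numeric(difftime(txns$DebitDate, txns$CreditDate, units = "days"))

# Keep debits made 0 to 5 days after the credit (negative gaps are debits before the credit)
within5 <- subset(txns, DaysAfter >= 0 & DaysAfter <= 5)
```

subset() is convenient for interactive use like this; inside functions, the bracket form shown above is usually the safer choice.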