Lets say I have a data.table
which looks like this
customer TranAmount
1: 146506 1290.49
2: 146506 2699.00
3: 146506 2720.00
4: 146506 2700.00
5: 146506 6.35
6: 146506 2700.00
7: 146506 2705.00
8: 146506 2691.00
9: 146506 500.00
10: 146506 500.95
11: 146506 52.00
Now I want to calculate support
of each amount, by support
I mean if I pick a transaction and there are transactions which is within the threshold of that transaction, then support
of that transaction is equal to the number of transaction within that limit.
For example in the above data, for TranAmount
2700.00, if we consider a threshold of 1% (above 2700.00 or below 2700.00) then there are 6 transactions within that range, so support
for TranAmount 2700.00
is 6
I have written this function which does that but its slow and certainly not in data.table
way but it does the job, I am sure there are better ways to achieve this, but I cant think of any.
get_support <- function(dt,val_tolerance=0.01) {
support_dt <- dt[,.(customer,TranAmount)][order(TranAmount)]
support_dt[,support:= 0]
for(i in 1:nrow(support_dt)) {
start <- support_dt[i,TranAmount]
current_support <- support_dt[i,support]
amount_limit <- c((start - start*val_tolerance),(start + start*val_tolerance))
for (j in 1:nrow(support_dt)){
amount <- support_dt[j,TranAmount]
if(between(amount,amount_limit[1],amount_limit[2]) ==TRUE ){
current_support <- current_support+1
}else{
current_support <- current_support
}
}
#print(current_support)
support_dt[i,support:=current_support]
}
print(support_dt)
}
Please suggest better way to achieve the same.
With version 1.9.7 or later, use non-equi joins:
vals = c(2700, 500)
DT[.(dn = vals*0.99, up = vals*1.01), on=.(TranAmount >= dn, TranAmount <= up),
.N
, by=.EACHI]
# TranAmount TranAmount N
# 1: 2673 2727 6
# 2: 495 505 2
The column names in the result aren't very intuitive, but those might change.
Currently (Aug 2016), you'll need to install the devel version for this.
I got warnings but apparently the logic was sound:
dt[ , support := ave(TranAmount, TranAmount,
FUN= function(x) sum(abs(x -dt$TranAmount) < 0.01*x) ) ]
#---------------------------------
Warning messages:
1: In x - dt$TranAmount :
longer object length is not a multiple of shorter object length
2: In abs(x - dt$TranAmount) < 0.01 * x :
longer object length is not a multiple of shorter object length
> dt
customer TranAmount support
1: 146506 1290.49 1
2: 146506 2699.00 6
3: 146506 2720.00 5
4: 146506 2700.00 6
5: 146506 6.35 1
6: 146506 2700.00 6
7: 146506 2705.00 6
8: 146506 2691.00 5
9: 146506 500.00 2
10: 146506 500.95 2
11: 146506 52.00 1
Here is a hacky data.table
solution that works (there is likely a cleaner way though)
tvals <- df$TranAmount
pct <- 0.01
dfDT[, id := .I][,support := sum( TranAmount*(1-pct) <= tvals & tvals <= TranAmount*(1+pct) ), by = list(id)][,id:=NULL]
EDIT : an alternate data.table
approach
dfDT[, support := sum( TranAmount*(1-pct) <= tvals & tvals <= TranAmount*(1+pct) ), by = rownames(dfDT)]
and here is a dplyr
solution
df %>%
group_by(rn=row_number()) %>%
mutate(support = sum( TranAmount*(1-pct) <= tvals & tvals <= TranAmount*(1+pct) ) ) %>%
ungroup %>%
select(-rn)
## customer TranAmount support
## <int> <dbl> <int>
## 1 146506 1290.49 1
## 2 146506 2699.00 6
## 3 146506 2720.00 5
## 4 146506 2700.00 6
## 5 146506 6.35 1
## 6 146506 2700.00 6
## 7 146506 2705.00 6
## 8 146506 2691.00 5
## 9 146506 500.00 2
## 10 146506 500.95 2
## 11 146506 52.00 1
Note, df
( data.frame
) and dfDT
( data.table
) contain the same data.
Here's a relatively simple data.table way
x[, support := x[abs(TranAmount- .SD[,TranAmount]) < 0.01*.SD[,TranAmount] , .N],
by=1:NROW(x)]
# customer TranAmount support
# 1: 146506 1290.49 1
# 2: 146506 2699.00 6
# 3: 146506 2720.00 5
# 4: 146506 2700.00 6
# 5: 146506 6.35 1
# 6: 146506 2700.00 6
# 7: 146506 2705.00 6
# 8: 146506 2691.00 5
# 9: 146506 500.00 2
#10: 146506 500.95 2
#11: 146506 52.00 1
The data:
x = data.table(customer=rep(146506,11),
TranAmount=c(1290.49, 2699.00, 2720.00, 2700.00, 6.35, 2700.00,
2705.00, 2691.00, 500.00, 500.95, 52.00))
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.