
Oracle - How to write this SQL?

In Oracle I have a table recording users' transactions, like the one below. The user column isn't needed for the target query; it is listed here only for reference.

user1, transaction1, $10            <-row1
user1, transaction2, $20            <-row2
user1, transaction3, $5             <-row3
user1, transaction4, $100           <-row4
user2, ... ...
user3, ... ...

For a given user there is a money cap, and I need to find the minimal set of rows whose sum of money is >= the cap, or all of that user's rows if the cap is larger than their total. The returned rows must be sorted by transaction in ascending order.

For example, for user1 with a cap of $30, row1 and row2 must be returned. You can't return row4 alone (even though $100 covers the cap), because rows must be taken in transaction order. If the cap is $13, row1 and row2 must be returned, since row1 alone isn't enough to cover $13. If the cap is $136, rows 1 to 4 are all returned, since $10 + $20 + $5 + $100 = $135 is less than $136.
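To pin down the requirement, here is a small reference sketch in Python that works on one user's rows in memory. The function name `rows_to_cover` and the (transaction, amount) tuple layout are illustrative assumptions, not part of the question. It keeps taking rows in transaction order while the running total is still below the cap.

```python
def rows_to_cover(rows, cap):
    """Return the shortest prefix of `rows` whose amounts sum to at least `cap`,
    or all rows if their total is below `cap`.

    `rows` is a list of (transaction, amount) tuples for ONE user,
    already sorted by transaction in ascending order.
    """
    selected = []
    running_total = 0
    for transaction, amount in rows:
        if running_total >= cap:
            break  # the cap is already covered; later rows are not needed
        selected.append((transaction, amount))
        running_total += amount
    return selected


user1 = [("transaction1", 10), ("transaction2", 20),
         ("transaction3", 5), ("transaction4", 100)]

print([t for t, _ in rows_to_cover(user1, 30)])   # ['transaction1', 'transaction2']
print([t for t, _ in rows_to_cover(user1, 13)])   # ['transaction1', 'transaction2']
print(len(rows_to_cover(user1, 136)))             # 4
```

Checking the total *before* adding a row is what makes the row that reaches the cap part of the result.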

With a cursor I can solve this in a stored procedure, but I can't find an elegant way to do it with nested queries and SUM. I'd really appreciate your help!

You can use analytic functions to do this fairly easily:

SELECT user_id, transaction_id, transaction_value
FROM   (SELECT user_id,
               transaction_id,
               transaction_value,
               SUM(transaction_value) 
                  OVER (PARTITION BY user_id 
                        ORDER BY transaction_id) AS running_total
        FROM   transactions)
WHERE  running_total <= :transaction_cap
ORDER BY user_id, transaction_id

Used this way, SUM returns the total of the current row plus all preceding rows in ORDER BY order (here, the row's transaction and all transactions with lower IDs), counting only rows that share the same value in the PARTITION BY column (here, the same user).
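To see what `SUM(...) OVER (PARTITION BY user_id ORDER BY transaction_id)` computes, here is a rough Python equivalent. It is an illustration only, not how Oracle evaluates the query; the `running_totals` helper and the user2 rows are made up for this sketch. Rows are grouped by user, ordered by transaction, and each row gets the cumulative sum of its group so far.

```python
from collections import defaultdict
from itertools import accumulate


def running_totals(rows):
    """Mimic SUM(value) OVER (PARTITION BY user ORDER BY transaction).

    `rows` is a list of (user_id, transaction_id, value) tuples.
    Returns (user_id, transaction_id, value, running_total) tuples,
    sorted by user and then transaction.
    """
    partitions = defaultdict(list)
    for user_id, transaction_id, value in rows:
        partitions[user_id].append((transaction_id, value))  # PARTITION BY user_id

    result = []
    for user_id in sorted(partitions):
        ordered = sorted(partitions[user_id])  # ORDER BY transaction_id
        totals = accumulate(value for _, value in ordered)
        for (transaction_id, value), total in zip(ordered, totals):
            result.append((user_id, transaction_id, value, total))
    return result


rows = [("user1", 1, 10), ("user1", 2, 20), ("user1", 3, 5), ("user1", 4, 100),
        ("user2", 1, 7), ("user2", 2, 8)]
for row in running_totals(rows):
    print(row)
```

With a cap of 13, filtering on `running_total <= 13` would keep only user1's first row (running total 10), even though the second row is needed to reach 13.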


Taking a second look at the question, I realized this would not work: it only returns rows whose running total is at or below the cap, rather than also including the row that reaches or crosses it. The following revision returns the current row if the running total up to the previous row is less than the cap, or if there is no previous row.

SELECT user_id, transaction_id, transaction_value
FROM   (SELECT user_id,
               transaction_id,
               transaction_value,
               running_total,
               LAG(running_total) 
                   OVER (PARTITION BY user_id 
                         ORDER BY transaction_id) AS prior_total
        FROM   (SELECT user_id,
                       transaction_id,
                       transaction_value,
                       SUM(transaction_value) 
                          OVER (PARTITION BY user_id 
                                ORDER BY transaction_id) AS running_total
                FROM   transactions))
WHERE  prior_total < :transaction_cap OR prior_total IS NULL
ORDER BY user_id, transaction_id
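The revised query can be checked without an Oracle instance. The sketch below runs the same SQL, with an ORDER BY added, against an in-memory SQLite database from Python, using user1's rows from the question with numeric transaction IDs. SQLite is only a stand-in: it supports SUM ... OVER and LAG from version 3.25, and it is an assumption that your Python build ships at least that version. Oracle-specific behavior is not exercised.

```python
import sqlite3

# In-memory stand-in for the Oracle table; rows are user1's data from the question.
# Assumes Python's bundled SQLite is 3.25+ (window function support).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions "
             "(user_id TEXT, transaction_id INTEGER, transaction_value INTEGER)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [("user1", 1, 10), ("user1", 2, 20), ("user1", 3, 5), ("user1", 4, 100)],
)

QUERY = """
SELECT user_id, transaction_id, transaction_value
FROM   (SELECT user_id,
               transaction_id,
               transaction_value,
               running_total,
               LAG(running_total)
                   OVER (PARTITION BY user_id
                         ORDER BY transaction_id) AS prior_total
        FROM   (SELECT user_id,
                       transaction_id,
                       transaction_value,
                       SUM(transaction_value)
                          OVER (PARTITION BY user_id
                                ORDER BY transaction_id) AS running_total
                FROM   transactions))
WHERE  prior_total < :transaction_cap OR prior_total IS NULL
ORDER BY user_id, transaction_id
"""


def capped_rows(cap):
    """Run the LAG-based query for `cap` and return the selected transaction IDs."""
    rows = conn.execute(QUERY, {"transaction_cap": cap}).fetchall()
    return [transaction_id for _, transaction_id, _ in rows]


for cap in (30, 13, 136):
    print(cap, capped_rows(cap))
# 30 [1, 2]
# 13 [1, 2]
# 136 [1, 2, 3, 4]
```

Because the prior running total is simply `running_total - transaction_value`, the LAG layer could be dropped for positive caps (assuming transaction IDs are unique per user and amounts are non-null): `WHERE running_total - transaction_value < :transaction_cap` selects the same rows, including each user's first transaction.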

For a single cap that applies to all users:

SELECT user_id, transaction_id, amount
FROM   MyTable t
WHERE  ( SELECT NVL(SUM(ts.amount), 0)
         FROM   MyTable ts
         WHERE  ts.user_id = t.user_id
           AND  ts.transaction_id < t.transaction_id
       ) < :cap
ORDER BY user_id, transaction_id
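Here is a runnable sketch of the correlated-subquery approach using Python's sqlite3 module and an in-memory table. SQLite stands in for Oracle, and the column names `user_id`/`transaction_id`/`amount` and the user2 rows are made up for the demo. It also shows why the first row needs care: for a user's first transaction the subquery has no earlier rows to sum, SUM returns NULL, and `NULL < :cap` is not true, so that row is silently dropped unless the sum is wrapped in COALESCE (NVL in Oracle).

```python
import sqlite3

# Hypothetical stand-in for MyTable; user2's rows are invented for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (user_id TEXT, transaction_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [("user1", 1, 10), ("user1", 2, 20), ("user1", 3, 5), ("user1", 4, 100),
     ("user2", 1, 50), ("user2", 2, 1)],
)

# Keep a row while the sum of the user's EARLIER transactions is still below the cap.
QUERY = """
SELECT user_id, transaction_id, amount
FROM   MyTable t
WHERE  (SELECT COALESCE(SUM(ts.amount), 0)
        FROM   MyTable ts
        WHERE  ts.user_id = t.user_id
          AND  ts.transaction_id < t.transaction_id) < :cap
ORDER BY user_id, transaction_id
"""

# Same query without COALESCE: each user's first row has a NULL prior sum and is lost.
QUERY_WITHOUT_COALESCE = QUERY.replace("COALESCE(SUM(ts.amount), 0)", "SUM(ts.amount)")


def capped(query, cap):
    """Return (user_id, transaction_id) pairs selected by `query` for `cap`."""
    return [(user_id, tid) for user_id, tid, _ in conn.execute(query, {"cap": cap})]


print(capped(QUERY, 30))                   # [('user1', 1), ('user1', 2), ('user2', 1)]
print(capped(QUERY_WITHOUT_COALESCE, 30))  # [('user1', 2)]
```

Note that user2's first transaction ($50) exceeds the $30 cap on its own, yet it is still returned, which matches the "minimal rows covering the cap" requirement.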

As requested, here's an R solution. I had to make a few assumptions to put this together, and here they are:

  1. The money cap information is stored in a separate table, with a key that can be joined to the transaction data
  2. If a user's first transaction is greater than their money cap, then no rows are returned for that user

I commented the code below fairly heavily, but let me know if you have any questions. I first create some fake data that mimics yours, then run the query you need at the very bottom.

You can connect R to your database through the RODBC package.

# Load the needed package
library(plyr)
# Set seed for reproducibility
set.seed(123)

# Make some fake data
dat <- data.frame(user = rep(letters[1:4], each = 4)
                  , transaction = rep(1:4, 4)
                  , value = sample(5:50, 16, TRUE)
                  )
# Separate "data.frame" or table with the money cap info
moneyCaps <- data.frame(user = letters[1:4], moneyCap = sample(50:100, 4, TRUE))

# Ensure that items are ordered by user and transaction #.
dat <- dat[order(dat$user, dat$transaction), ]

# Merge the transaction data with the moneyCap data. This is equivalent to an inner join
dat <- merge(dat, moneyCaps)

# After the merge, the data looks like this:

  user transaction value moneyCap
1     a           1    18       62
2     a           2    41       62
3     a           3    23       62
4     a           4    45       62
5     b           1    48       52
6     b           2     7       52
....

# Use the plyr function ddply to split at the user level and keep the rows whose
# running total (cumsum of value) is <= the moneyCap for that individual. Note that
# if the first transaction for a user is greater than the moneyCap, nothing is
# returned. Not sure if that's a possibility with your data. More generally, the row
# that pushes the running total past the cap is dropped; to keep it, filter on
# cumsum(value) - value < moneyCap instead.

ddply(dat, "user", function(x) subset(x, cumsum(value) <= moneyCap))

# And the results look like:

  user transaction value moneyCap
1    a           1    18       62
2    a           2    41       62
3    b           1    48       52
...
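For readers without R, here is a rough Python analogue of the per-user `ddply` + `cumsum` filter. It is an illustrative sketch: the `filter_capped` helper and the tuple layout are assumptions, and the input uses only the merged rows printed above. It compares the answer's filter with one that also keeps the row that reaches or crosses the cap.

```python
from itertools import accumulate, groupby

# (user, transaction, value, moneyCap) rows copied from the merged data shown above;
# only the rows printed in the answer are used.
dat = [("a", 1, 18, 62), ("a", 2, 41, 62), ("a", 3, 23, 62), ("a", 4, 45, 62),
       ("b", 1, 48, 52), ("b", 2, 7, 52)]


def filter_capped(rows, keep_crossing_row):
    """Per user, keep rows based on the running total of `value`.

    keep_crossing_row=False mirrors subset(x, cumsum(value) <= moneyCap).
    keep_crossing_row=True keeps a row while the total BEFORE it is below
    the cap, so the row that reaches or crosses the cap is included.
    Rows must already be sorted by user and transaction.
    """
    kept = []
    for _, group in groupby(rows, key=lambda row: row[0]):  # split by user, like ddply
        group = list(group)
        for row, total in zip(group, accumulate(r[2] for r in group)):
            _, _, value, cap = row
            keep = total - value < cap if keep_crossing_row else total <= cap
            if keep:
                kept.append(row[:2])
    return kept


print(filter_capped(dat, keep_crossing_row=False))  # [('a', 1), ('a', 2), ('b', 1)]
print(filter_capped(dat, keep_crossing_row=True))
# [('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2)]
```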

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM