
Complex SQL Calculation

I have a large set of Venmo data, and I would like to calculate certain metrics in SQL before working with it in Stata, primarily because of size constraints. The data has the following columns: "Sender", "Receiver", "Amount", and "Date". The "Sender" and "Receiver" can be the same person. I would like to write an SQL statement that, for each sender and day, takes the amount sent to each individual other than the sender, divides it by the total amount the sender sent to all parties that day (including themselves), squares that ratio, and sums the squares. Mathematically:

sum over j != i of (Amount{i,j,d} / Amount{i,d})^2

Where Amount{i,j,d} denotes the amount sent from person i to person j on day d, with i and j not the same person, and Amount{i,d} denotes the total amount sent by person i on day d. Basically, take the total anyone sends to anyone else on a given day, divide it by the total sent by that person on that day, square the result, and add all those terms together. I tried the following query (which does not do this calculation, only pulls the data that I could then use to do so):

SELECT Amount, Sender, Receiver, Date,
SUM(Amount) AS Total_Amount
FROM `Table`
GROUP BY Sender, Receiver, Date
ORDER BY Sender, Receiver, Date

However, this pull yielded way too many data points for my paltry computer to handle in Stata. I then tried the following:

SELECT Amount, Sender, Receiver, Date,
SUM(Amount) AS Total_Amount,
SUM(POWER(SUM(Amount)/(Total_Amount), 2)) AS Sum_Squares
FROM `Table`
GROUP BY Sender, Receiver, Date
ORDER BY Sender, Receiver, Date

I received an error telling me that it doesn't recognize Total_Amount as a column, which is roughly what I expected to happen. So now I'm stuck, as I don't know how to construct this Total_Amount variable before using it in another calculation that I'd like to run. Any advice on how to directly calculate the term I described above would be sincerely appreciated. Please comment with any clarifications. Thank you.

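To sum and then aggregate again over the results of the first sum, use a subquery. Inside that subquery, a window function (requires MySQL 8.0 or MariaDB 10.2+) gives the sender's total across all recipients for the day, self-transfers included. The outer query then filters out the rows where the receiver is the sender, so those amounts stay in the denominator but do not contribute a squared term.

SELECT Date, Sender,
    SUM(POWER(Sender_Receiver_Total / Sender_Total, 2)) AS Sum_Squares
FROM (
    -- One row per (Date, Sender, Receiver) with that pair's total
    SELECT Date, Sender, Receiver,
        SUM(Amount) AS Sender_Receiver_Total,
        -- Window sum over the grouped rows: everything the sender sent that day
        SUM(SUM(Amount)) OVER (PARTITION BY Date, Sender) AS Sender_Total
    FROM `Table`
    GROUP BY Date, Sender, Receiver
) AS Date_Sender_Receiver
-- Keep self-transfers in the denominator but drop them from the numerator
WHERE Receiver <> Sender
GROUP BY Date, Sender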
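
If window functions are not available (older MySQL or MariaDB), the same two-step idea can probably be written with two derived tables joined on Date and Sender. The sketch below is not the query above but a join-based variant, run against an in-memory SQLite database from Python so it can be checked on a few rows. The table name Payments and the sample rows are made up for illustration, and the squaring is written as a multiplication because POWER is not guaranteed to exist in every SQLite build.

```python
import sqlite3

# Hypothetical toy data: (Sender, Receiver, Amount, Date).
ROWS = [
    ("alice", "bob",   30.0, "2020-01-01"),
    ("alice", "bob",   10.0, "2020-01-01"),
    ("alice", "carol", 40.0, "2020-01-01"),
    ("alice", "alice", 20.0, "2020-01-01"),  # self-transfer: denominator only
    ("bob",   "alice", 50.0, "2020-01-01"),
    ("alice", "bob",   25.0, "2020-01-02"),
    ("alice", "carol", 75.0, "2020-01-02"),
]

# Join-based variant (no window function). "Payments" is an assumed table name.
QUERY = """
SELECT p.Date, p.Sender,
       SUM(p.Pair_Total * p.Pair_Total
           / (t.Sender_Total * t.Sender_Total)) AS Sum_Squares
FROM (
    -- total per (Date, Sender, Receiver)
    SELECT Date, Sender, Receiver, SUM(Amount) AS Pair_Total
    FROM Payments
    GROUP BY Date, Sender, Receiver
) AS p
JOIN (
    -- total per (Date, Sender), self-transfers included
    SELECT Date, Sender, SUM(Amount) AS Sender_Total
    FROM Payments
    GROUP BY Date, Sender
) AS t ON t.Date = p.Date AND t.Sender = p.Sender
WHERE p.Receiver <> p.Sender   -- exclude i = j from the numerator
GROUP BY p.Date, p.Sender
ORDER BY p.Date, p.Sender
"""


def sum_squares(rows):
    """Load rows into an in-memory table and return (Date, Sender, Sum_Squares)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Payments (Sender TEXT, Receiver TEXT, Amount REAL, Date TEXT)"
    )
    conn.executemany("INSERT INTO Payments VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for date, sender, value in sum_squares(ROWS):
        print(date, sender, round(value, 4))
```

On the sample rows, alice sends 100 in total on 2020-01-01 (40 to bob, 40 to carol, 20 to herself), so her value is 0.4^2 + 0.4^2 = 0.32. Note that a sender who only ever pays themselves on a given day has no rows left after the filter and so does not appear in the result at all.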
