MySQL: SUM amount if reference is distinct, GROUP BY category

Question

I am trying to SUM amounts by category, but the same amount is repeated for each reference number, and I only want to include one amount per reference. There are about 100K different reference numbers, with 4 different amounts across the board.

The data I am analyzing look like this:

reference | category | amount | status 
5574682   | cat1     | 45     | active 
5574682   | cat1     | 45     | inactive 
5574684   | cat1     | 95     | active 
5574869   | cat2     | 65     | active 
5574869   | cat2     | 65     | inactive 
5574870   | cat2     | 55     | active 
5574870   | cat2     | 55     | inactive 
5574891   | cat3     | 95     | active 
5574892   | cat3     | 45     | active 
5574892   | cat3     | 45     | inactive

The query below returns the correct rows as a selection, but not the summed total by category:

SELECT
    a.reference,
    c.category,
    a.amount
FROM
    table1_ref a
    JOIN (
        SELECT *
        FROM
            table_ref a
            JOIN table_requests b ON a.transactionid = b.requestid
            JOIN table_users c ON a.user_code = c.user_code
        WHERE b.filename IN ('20190614','20190625','20190628')
    ) b ON a.reference = b.reference
    JOIN table_users c ON a.user_code = c.user_code
WHERE
    a.date BETWEEN '2019-08-01' AND '2019-08-31'
    AND c.category IN ('cat1', 'cat2', 'cat3')
GROUP BY
    a.reference,
    c.category;

With the above code I get results looking like this:

reference | category | amount
5574682   | cat1     | 45
5574684   | cat1     | 95
5574869   | cat2     | 65
5574870   | cat2     | 55
5574891   | cat3     | 95
5574892   | cat3     | 45

My expected result is:

cat1 | 140
cat2 | 120
cat3 | 140

Answer 1

UPDATED:

If you need to get results like this:

reference | category | amount | status 
----------|----------|--------|---------
5574682   | cat1     | 45     | active 
5574682   | cat1     | 45     | inactive 
5574684   | cat1     | 95     | active
5574869   | cat2     | 65     | inactive -- Lines below
5574869   | cat2     | 65     | inactive -- would be impossible to get
5574870   | cat2     | 55     | inactive -- with GROUP BY, because
5574870   | cat2     | 55     | inactive -- `reference`, `category` and `status`
5574891   | cat3     | 95     | inactive -- are the same among pairs
5574892   | cat3     | 45     | inactive -- so they would be represented as one row
5574892   | cat3     | 45     | inactive -- with total amount

Then you have to use the aggregate SUM() function and list an additional column in both the outer SELECT and the GROUP BY, like this:

SELECT
    a.reference,
    c.category,
    SUM(a.amount) AS amount,    -- CHANGED
    SOMETABLE.status            -- ADDED
FROM
    table1_ref a
    JOIN (
        SELECT *
        FROM
            table_ref a
            JOIN table_requests b ON a.transactionid = b.requestid
            JOIN table_users c ON a.user_code = c.user_code
        WHERE b.filename IN ('20190614','20190625','20190628')
    ) b ON a.reference = b.reference
    JOIN table_users c ON a.user_code = c.user_code
WHERE
    a.date BETWEEN '2019-08-01' AND '2019-08-31'
    AND c.category IN ('cat1', 'cat2', 'cat3')
GROUP BY
    a.reference,
    c.category,
    SOMETABLE.status;             -- ADDED

Answer 2

Since each reference has duplicate rows carrying the same amount, you could use the MAX aggregate function to get only one value per reference:

SELECT cat, SUM(amount)
FROM (
    SELECT MAX(`amount`) AS amount, `reference` AS ref, `category` AS cat
    FROM data GROUP BY `reference`
) AS T
GROUP BY cat;

This works by:

1. Grouping by reference number and taking the MAX amount, so that each reference contributes only one amount.
2. Grouping the resulting set by category and summing the amounts.
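
As a rough check, the two steps can be replayed on the sample rows from the question. This is only a sketch: the table name `data` comes from the query above, while the column types are assumptions. The inner query groups by both `reference` and `category` so that it also runs when ONLY_FULL_GROUP_BY is enabled; on this sample each reference belongs to a single category, so the totals are the same.

```sql
-- Hypothetical setup: column types are assumed, not taken from the question.
CREATE TABLE data (
    reference INT         NOT NULL,
    category  VARCHAR(10) NOT NULL,
    amount    INT         NOT NULL,
    status    VARCHAR(10) NOT NULL
);

-- The ten sample rows from the question.
INSERT INTO data (reference, category, amount, status) VALUES
    (5574682, 'cat1', 45, 'active'),
    (5574682, 'cat1', 45, 'inactive'),
    (5574684, 'cat1', 95, 'active'),
    (5574869, 'cat2', 65, 'active'),
    (5574869, 'cat2', 65, 'inactive'),
    (5574870, 'cat2', 55, 'active'),
    (5574870, 'cat2', 55, 'inactive'),
    (5574891, 'cat3', 95, 'active'),
    (5574892, 'cat3', 45, 'active'),
    (5574892, 'cat3', 45, 'inactive');

-- Step 1: collapse the duplicates, leaving one row per reference (10 rows become 6).
SELECT reference, category, MAX(amount) AS amount
FROM data
GROUP BY reference, category;

-- Step 2: sum the per-reference amounts by category.
SELECT cat, SUM(amount) AS total
FROM (
    SELECT MAX(amount) AS amount, reference AS ref, category AS cat
    FROM data
    GROUP BY reference, category
) AS T
GROUP BY cat
ORDER BY cat;
-- cat1 | 140
-- cat2 | 120
-- cat3 | 140
```

The totals match the expected result in the question.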

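If the same reference number can appear in more than one category, change the inner GROUP BY clause as shown below. This form also satisfies ONLY_FULL_GROUP_BY, which would otherwise reject the inner query because `category` is selected without being grouped or aggregated: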

FROM data GROUP BY `reference`, `category`
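
To see why the extra grouping column matters, here is a small sketch with made-up rows in which one reference appears under two categories. The table `data_shared` and its values are hypothetical and exist only for illustration.

```sql
-- Hypothetical table and rows: reference 1001 is shared by catA and catB.
CREATE TABLE data_shared (
    reference INT         NOT NULL,
    category  VARCHAR(10) NOT NULL,
    amount    INT         NOT NULL,
    status    VARCHAR(10) NOT NULL
);

INSERT INTO data_shared (reference, category, amount, status) VALUES
    (1001, 'catA', 10, 'active'),
    (1001, 'catA', 10, 'inactive'),
    (1001, 'catB', 20, 'active');

-- Grouping by (reference, category) keeps one row per pair,
-- so each category retains its own amount.
SELECT cat, SUM(amount) AS total
FROM (
    SELECT MAX(amount) AS amount, reference AS ref, category AS cat
    FROM data_shared
    GROUP BY reference, category
) AS T
GROUP BY cat
ORDER BY cat;
-- catA | 10
-- catB | 20
```

Grouping by `reference` alone would fold all three rows into a single row with amount 20 and an arbitrary category (where the server allows it at all), so one of the two categories would disappear from the totals.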

2 answers

solution1
0 2019-11-07 20:35:41

solution2
0 2019-11-08 19:31:37
