Filter rows where 2 column values appear more than once

I have a table like:

SELECT
s.date,
s.orderid,
s.num1,
s.num2,
s.sales,
s.price
FROM sales AS s

Resulting in

date       | orderid | num1 | num2  | sales | price
2020-11-01 | 1       | a    | aa    | 1     | 10
2020-11-01 | 8       | k    | kk    | 1     | 10
2020-11-02 | 1       | a    | aa    | -1    | 10
2020-11-01 | 2       | b    | bb    | 2     | 8
2020-11-01 | 3       | c    | cc    | 1     | 10
2020-11-01 | 3       | c    | cc    | 2     | 9
2020-11-04 | 18      | u    | uu    | 5     | 2

The combination of "orderid" and "num1" should only appear once; otherwise it's a return (the second entry has "sales" of -1, negating the earlier sale). So I need to remove those entries completely, without keeping either row. Apart from that, "orderid" has no meaning and is not needed.

I want to group by "date", "num1" and "num2", summing up all sales and getting the average price, while removing orderid/num1 pairs that appear together more than once.

The end result should be:

date       | orderid | num1 | num2  | sales | price
2020-11-01 | 8       | k    | kk    | 1     | 10
2020-11-01 | 2       | b    | bb    | 2     | 8
2020-11-01 | 3       | c    | cc    | 3     | 9.5
2020-11-04 | 18      | u    | uu    | 5     | 2

How can I do this with GROUP BY? So far I have this:

SELECT
s.date,
s.num1,
s.num2,
SUM(s.sales),
AVG(s.price)
FROM sales AS s
GROUP BY s.date, s.num1, s.num2
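To see why the plain GROUP BY is not enough, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module (SQLite is an assumption; the question does not name a DBMS) and runs the query above. Because the return for order 1 is dated 2020-11-02, it falls into a different (date, num1, num2) group than the original sale, so both the +1 and the -1 rows survive as separate result rows.

```python
import sqlite3

# Sample data copied from the question: (date, orderid, num1, num2, sales, price).
ROWS = [
    ("2020-11-01", 1, "a", "aa", 1, 10),
    ("2020-11-01", 8, "k", "kk", 1, 10),
    ("2020-11-02", 1, "a", "aa", -1, 10),
    ("2020-11-01", 2, "b", "bb", 2, 8),
    ("2020-11-01", 3, "c", "cc", 1, 10),
    ("2020-11-01", 3, "c", "cc", 2, 9),
    ("2020-11-04", 18, "u", "uu", 5, 2),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (date TEXT, orderid INTEGER, num1 TEXT,"
    " num2 TEXT, sales INTEGER, price REAL)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# The asker's query; ORDER BY is added only to make the output order stable.
query = """
SELECT s.date, s.num1, s.num2, SUM(s.sales), AVG(s.price)
FROM sales AS s
GROUP BY s.date, s.num1, s.num2
ORDER BY s.date, s.num1
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The output contains both ('2020-11-01', 'a', 'aa', 1, 10.0) and ('2020-11-02', 'a', 'aa', -1, 10.0), i.e. the returned order is still reported.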

You can use window functions. Based on your description (removing orders that appear more than once), you can use count(*) as a window function:

select s.date, s.num1, s.num2, SUM(s.sales), AVG(s.price)
from (select s.*, count(*) over (partition by orderid, num1) as cnt
      from sales s
     ) s
where cnt = 1
group by s.date, s.num1, s.num2;
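A sketch of this answer run against the question's sample rows, assuming SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer). Note that order 3 also has two rows (two sales, no return), so count(*) removes it as well, which differs from the expected result in the question.

```python
import sqlite3

# Sample data copied from the question: (date, orderid, num1, num2, sales, price).
ROWS = [
    ("2020-11-01", 1, "a", "aa", 1, 10),
    ("2020-11-01", 8, "k", "kk", 1, 10),
    ("2020-11-02", 1, "a", "aa", -1, 10),
    ("2020-11-01", 2, "b", "bb", 2, 8),
    ("2020-11-01", 3, "c", "cc", 1, 10),
    ("2020-11-01", 3, "c", "cc", 2, 9),
    ("2020-11-04", 18, "u", "uu", 5, 2),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (date TEXT, orderid INTEGER, num1 TEXT,"
    " num2 TEXT, sales INTEGER, price REAL)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# The answer's query: count rows per (orderid, num1) and keep only pairs seen once.
# ORDER BY is added only to make the output order stable.
query = """
select s.date, s.num1, s.num2, SUM(s.sales), AVG(s.price)
from (select s.*, count(*) over (partition by orderid, num1) as cnt
      from sales s
     ) s
where cnt = 1
group by s.date, s.num1, s.num2
order by s.date, s.num1
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Only orders 2, 8 and 18 remain; both order 1 (sale plus return) and order 3 (two sales) are dropped.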

I suspect you really want row_number(), so that you keep one of the duplicate rows.
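A sketch of the row_number() variant, assuming SQLite through Python's sqlite3 module. The answer does not say which duplicate to keep; this version keeps the earliest row per (orderid, num1) and breaks ties with SQLite's implicit rowid (insertion order), which is a hypothetical, SQLite-specific choice. Unlike the count(*) query, it keeps one row for order 1, i.e. the original sale of the returned item, which the question says should be removed.

```python
import sqlite3

# Sample data copied from the question: (date, orderid, num1, num2, sales, price).
ROWS = [
    ("2020-11-01", 1, "a", "aa", 1, 10),
    ("2020-11-01", 8, "k", "kk", 1, 10),
    ("2020-11-02", 1, "a", "aa", -1, 10),
    ("2020-11-01", 2, "b", "bb", 2, 8),
    ("2020-11-01", 3, "c", "cc", 1, 10),
    ("2020-11-01", 3, "c", "cc", 2, 9),
    ("2020-11-04", 18, "u", "uu", 5, 2),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (date TEXT, orderid INTEGER, num1 TEXT,"
    " num2 TEXT, sales INTEGER, price REAL)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# Number the rows of each (orderid, num1) pair and keep only the first one.
# Ordering by date, then rowid, is an assumed tie-breaker (rowid is SQLite-specific).
query = """
select s.date, s.num1, s.num2, SUM(s.sales), AVG(s.price)
from (select s.*,
             row_number() over (partition by orderid, num1
                                order by date, rowid) as rn
      from sales s
     ) s
where rn = 1
group by s.date, s.num1, s.num2
order by s.date, s.num1
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```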

You can use GROUP BY and HAVING as follows, grouping by order and keeping only orders whose net sales are positive:

SELECT max(s.date) AS date,
       s.orderid,
       s.num1,
       s.num2,
       SUM(s.sales) AS sales,
       AVG(s.price) AS price
  FROM sales AS s
GROUP BY s.orderid, s.num1, s.num2
HAVING SUM(s.sales) > 0;
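Running this answer on the question's sample rows (a sketch assuming SQLite through Python's sqlite3 module) shows that it reproduces the expected rows: order 1 nets to 0 and is dropped by the HAVING clause, while the two sales of order 3 are combined. Note that it groups per order rather than per (date, num1, num2), and max(s.date) reports the latest date of each order.

```python
import sqlite3

# Sample data copied from the question: (date, orderid, num1, num2, sales, price).
ROWS = [
    ("2020-11-01", 1, "a", "aa", 1, 10),
    ("2020-11-01", 8, "k", "kk", 1, 10),
    ("2020-11-02", 1, "a", "aa", -1, 10),
    ("2020-11-01", 2, "b", "bb", 2, 8),
    ("2020-11-01", 3, "c", "cc", 1, 10),
    ("2020-11-01", 3, "c", "cc", 2, 9),
    ("2020-11-04", 18, "u", "uu", 5, 2),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (date TEXT, orderid INTEGER, num1 TEXT,"
    " num2 TEXT, sales INTEGER, price REAL)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# One row per order; HAVING drops orders whose sales net out to zero or less.
# ORDER BY is added only to make the output order stable.
query = """
SELECT max(s.date) AS date, s.orderid, s.num1, s.num2,
       SUM(s.sales), AVG(s.price)
FROM sales AS s
GROUP BY s.orderid, s.num1, s.num2
HAVING SUM(s.sales) > 0
ORDER BY s.orderid
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```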

Question:

Is this a transaction log where, for example, orderId 1 has sales entries 10, 5, -1, 7, 8 that should result in a value of 15, because the 10 and 5 are negated by the -1? If so, you need a query that finds all rows after the last -1 for each orderId and sums up their sales values.

For example, sales values of 5, 6, -1, 7, 9, -1, 10, 2 for the same orderId should use only 10 and 2 for the final amount.

Something like:

Query 1: find the max(date) value for each orderId where the sales amount is -1.
Query 2: use query 1 to get all transactions for each orderId where date > the date from query 1.

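-- Query 1: date of the last return (-1) for each order
WITH q1 AS (
    SELECT r.OrderId, MAX(r.Date) AS Date
    FROM sales AS r
    WHERE r.sales = -1
    GROUP BY r.OrderId
)
SELECT s.OrderId, SUM(s.sales) AS TotalSales, AVG(s.Price) AS AveragePrice
FROM sales s
LEFT OUTER JOIN q1 ON q1.OrderId = s.OrderId
WHERE (q1.Date IS NULL) OR (s.Date > q1.Date)
GROUP BY s.OrderId;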
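A runnable sketch of this approach, assuming SQLite through Python's sqlite3 module. The CTE plays the role of query 1, and a hypothetical order 99 (made-up dates, num1, num2 and price) encodes the 5, 6, -1, 7, 9, -1, 10, 2 sequence from above to check that only the rows after the last -1 are summed. Orders without any return have no match in q1, so the LEFT JOIN leaves q1.date NULL and all their rows are kept.

```python
import sqlite3

# Sample data copied from the question: (date, orderid, num1, num2, sales, price).
ROWS = [
    ("2020-11-01", 1, "a", "aa", 1, 10),
    ("2020-11-01", 8, "k", "kk", 1, 10),
    ("2020-11-02", 1, "a", "aa", -1, 10),
    ("2020-11-01", 2, "b", "bb", 2, 8),
    ("2020-11-01", 3, "c", "cc", 1, 10),
    ("2020-11-01", 3, "c", "cc", 2, 9),
    ("2020-11-04", 18, "u", "uu", 5, 2),
]
# Hypothetical order 99: the sequence 5, 6, -1, 7, 9, -1, 10, 2 on consecutive days.
for day, amount in enumerate([5, 6, -1, 7, 9, -1, 10, 2], start=1):
    ROWS.append((f"2020-12-{day:02d}", 99, "z", "zz", amount, 1))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (date TEXT, orderid INTEGER, num1 TEXT,"
    " num2 TEXT, sales INTEGER, price REAL)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)", ROWS)

query = """
WITH q1 AS (
    -- Query 1: date of the last return (-1) for each order
    SELECT r.orderid, MAX(r.date) AS date
    FROM sales AS r
    WHERE r.sales = -1
    GROUP BY r.orderid
)
SELECT s.orderid, SUM(s.sales) AS TotalSales, AVG(s.price) AS AveragePrice
FROM sales AS s
LEFT OUTER JOIN q1 ON q1.orderid = s.orderid
WHERE q1.date IS NULL OR s.date > q1.date   -- keep rows after the last return
GROUP BY s.orderid
ORDER BY s.orderid
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Order 1 disappears because its only rows are on or before its return date, and order 99 totals 12 (10 + 2). Comparing dates with `>` assumes a return is never on the same date as a later purchase of that order.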
