简体   繁体   中英

Select row with least value in multiple columns without ROW_NUMBER

I want to get the row per group with the min value of two columns.

I have a table that has listings for items I want, as well as their cost and distance from me.

mytable:
item | cost | dist
-----+------+---------
1    | $2   | 1.0
1    | $3   | 0.5
1    | $4   | 2.0
2    | $2   | 2.0
2    | $2   | 1.5
2    | $2   | 4.0
2    | $8   | 1.0
2    | $12  | 3.0
3    | $1   | 5.0

For each item, I want to get the row that has the min cost, then if there are multiple of the min cost, get the one with the min dist

so my result would be

item | cost | dist
-----+------+---------
1    | $2   | 1.0
2    | $2   | 1.5
3    | $1   | 5.0

I know I can achieve this result using

SELECT * 
, ROW_NUMBER() OVER(PARTITION BY item ORDER BY cost ASC, dist ASC) as [RID]
FROM mytable
WHERE [RID] = 1

but the problem comes when I have 100,000 items each with 100,000 listings, and sorting the whole table becomes incredibly time-consuming.

Since I only need the top 1 of each group, I'm wondering if there is another way to get the result I want without sorting the whole table of 10,000,000,000 entries.

Currently using SQL Server 2012

A nice article on this topic is by Itzik Ben Gan - Optimizing TOP N Per Group Queries . This discusses a concatenation approach.

For example if your table is

CREATE TABLE #YourTable
  (
     item INT,
     cost MONEY CHECK (cost >= 0),
     dist DECIMAL(10, 2) CHECK (dist >= 0)
  ) 

you might use

WITH T AS
(
SELECT item,  
       MIN(FORMAT(CAST(cost * 100 AS INT), 'D10') + FORMAT(CAST(dist * 100 AS INT), 'D10')) AS MinConcat
FROM #YourTable
GROUP BY item
)
SELECT item,
       CAST(LEFT(MinConcat,10)/100.0 AS MONEY),
       CAST(RIGHT(MinConcat,10)/100.0 AS  DECIMAL(10,2))
FROM T

So this can be done in a single grouping operation on id (which could be a hash aggregate without any sort).

You need to be careful that the value of the concatenated result has the same ordering when treated as a string as cost, dist would have when treated as raw column values so the query above may need adjusting if your datatypes are different.

It currently reserves the left most 10 characters for cost represented as an integer number of pence and padded with leading zerores, and dist as a 10 digit integer similarly.

You can do this way

; with c as 
(select min(cost) as cost, item
from mytable
group by item)
select t.* from mytable t
inner join c
on c.item = t.item and c.cost=t.cost;

However, I'd recommend you to add index to item and cost columns to make the query fast.

[Edit] After re-reading the OP question, it should be like the following when there are ties in cost,

; with c as 
(select min(cost) as cost, item
from mytable
group by item)
, c2 as (
select t.cost, t.item, min(dist) as dist from mytable t
inner join c
on c.item = t.item and c.cost=t.cost
group by t.cost, t.item)
select  t.item,t.cost, c2.dist from mytable t
inner join c2
on c2.item = t.item, and c2.cost = t.cost;

Maybe there are better ways, but this should work.

If you have a table of items, then this might work:

select i.*, t.*
from items i cross apply
     (select top (1) t.*
      from t
      where t.item = i.item
      order by cost, dist
     ) t;

For this to be efficient, you need an index on (item, cost, dist) .

Something like this should work:

select
    t.item, MIN(t.cost) as mincost, min(t2.mindist) as mindist
from mytable t
inner join (
select item, cost, MIN(dist) as mindist
    from mytable
    group by
        item, cost
) t2 on t.item = t2.item
group by t.item,t2.cost
having MIN(t.cost) = t2.cost

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM