Select row based upon the MAX of one column and the related MIN of another

I have a table set up like this:

CREATE TABLE dbo.IntervalCounts (
    item_id int NOT NULL,
    interval_time time(0) NOT NULL,
    interval_count int DEFAULT 0 NOT NULL
)

Each item_id has 96 interval_time values, from 00:00 to 23:45 in 15-minute increments. Each interval_time has an interval_count >= 0. The table has approximately 200 million rows.

For each item_id, I need to select the row where the count is the highest; if multiple rows qualify with the same count, I want the one with the lowest interval_time.

So, if I have item_id 1, whose max count is 100:

item_id   interval_time interval_count
1         00:00         100
1         13:15         100
1         07:45         100
1         19:30         100

I'd like to get just one row:

item_id   interval_time interval_count
1         00:00         100

Getting the first selection is easy enough; I've got:

SELECT a.item_id, a.interval_time, a.interval_count
    FROM dbo.IntervalCounts a
    LEFT JOIN dbo.IntervalCounts b
        ON a.item_id = b.item_id
        AND a.interval_count < b.interval_count
    WHERE 1=1
    AND b.interval_count IS NULL

However, getting it down to just one row has proven tricky for me.

This triple self-join ran for an hour and a half before I killed it (I'll be running it regularly, so ideally it would take no more than 15 minutes):

SELECT a.item_id, a.interval_time, a.interval_count
    FROM dbo.IntervalCounts a
    LEFT JOIN dbo.IntervalCounts b
        ON a.item_id = b.item_id
        AND a.interval_count < b.interval_count
    LEFT JOIN dbo.IntervalCounts c
        ON a.item_id = c.item_id
        -- if I remove this line, it will ALWAYS give me the 00:00 interval
        -- if I keep it, it runs way too long
        AND a.interval_count = c.interval_count
        AND a.interval_time > c.interval_time
    WHERE 1=1
    AND b.interval_count IS NULL
    AND c.interval_time IS NULL

Doing something like this just seems ungainly, and I was also forced to kill the execution after about an hour and a half:

DECLARE @tempTable TABLE
    (
    item_id int,
    interval_time time(0),
    interval_count int
    )

INSERT INTO @tempTable
SELECT a.item_id, a.interval_time, a.interval_count
FROM dbo.IntervalCounts a
LEFT JOIN dbo.IntervalCounts b
    ON a.item_id = b.item_id
    AND a.interval_count < b.interval_count
WHERE 1=1
AND b.interval_count IS NULL

SELECT a.item_id, a.interval_time, a.interval_count
FROM @tempTable a
LEFT JOIN @tempTable b
    ON a.item_id = b.item_id
    AND a.interval_time > b.interval_time
WHERE 1=1
AND b.interval_time IS NULL

There must be a better way, but I'm stumped. How can I do this in a manner that won't take forever to run?

You are overthinking it; you can use ROW_NUMBER:

WITH CTE AS
(
    SELECT  *,
            RN = ROW_NUMBER() OVER(PARTITION BY item_id 
                                   ORDER BY interval_count DESC, interval_time)
    FROM dbo.IntervalCounts
)
SELECT *
FROM CTE
WHERE RN = 1;
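
To see why this returns exactly one row per item_id, here is a minimal sketch you can run against a table variable. The variable @IntervalCounts, the CTE name Ranked, and the rows for item_id 2 are made up for illustration; the item_id 1 tie rows come from the question. Ordering by interval_count DESC puts the highest counts first in each partition, and the secondary interval_time sort breaks ties in favour of the earliest interval.

```sql
-- Hypothetical sample data in a table variable; the real table is dbo.IntervalCounts.
DECLARE @IntervalCounts TABLE (
    item_id        int     NOT NULL,
    interval_time  time(0) NOT NULL,
    interval_count int     NOT NULL
);

INSERT INTO @IntervalCounts (item_id, interval_time, interval_count)
VALUES (1, '00:00', 100),   -- four-way tie on the max count for item 1
       (1, '13:15', 100),
       (1, '07:45', 100),
       (1, '19:30', 100),
       (1, '08:00', 40),
       (2, '09:15', 7),     -- two-way tie on the max count for item 2 (made-up rows)
       (2, '06:30', 7),
       (2, '23:45', 3);

WITH Ranked AS (
    SELECT item_id, interval_time, interval_count,
           -- rn = 1 is the highest count; ties go to the earliest interval_time
           ROW_NUMBER() OVER (PARTITION BY item_id
                              ORDER BY interval_count DESC, interval_time ASC) AS rn
    FROM @IntervalCounts
)
SELECT item_id, interval_time, interval_count
FROM Ranked
WHERE rn = 1
ORDER BY item_id;

-- Result:
-- item_id  interval_time  interval_count
-- 1        00:00:00       100
-- 2        06:30:00       7
```

On the full 200-million-row table, the window function has to sort each partition unless an index already supplies that order. An index keyed on (item_id, interval_count DESC, interval_time) may let the optimizer skip the sort, but that is an assumption to check against the actual execution plan rather than a guarantee.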
