
How to select a row having a column with max value with a group by

I have a table with the following columns:

MSG_ID          NOT NULL NUMBER(10)     
CREATION_DATE            DATE           
PORT                     VARCHAR2(50)   
MESSAGE                  VARCHAR2(1024) 
IP_ADDRESS               VARCHAR2(50)   
PARSED                   NUMBER(1)      
PARSED_ON                DATE    

The parse time is PARSED_ON - CREATION_DATE.
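In Oracle, subtracting one DATE from another gives the difference in days as a NUMBER, so the parse time in minutes is (PARSED_ON - CREATION_DATE) * 24 * 60. The sketch below uses two made-up timestamps 90 minutes apart (illustrative values only) to show that the multiplication has to happen inside ROUND; rounding first turns any fraction of a day into a whole number of days:

```sql
-- Two hypothetical timestamps 90 minutes apart (0.0625 days).
SELECT ROUND((TO_DATE('2024-01-15 10:30', 'YYYY-MM-DD HH24:MI')
            - TO_DATE('2024-01-15 09:00', 'YYYY-MM-DD HH24:MI')) * 24 * 60) AS minutes_ok,
       -- ROUND(0.0625) = 0 days, so this column ends up as 0
       ROUND(TO_DATE('2024-01-15 10:30', 'YYYY-MM-DD HH24:MI')
           - TO_DATE('2024-01-15 09:00', 'YYYY-MM-DD HH24:MI')) * 24 * 60 AS minutes_wrong
FROM dual;
```

This returns MINUTES_OK = 90 and MINUTES_WRONG = 0.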

I would like to know whether it is possible, in one single query, to extract for each hour the message that took the longest to parse, returning HOUR, PORT, MSG_ID and MINUTES. This is where I am stuck:

select TO_CHAR(CREATION_DATE, 'HH24') || ':mm' HOUR, PORT, MSG_ID, ROUND(MAX(parsed_on -  creation_date)) * 24*60 MINUTES
        from T_INCOME_CALLS 
         where TO_CHAR(CREATION_DATE, 'dd/mm/yyyy') = TO_CHAR(SYSDATE, 'dd/mm/yyyy') 
        group by TO_CHAR(CREATION_DATE, 'HH24'), PORT, MSG_ID
         order by TO_CHAR(CREATION_DATE, 'HH24') ;
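The attempt above also groups by MSG_ID. Assuming MSG_ID is unique, every group then holds a single message, so MAX() has nothing to compare. Grouping by the hour alone does give the longest parse time per hour, but a plain GROUP BY cannot also report which PORT and MSG_ID that maximum came from. A sketch of that hour-only aggregate, for contrast:

```sql
-- Longest parse time (in minutes) per hour of today.
-- PORT and MSG_ID cannot simply be added here: putting them in the GROUP BY
-- would split the groups again.
SELECT TO_CHAR(creation_date, 'HH24') AS hour,
       ROUND(MAX(parsed_on - creation_date) * 24 * 60) AS max_minutes
FROM t_income_calls
WHERE creation_date >= TRUNC(SYSDATE) AND creation_date < TRUNC(SYSDATE) + 1
GROUP BY TO_CHAR(creation_date, 'HH24')
ORDER BY hour;
```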

You can use the window function row_number() to find the row with the largest parse time in each hour, like this:

select *
from (
    select to_number(to_char(creation_date, 'HH24')) as hour,
        port,
        msg_id,
        -- DATE difference is in days: convert to minutes before rounding
        round((parsed_on - creation_date) * 24 * 60) as parse_time,
        row_number() over (
            partition by to_char(creation_date, 'HH24')  -- one winner per hour
            order by (parsed_on - creation_date) desc nulls last
            ) as rn
    from t_income_calls t
    where creation_date between trunc(sysdate) 
                            and trunc(sysdate + 1) - interval '1' second
    ) t
where rn = 1;
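To try the per-hour ROW_NUMBER() approach without touching the real table, here is a minimal sketch that replaces T_INCOME_CALLS with a few made-up rows in a WITH clause (the values are illustrative, not from the question). Message 4 has not been parsed yet (PARSED_ON is NULL), and NULLS LAST keeps it from winning its hour:

```sql
-- Hypothetical sample rows standing in for T_INCOME_CALLS.
WITH t_income_calls AS (
  SELECT 1 AS msg_id, 'P1' AS port,
         TO_DATE('2024-01-15 09:05', 'YYYY-MM-DD HH24:MI') AS creation_date,
         TO_DATE('2024-01-15 09:07', 'YYYY-MM-DD HH24:MI') AS parsed_on FROM dual UNION ALL
  SELECT 2, 'P2', TO_DATE('2024-01-15 09:10', 'YYYY-MM-DD HH24:MI'),
                  TO_DATE('2024-01-15 09:25', 'YYYY-MM-DD HH24:MI') FROM dual UNION ALL
  SELECT 3, 'P1', TO_DATE('2024-01-15 10:00', 'YYYY-MM-DD HH24:MI'),
                  TO_DATE('2024-01-15 10:04', 'YYYY-MM-DD HH24:MI') FROM dual UNION ALL
  SELECT 4, 'P3', TO_DATE('2024-01-15 10:30', 'YYYY-MM-DD HH24:MI'), NULL FROM dual
)
SELECT hour, port, msg_id, parse_time
FROM (
  SELECT TO_CHAR(creation_date, 'HH24') AS hour,
         port, msg_id,
         ROUND((parsed_on - creation_date) * 24 * 60) AS parse_time,
         ROW_NUMBER() OVER (
           PARTITION BY TO_CHAR(creation_date, 'HH24')
           ORDER BY (parsed_on - creation_date) DESC NULLS LAST) AS rn
  FROM t_income_calls
)
WHERE rn = 1
ORDER BY hour;
```

With these sample rows the query returns two rows: hour 09 with MSG_ID 2 on P2 (15 minutes), and hour 10 with MSG_ID 3 on P1 (4 minutes).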

Also, notice the filter: I used a date range instead of applying to_char to creation_date. Applying to_char to creation_date prevents the optimizer from using an index on creation_date, if one exists.
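The following sketch shows which kind of index each filter style can use. The index names are hypothetical, and the first statement is only needed if no index on CREATION_DATE exists yet. As an alternative to the range filter, Oracle can also index an expression directly with a function-based index, so a filter on exactly that expression becomes indexable:

```sql
-- Plain B-tree index on the raw column (hypothetical name): usable by
-- creation_date >= ... AND creation_date < ... style range filters.
CREATE INDEX t_income_calls_cd_ix ON t_income_calls (creation_date);

-- A filter such as TO_CHAR(creation_date, 'dd/mm/yyyy') = ... wraps the column
-- in a function, which rules out a range scan on the index above.
-- Function-based index on an expression (name again hypothetical):
CREATE INDEX t_income_calls_cd_day_ix ON t_income_calls (TRUNC(creation_date));

-- This predicate matches the indexed expression exactly.
SELECT msg_id, creation_date
FROM   t_income_calls
WHERE  TRUNC(creation_date) = TRUNC(SYSDATE);
```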

I have assumed that what you need is the item that takes the most time, per hour, for each combination of IP_ADDRESS and PORT, which differs from your original query. I am also assuming MSG_ID is unique.

If you want one and only one row per recorded hour, use row_number(); if you also want tied values, substitute dense_rank() in the query below. CREATION_DATE is used as a tie-breaker in the sort, so with row_number() the earliest of several equally slow messages wins. If you switch to dense_rank() to keep ties, drop that tie-breaker as well, otherwise it will still separate rows with equal parse times.
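The difference between the two functions only shows up on ties. A small sketch with three made-up messages, two of which share the same parse time (the CALLS name and values are illustrative):

```sql
-- Messages 10 and 11 tie on parse time; 12 is slower to rank.
WITH calls AS (
  SELECT 10 AS msg_id, 5 AS parse_minutes FROM dual UNION ALL
  SELECT 11, 5 FROM dual UNION ALL
  SELECT 12, 2 FROM dual
)
SELECT msg_id, parse_minutes,
       ROW_NUMBER() OVER (ORDER BY parse_minutes DESC) AS rn,   -- 1 and 2 for the tie, in no guaranteed order
       DENSE_RANK() OVER (ORDER BY parse_minutes DESC) AS drnk  -- 1 and 1 for the tie, then 2
FROM calls
ORDER BY parse_minutes DESC, msg_id;
```

Filtering on rn = 1 keeps one of the two tied messages, chosen arbitrarily unless the ORDER BY has a tie-breaker; filtering on drnk = 1 keeps both.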

SELECT
       TO_CHAR(CREATION_DATE, 'HH24') || ':mm' HOUR
     , PORT, MSG_ID
       -- DATE difference is in days: convert to minutes before rounding
     , ROUND((parsed_on - creation_date) * 24 * 60) MINUTES
FROM (
      SELECT
            T_INCOME_CALLS.*
           , ROW_NUMBER() OVER(PARTITION BY IP_ADDRESS, port, TO_CHAR(CREATION_DATE, 'HH24') 
                                ORDER BY (parsed_on - creation_date) DESC NULLS LAST, CREATION_DATE) AS rn
      FROM T_INCOME_CALLS
      WHERE CREATION_DATE >= TRUNC(SYSDATE) AND CREATION_DATE < TRUNC(SYSDATE) + 1
      ) 
WHERE rn = 1;
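TO_CHAR(CREATION_DATE, 'HH24') keeps only the hour of the day, which is fine here because the filter restricts the rows to a single day. If the date range is widened, rows from 09:00 on different days would land in the same partition. One option, sketched below with an example seven-day window, is to partition by TRUNC(CREATION_DATE, 'HH24'), which truncates the DATE to the start of its hour while keeping the day:

```sql
SELECT TO_CHAR(hour_start, 'YYYY-MM-DD HH24:MI') AS hour,
       port, msg_id,
       ROUND((parsed_on - creation_date) * 24 * 60) AS minutes
FROM (
  SELECT c.*,
         -- DATE truncated to the hour: 2024-01-15 09:37 becomes 2024-01-15 09:00
         TRUNC(c.creation_date, 'HH24') AS hour_start,
         ROW_NUMBER() OVER (
           PARTITION BY c.ip_address, c.port, TRUNC(c.creation_date, 'HH24')
           ORDER BY (c.parsed_on - c.creation_date) DESC NULLS LAST, c.creation_date) AS rn
  FROM t_income_calls c
  WHERE c.creation_date >= TRUNC(SYSDATE) - 7   -- example window: the last seven days
    AND c.creation_date <  TRUNC(SYSDATE) + 1
)
WHERE rn = 1
ORDER BY hour_start, ip_address, port;
```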

Please avoid converting dates into strings in your WHERE clause; it is not efficient. Instead, leave CREATION_DATE untouched and adjust the criteria to match that data type, which allows an index on the column to be used for the filtering.
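The same rewrite works for any specific day, not only today. A sketch using an illustrative DATE literal; the column is compared against a half-open range, so it is never wrapped in a function:

```sql
-- Instead of: WHERE TO_CHAR(creation_date, 'dd/mm/yyyy') = '15/01/2024'
SELECT msg_id, port, creation_date
FROM   t_income_calls
WHERE  creation_date >= DATE '2024-01-15'
AND    creation_date <  DATE '2024-01-15' + 1;   -- midnight of 16 January, excluded
```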

You can also get it without a subquery by using the FIRST aggregate function (KEEP (DENSE_RANK FIRST ...)):

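SELECT TO_CHAR(CREATION_DATE, 'HH24') || ':mm' HOUR,
    MAX(PORT)    KEEP (DENSE_RANK FIRST ORDER BY (parsed_on - creation_date) DESC NULLS LAST, CREATION_DATE) PORT,
    MAX(MSG_ID)  KEEP (DENSE_RANK FIRST ORDER BY (parsed_on - creation_date) DESC NULLS LAST, CREATION_DATE) MSG_ID,
    MAX(MESSAGE) KEEP (DENSE_RANK FIRST ORDER BY (parsed_on - creation_date) DESC NULLS LAST, CREATION_DATE) MESSAGE,
    ROUND(MAX(parsed_on - creation_date) * 24 * 60) MINUTES
FROM T_INCOME_CALLS 
WHERE CREATION_DATE >= TRUNC(SYSDATE) AND CREATION_DATE < TRUNC(SYSDATE) + 1
-- Group by the hour only: with a unique MSG_ID in the GROUP BY, every group would hold a single row.
GROUP BY TO_CHAR(CREATION_DATE, 'HH24')
ORDER BY TO_CHAR(CREATION_DATE, 'HH24');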
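KEEP (DENSE_RANK FIRST ...) keeps every row that ties for first place and then applies the outer aggregate (MAX, MIN, ...) to those rows only. That is why a tie-breaker such as CREATION_DATE in the ORDER BY matters. A small sketch with made-up rows (the CALLS name and values are illustrative):

```sql
-- Messages 7 and 8 both took 10 minutes in hour 14; message 9 took 3 minutes.
WITH calls AS (
  SELECT 7 AS msg_id,
         TO_DATE('2024-01-15 14:00', 'YYYY-MM-DD HH24:MI') AS creation_date,
         TO_DATE('2024-01-15 14:10', 'YYYY-MM-DD HH24:MI') AS parsed_on FROM dual UNION ALL
  SELECT 8, TO_DATE('2024-01-15 14:20', 'YYYY-MM-DD HH24:MI'),
            TO_DATE('2024-01-15 14:30', 'YYYY-MM-DD HH24:MI') FROM dual UNION ALL
  SELECT 9, TO_DATE('2024-01-15 14:40', 'YYYY-MM-DD HH24:MI'),
            TO_DATE('2024-01-15 14:43', 'YYYY-MM-DD HH24:MI') FROM dual
)
SELECT TO_CHAR(creation_date, 'HH24') AS hour,
       -- Both tied rows count as FIRST; the outer aggregate picks between them.
       MIN(msg_id) KEEP (DENSE_RANK FIRST ORDER BY (parsed_on - creation_date) DESC) AS min_tied,
       MAX(msg_id) KEEP (DENSE_RANK FIRST ORDER BY (parsed_on - creation_date) DESC) AS max_tied,
       -- With CREATION_DATE as tie-breaker only the earlier message is FIRST.
       MAX(msg_id) KEEP (DENSE_RANK FIRST ORDER BY (parsed_on - creation_date) DESC, creation_date) AS earliest
FROM calls
GROUP BY TO_CHAR(creation_date, 'HH24');
```

This returns a single row: hour 14, MIN_TIED = 7, MAX_TIED = 8, EARLIEST = 7.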

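Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.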

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM