Select the maximum value of each group using PARTITION BY

Question

I have the following code, which takes a long time to execute. What I need to do is select the rows whose row number equals 1 after partitioning by three columns (col_1, col_2, col_3), which are also the key columns, and ordering by the columns shown below. The table has around 90 million records. Am I following the best approach, or is there a better one?

  with cte as (SELECT
     b.*
    ,ROW_NUMBER() OVER ( PARTITION BY col_1,col_2,col_3
                         ORDER BY new_col DESC, new_col_2 DESC, new_col_3 DESC  ) AS ROW_NUMBER
  FROM (
    SELECT
       *
      ,CASE
         WHEN update_col = '        ' THEN new_update_col
         ELSE update_col
       END AS new_col_1
    FROM schema_name.table_name
    ) b
  )
 select top 10 * from cte WHERE ROW_NUMBER=1

Answer 1

Currently you are applying CASE in the inner query, so it is evaluated for every row in the database table. CASE with a string comparison is a costly operation.

In the end, you keep only the records with ROW_NUMBER = 1. If, as a guess, this filter keeps half of all your records, the query should run faster if you filter first (generate the row number and keep only the rows with RN = 1) and then apply the CASE expression to the columns of the remaining rows.
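The reordering suggested above could look roughly like the sketch below. It assumes that the ORDER BY columns (new_col, new_col_2, new_col_3) already exist in the table and do not depend on the CASE result (new_col_1); if they do, the CASE cannot be moved after the filter. The CTE name `ranked` and the alias `rn` are illustrative names, not taken from the question.

```sql
-- Sketch only: rank the rows first, then apply CASE to the survivors.
-- Assumption: new_col, new_col_2 and new_col_3 are plain table columns
-- that do not depend on the CASE expression.
WITH ranked AS (
    SELECT
        t.*,
        ROW_NUMBER() OVER (
            PARTITION BY col_1, col_2, col_3
            ORDER BY new_col DESC, new_col_2 DESC, new_col_3 DESC
        ) AS rn  -- "rn" avoids reusing the function name as an alias
    FROM schema_name.table_name AS t
)
SELECT TOP 10
    r.*,
    -- CASE is now evaluated only for rows that passed the rn = 1 filter
    CASE
        WHEN r.update_col = '        ' THEN r.new_update_col
        ELSE r.update_col
    END AS new_col_1
FROM ranked AS r
WHERE r.rn = 1;
```

Whether this actually helps is worth checking against the execution plan: with around 90 million rows, the sort needed for the window function may well cost more than the CASE evaluation.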

0 2019-05-06 08:11:37
