SQL Server: improving the performance of joining a large table with a small table

Question

I am trying to join two tables:

Table1: 900 million rows (106 GB). The columns id1, id2, id3, id4 form the clustered primary key, and the house type (id4) is a string.

+-----+-----+-----+------------+--------+
| Id1 | id2 | id3 | id4        |  val1  |
+-----+-----+-----+------------+--------+
| ac  |  15 | 697 | houseType1 | 75.396 |
+-----+-----+-----+------------+--------+
| ac  |  15 | 697 | houseType2 | 20.97  |
+-----+-----+-----+------------+--------+
| ac  |  15 | 805 | houseType1 | 112.99 |
+-----+-----+-----+------------+--------+
| ac  |  15 | 805 | houseType2 | 53.67  |
+-----+-----+-----+------------+--------+
| ac  |  27 | 697 | houseType1 | 67.28  |
+-----+-----+-----+------------+--------+
| ac  |  27 | 697 | houseType2 | 55.12  |
+-----+-----+-----+------------+--------+

Table2 is very small, with 150 rows. Its clustered primary key is (val1, val2).

+------+------+---------+
| val1 | val2 | factor1 |
+------+------+---------+
| 0    | 10   | 0.82    |
+------+------+---------+
| 10   | 20   | 0.77    |
+------+------+---------+
| 20   | 30   | 0.15    |
+------+------+---------+

What I need:

For every "val1" in Table1, I need to find which range [val1, val2] in Table2 it falls into and return the associated "factor1" from Table2, which will be used in a further aggregate calculation.

Example of my query:

 Select a.id1, a.id2, a.id3, a.id4, 
         max(case when a.val1 >= b.val1 and a.val1 < b.val2 then  b.factor1 * a.val1
                else null
            end ) as result
 From Table1 as a,
      Table2 as b
 Group by  a.id1, a.id2, a.id3, a.id4

For example, given this row:

   ac ,  15, 697, houseType2, 20.97 in table1
   0.15 should be returned from table2 because 20.97 is in the range [20, 30] in table2.

There is no join in the query because I do not know how to use one here. I just need to look up the factor for val1 in Table2.

On SQL Server, it runs very slowly, taking more than 3 hours.

I also got:

   Warning: Null value is eliminated by an aggregate or other SET operation.

Could anyone help me with this?

Thanks.

Answer 1

This should reduce your record set:

Select a.id1, a.id2, a.id3, a.id4, 
         b.factor1 * a.val1 as result
 From Table1 a inner join
      Table2 b on a.val1 >= b.val1 and a.val1 < b.val2

This way, you will only get a single record from b for each record from a, as long as the ranges in Table2 do not overlap. This is at least a start on your performance problem.

There is no need for MAX, because the join already returns a single record per row.
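
To try the join on a small scale, here is a minimal sketch using temporary tables that mimic the schema from the question. The temp-table names (#Table1, #Table2) and all column types are assumptions made for illustration; the sample rows are copied from the question.

```sql
-- Assumed column types; the question only says that the house type is a string.
CREATE TABLE #Table1 (
    id1  varchar(10)   NOT NULL,
    id2  int           NOT NULL,
    id3  int           NOT NULL,
    id4  varchar(20)   NOT NULL,
    val1 decimal(9, 3) NOT NULL,
    PRIMARY KEY CLUSTERED (id1, id2, id3, id4)
);

CREATE TABLE #Table2 (
    val1    decimal(9, 3) NOT NULL,
    val2    decimal(9, 3) NOT NULL,
    factor1 decimal(9, 3) NOT NULL,
    PRIMARY KEY CLUSTERED (val1, val2)
);

-- Two of the sample rows from the question.
INSERT INTO #Table1 (id1, id2, id3, id4, val1) VALUES
    ('ac', 15, 697, 'houseType1', 75.396),
    ('ac', 15, 697, 'houseType2', 20.97);

-- The three sample ranges from the question.
INSERT INTO #Table2 (val1, val2, factor1) VALUES
    (0, 10, 0.82),
    (10, 20, 0.77),
    (20, 30, 0.15);

-- Range join: each row of #Table1 is matched to the range [val1, val2) containing its val1.
SELECT a.id1, a.id2, a.id3, a.id4,
       b.factor1 * a.val1 AS result
FROM #Table1 AS a
INNER JOIN #Table2 AS b
    ON a.val1 >= b.val1 AND a.val1 < b.val2;
```

With only these three sample ranges loaded, the houseType2 row (20.97) matches the range 20 to 30 and gets factor 0.15, while the houseType1 row (75.396) matches nothing and is dropped by the inner join. If unmatched rows should be kept with a NULL result, a LEFT JOIN would do that instead.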

Answer 2

I would be inclined to express this as a subquery or lateral join:

Select a.id1, a.id2, a.id3, a.id4, b.factor1 * a.val1 as result
From Table1 a cross apply
     (select b.*
      from Table2 b
      where a.val1 >= b.val1 and a.val1 < b.val2 
     ) b;

The aggregation is unnecessary because the four keys constitute the primary key.
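
Dropping the aggregation is only safe if each val1 in Table1 falls into at most one range of Table2. The question does not state this explicitly, so the following is a sketch of one way to check it, assuming the ranges are half-open, [val1, val2), as in the queries above:

```sql
-- Lists every pair of distinct ranges in Table2 that overlap,
-- treating each range as half-open: [val1, val2).
SELECT x.val1 AS x_val1, x.val2 AS x_val2,
       y.val1 AS y_val1, y.val2 AS y_val2
FROM Table2 AS x
INNER JOIN Table2 AS y
    ON x.val1 < y.val2          -- x starts before y ends
   AND y.val1 < x.val2          -- y starts before x ends
   -- Report each pair once and skip pairing a row with itself;
   -- (val1, val2) is the primary key, so this ordering is strict.
   AND (x.val1 < y.val1 OR (x.val1 = y.val1 AND x.val2 < y.val2));
```

An empty result means no two ranges overlap, so the CROSS APPLY (or the inner join) returns at most one row per row of Table1.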
