SQL Server: improve performance of joining a large and a small table

I am trying to join two tables.

Table1: 900 million rows (106 GB). The columns id1, id2, id3, id4 form the clustered primary key; id4 (houseType) is a string.

+-----+-----+-----+------------+--------+
| id1 | id2 | id3 | id4        |  val1  |
+-----+-----+-----+------------+--------+
| ac  |  15 | 697 | houseType1 | 75.396 |
+-----+-----+-----+------------+--------+
| ac  |  15 | 697 | houseType2 | 20.97  |
+-----+-----+-----+------------+--------+
| ac  |  15 | 805 | houseType1 | 112.99 |
+-----+-----+-----+------------+--------+
| ac  |  15 | 805 | houseType2 | 53.67  |
+-----+-----+-----+------------+--------+
| ac  |  27 | 697 | houseType1 | 67.28  |
+-----+-----+-----+------------+--------+
| ac  |  27 | 697 | houseType2 | 55.12  |
+-----+-----+-----+------------+--------+

Table2 is very small, with 150 rows. The columns val1, val2 form the clustered primary key.

+------+------+---------+
| val1 | val2 | factor1 |
+------+------+---------+
| 0    | 10   | 0.82    |
+------+------+---------+
| 10   | 20   | 0.77    |
+------+------+---------+
| 20   | 30   | 0.15    |
+------+------+---------+
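
To make the setup reproducible, here is a minimal sketch of the two tables with the sample rows shown above. The data types and constraint names are assumptions (the post only says that id4 is a string); adjust them to the real schema.

```sql
-- Assumed types; only the key columns and the sample rows come from the post.
CREATE TABLE Table1 (
    id1  varchar(10)   NOT NULL,
    id2  int           NOT NULL,
    id3  int           NOT NULL,
    id4  varchar(20)   NOT NULL,   -- houseType
    val1 decimal(9, 3) NOT NULL,
    CONSTRAINT PK_Table1 PRIMARY KEY CLUSTERED (id1, id2, id3, id4)
);

CREATE TABLE Table2 (
    val1    decimal(9, 3) NOT NULL,   -- lower bound of the range
    val2    decimal(9, 3) NOT NULL,   -- upper bound of the range
    factor1 decimal(5, 2) NOT NULL,
    CONSTRAINT PK_Table2 PRIMARY KEY CLUSTERED (val1, val2)
);

INSERT INTO Table1 (id1, id2, id3, id4, val1) VALUES
    ('ac', 15, 697, 'houseType1',  75.396),
    ('ac', 15, 697, 'houseType2',  20.97),
    ('ac', 15, 805, 'houseType1', 112.99),
    ('ac', 15, 805, 'houseType2',  53.67),
    ('ac', 27, 697, 'houseType1',  67.28),
    ('ac', 27, 697, 'houseType2',  55.12);

INSERT INTO Table2 (val1, val2, factor1) VALUES
    ( 0, 10, 0.82),
    (10, 20, 0.77),
    (20, 30, 0.15);
```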

What I need:

For every val1 in Table1, I need to find the range [val1, val2] in Table2 that it falls into and return the factor1 associated with that range, which will then be used in a further aggregate calculation.

Example of my query:

 Select a.id1, a.id2, a.id3, a.id4, 
         max(case when a.val1 >= b.val1 and a.val1 < b.val2 then  b.factor1 * a.val1
                else null
            end ) as result
 From Table1 as a,
      Table2 as b
 Group by  a.id1, a.id2, a.id3, a.id4

For example, for this row in Table1:

   ac, 15, 697, houseType2, 20.97

0.15 should be returned from Table2, because 20.97 falls in the range [20, 30] in Table2.
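
The lookup for that one value can be sketched on its own, as below. The variable name and its type are assumptions, and the comparison uses the same half-open test as the query above (lower bound included, upper bound excluded).

```sql
-- Look up the factor for a single value; @val is a hypothetical variable.
DECLARE @val decimal(9, 3) = 20.97;

SELECT b.factor1
FROM Table2 AS b
WHERE @val >= b.val1
  AND @val <  b.val2;
```

With the three sample rows of Table2 shown earlier, only the (20, 30) row satisfies both conditions, so this returns 0.15.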

There is no explicit join in the query because I do not know how to use a join here. I just need to look up the factor for each val1 in Table2.

In SQL Server, it runs very slowly, taking more than 3 hours.

I also got:

   Warning: Null value is eliminated by an aggregate or other SET operation. 

Could anyone help me with this?

Thanks.

This should reduce your recordset:

Select a.id1, a.id2, a.id3, a.id4, 
         b.factor1 * a.val1 as result
 From Table1 a inner join
      Table2 b on a.val1 >= b.val1 and a.val1 < b.val2

This way, you get only a single record from b for each record from a, provided the ranges in Table2 do not overlap. This is at least a start on your performance problem.
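
Whether each row of a really matches at most one row of b depends on the ranges in Table2 not overlapping. With only 150 rows, that is cheap to check. The query below is a sketch of such a check, not part of the original answer, and the column aliases are made up for illustration.

```sql
-- List every pair of ranges in Table2 that overlap, treating each range
-- as half-open: val1 included, val2 excluded.
SELECT x.val1 AS x_val1, x.val2 AS x_val2,
       y.val1 AS y_val1, y.val2 AS y_val2
FROM Table2 AS x
INNER JOIN Table2 AS y
    ON  x.val1 < y.val2
    AND y.val1 < x.val2
    -- Report each pair once and never pair a row with itself.
    AND (x.val1 < y.val1 OR (x.val1 = y.val1 AND x.val2 < y.val2));
```

An empty result means the ranges are disjoint, so the join cannot multiply rows of Table1.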

There is no need for MAX, because the join already yields a single record per row of a.
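
Dropping the aggregate should also make the warning go away: SQL Server raises it when an aggregate such as MAX skips NULL inputs, and the CASE in the original query produces a NULL for every non-matching range. One difference to be aware of is that the original query returns a row with a NULL result for values that match no range, while an inner join drops such rows. If those rows are needed, a LEFT JOIN variant along these lines might be closer to the original behaviour (a sketch, not tested against the real data):

```sql
-- Keep every row of Table1; result is NULL when no range in Table2
-- contains a.val1.
SELECT a.id1, a.id2, a.id3, a.id4,
       b.factor1 * a.val1 AS result
FROM Table1 AS a
LEFT JOIN Table2 AS b
    ON  a.val1 >= b.val1
    AND a.val1 <  b.val2;
```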

I would be inclined to express this as a subquery or a lateral join (CROSS APPLY in SQL Server):

Select a.id1, a.id2, a.id3, a.id4, b.factor1 * a.val1 as result
From Table1 a cross apply
     (select b.*
      from Table2 b
      where a.val1 >= b.val1 and a.val1 < b.val2 
     ) b;

The aggregation is unnecessary because the four key columns constitute the primary key, so each group contains exactly one row of Table1.
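
Since the question mentions a further aggregate calculation, the lookup can feed that aggregate directly instead of being grouped by the full key first. The sketch below assumes, purely for illustration, that the totals are wanted per (id1, id2); the grouping columns and the weighted_total alias are hypothetical.

```sql
-- Hypothetical follow-up aggregate: sum the scaled values per (id1, id2).
SELECT a.id1, a.id2,
       SUM(b.factor1 * a.val1) AS weighted_total
FROM Table1 AS a
CROSS APPLY (
    SELECT t.factor1
    FROM Table2 AS t
    WHERE a.val1 >= t.val1
      AND a.val1 <  t.val2
) AS b
GROUP BY a.id1, a.id2;
```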
