SQL Server: improve the performance of joining a large table and a small table
I am trying to join two tables:
Table1 has 900 million rows (106 GB). id1, id2, id3, and id4 form the clustered primary key, and the house type (id4) is a string:
+-----+-----+-----+------------+--------+
| id1 | id2 | id3 | id4        | val1   |
+-----+-----+-----+------------+--------+
| ac  | 15  | 697 | houseType1 | 75.396 |
+-----+-----+-----+------------+--------+
| ac  | 15  | 697 | houseType2 | 20.97  |
+-----+-----+-----+------------+--------+
| ac  | 15  | 805 | houseType1 | 112.99 |
+-----+-----+-----+------------+--------+
| ac  | 15  | 805 | houseType2 | 53.67  |
+-----+-----+-----+------------+--------+
| ac  | 27  | 697 | houseType1 | 67.28  |
+-----+-----+-----+------------+--------+
| ac  | 27  | 697 | houseType2 | 55.12  |
+-----+-----+-----+------------+--------+
Table2 is very small, with 150 rows. val1 and val2 form its clustered primary key:
+------+------+---------+
| val1 | val2 | factor1 |
+------+------+---------+
| 0    | 10   | 0.82    |
+------+------+---------+
| 10   | 20   | 0.77    |
+------+------+---------+
| 20   | 30   | 0.15    |
+------+------+---------+
What I need:
For every val1 in Table1, I need to find the range in Table2 that contains it (val1 inclusive, val2 exclusive) and return the factor1 associated with that range, which will be used in a further aggregate calculation.
Example of my query:
Select a.id1, a.id2, a.id3, a.id4,
       max(case when a.val1 >= b.val1 and a.val1 < b.val2 then b.factor1 * a.val1
                else null
           end) as result
From Table1 as a,
     Table2 as b
Group by a.id1, a.id2, a.id3, a.id4
For example, for this row in Table1:
ac, 15, 697, houseType2, 20.97
the factor 0.15 should be returned from Table2, because 20.97 falls in the range from 20 (inclusive) to 30 (exclusive).
There is no join in the query because I do not know how to use a join here. I just need to look up the factor for each val1 in Table2.
In SQL Server, the query runs very slowly and takes more than 3 hours.
I also got:
Warning: Null value is eliminated by an aggregate or other SET operation.
Could anyone help me with this?
Thanks.
This should reduce your recordset:
Select a.id1, a.id2, a.id3, a.id4,
       b.factor1 * a.val1 as result
From Table1 a
     inner join Table2 b
        on a.val1 >= b.val1 and a.val1 < b.val2
This way, you will only get a single record from b for each record from a, provided the ranges in Table2 do not overlap. This is at least a start on your performance problem.
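To see the range join in action, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for SQL Server (an assumption made so the example runs anywhere), and the column types are guesses based on the sample data. With only the three sample ranges loaded, the one row whose val1 falls inside a range (20.97) is returned, because an inner join drops rows that match no range.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; column types are guessed
# from the sample data shown in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (id1 TEXT, id2 INTEGER, id3 INTEGER, id4 TEXT, val1 REAL,
                     PRIMARY KEY (id1, id2, id3, id4));
CREATE TABLE Table2 (val1 REAL, val2 REAL, factor1 REAL,
                     PRIMARY KEY (val1, val2));
INSERT INTO Table1 VALUES
  ('ac', 15, 697, 'houseType1', 75.396),
  ('ac', 15, 697, 'houseType2', 20.97),
  ('ac', 15, 805, 'houseType1', 112.99),
  ('ac', 15, 805, 'houseType2', 53.67),
  ('ac', 27, 697, 'houseType1', 67.28),
  ('ac', 27, 697, 'houseType2', 55.12);
INSERT INTO Table2 VALUES (0, 10, 0.82), (10, 20, 0.77), (20, 30, 0.15);
""")

# Range join: each Table1 row is paired only with the Table2 range
# that contains its val1 (lower bound inclusive, upper bound exclusive).
rows = conn.execute("""
SELECT a.id1, a.id2, a.id3, a.id4, b.factor1 * a.val1 AS result
FROM Table1 AS a
INNER JOIN Table2 AS b
  ON a.val1 >= b.val1 AND a.val1 < b.val2
ORDER BY a.id1, a.id2, a.id3, a.id4
""").fetchall()

for row in rows:
    print(row)
```

On the real Table2 with all 150 ranges, every Table1 row whose val1 lies in some range would come back once, again assuming the ranges do not overlap.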
There is no need for MAX, because the join returns a single record from Table2 for each row of Table1.
I would be inclined to express this as a subquery or lateral join:
Select a.id1, a.id2, a.id3, a.id4, b.factor1 * a.val1 as result
From Table1 a cross apply
     (select b.*
      from Table2 b
      where a.val1 >= b.val1 and a.val1 < b.val2
     ) b;
The aggregation is unnecessary because the four grouping columns constitute the primary key, so each group holds exactly one Table1 row.
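As a rough check that dropping the aggregation does not change the result, the sketch below runs the original GROUP BY query and a per-row lookup side by side on the sample rows. It uses Python's sqlite3 module with an in-memory database as a stand-in for SQL Server (an assumption), and since SQLite has no CROSS APPLY, the lookup is written as a correlated scalar subquery. That form keeps unmatched rows with a NULL result, which is closer to OUTER APPLY than to CROSS APPLY in SQL Server. The sketch also counts the row pairs the comma join has to produce before grouping: 6 x 3 here, and 900 million x 150 = 135 billion on the real tables.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; column types are guessed
# from the sample data shown in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (id1 TEXT, id2 INTEGER, id3 INTEGER, id4 TEXT, val1 REAL,
                     PRIMARY KEY (id1, id2, id3, id4));
CREATE TABLE Table2 (val1 REAL, val2 REAL, factor1 REAL,
                     PRIMARY KEY (val1, val2));
INSERT INTO Table1 VALUES
  ('ac', 15, 697, 'houseType1', 75.396),
  ('ac', 15, 697, 'houseType2', 20.97),
  ('ac', 15, 805, 'houseType1', 112.99),
  ('ac', 15, 805, 'houseType2', 53.67),
  ('ac', 27, 697, 'houseType1', 67.28),
  ('ac', 27, 697, 'houseType2', 55.12);
INSERT INTO Table2 VALUES (0, 10, 0.82), (10, 20, 0.77), (20, 30, 0.15);
""")

# The query from the question: a comma (cross) join followed by GROUP BY.
original = conn.execute("""
SELECT a.id1, a.id2, a.id3, a.id4,
       MAX(CASE WHEN a.val1 >= b.val1 AND a.val1 < b.val2
                THEN b.factor1 * a.val1 END) AS result
FROM Table1 AS a, Table2 AS b
GROUP BY a.id1, a.id2, a.id3, a.id4
ORDER BY a.id1, a.id2, a.id3, a.id4
""").fetchall()

# Per-row lookup without aggregation; rows with no matching range get NULL.
lookup = conn.execute("""
SELECT a.id1, a.id2, a.id3, a.id4,
       (SELECT b.factor1 * a.val1
        FROM Table2 AS b
        WHERE a.val1 >= b.val1 AND a.val1 < b.val2) AS result
FROM Table1 AS a
ORDER BY a.id1, a.id2, a.id3, a.id4
""").fetchall()

# Number of row pairs the cross join produces before grouping.
pairs = conn.execute("SELECT COUNT(*) FROM Table1, Table2").fetchone()[0]

print("same result:", original == lookup)
print("cross-join pairs:", pairs)
```

The NULL results for rows outside every range are also what the aggregate in the original query silently discards, which is the likely source of the "Null value is eliminated" warning.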