SQL Server query for SSIS transformation timing out due to 174 UNION ALL statements
I have a table in Hive and SQL Server with data stored as below. I am using SSIS to move this data into SQL Server. The query is taking too long. There are about 175 separate values in the Description column, which results in 174 UNION ALL statements, and the query times out after about 2 hours.
SQL Error [08S01]: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
Is there a better way to write this query?
Thanks!
Hive:
ID | Description
----+------------------------------
1 | Desc1;Desc2;Desc3;Desc4
2 | Desc1;Desc3;Desc4;Desc5;Desc6
...
230 | Desc8;Desc163;Desc9;Desc2;Desc172
SQL Server:
CaseID | GroupID | Description
-------+---------+--------------
1 | 63 | Desc1
1 | 44 | Desc2
1 | 57 | Desc3
1 | 78 | Desc4
...
2 | 78 | Desc1
2 | 57 | Desc3
Query:
select
case
when cas.description like '%Desc1%' then 63
end as groupid, -- maps to groupid
cas.id as caseid, -- maps to caseid
current_timestamp as INSERT_DT
from
svc_case cas
inner join account acc on acc.id = cas.id
where cas.description <> 'NULL' and LENGTH(cas.description) > 0
and acc.recordid = '03443FGT'
union all
select
case
when cas.description like '%Desc2%' then 44
end as groupid, -- maps to groupid
cas.id as caseid, -- maps to caseid
current_timestamp as INSERT_DT
from
svc_case cas
inner join account acc on acc.id = cas.id
where cas.description <> 'NULL' and LENGTH(cas.description) > 0
and acc.recordid = '03443FGT'
union all
select
case
when cas.description like '%Desc3%' then 57
end as groupid, -- maps to groupid
cas.id as caseid, -- maps to caseid
current_timestamp as INSERT_DT
from
svc_case cas
inner join account acc on acc.id = cas.id
where cas.description <> 'NULL' and LENGTH(cas.description) > 0
and acc.recordid = '03443FGT'
union all
select
case
when cas.description like '%Desc4%' then 78
end as groupid, -- maps to groupid
cas.id as caseid, -- maps to caseid
current_timestamp as INSERT_DT
from
svc_case cas
inner join account acc on acc.id = cas.id
where cas.description <> 'NULL' and LENGTH(cas.description) > 0
and acc.recordid = '03443FGT'
...
select
case
when cas.description like '%Desc175%' then 12
end as groupid, -- maps to groupid
cas.id as caseid, -- maps to caseid
current_timestamp as INSERT_DT
from
svc_case cas
inner join account acc on acc.id = cas.id
where cas.description <> 'NULL' and LENGTH(cas.description) > 0
and acc.recordid = '03443FGT'
This is a stab in the dark, but there are 2 things you can do to improve this query. Firstly, let's address all those UNION ALLs. If I understand your query correctly, you can unpivot your data to achieve the same thing:
SELECT V.groupid,
cas.id AS caseid,
current_timestamp as INSERT_DT
FROM dbo.svc_case cas
JOIN dbo.account acc on acc.id = cas.id
CROSS APPLY (VALUES(CASE WHEN cas.description LIKE '%Desc1%' THEN 63 END),
(CASE WHEN cas.description LIKE '%Desc2%' THEN 44 END),
(CASE WHEN cas.description LIKE '%Desc3%' THEN 57 END),
(CASE WHEN cas.description LIKE '%Desc4%' THEN 78 END),
--I assume there are 174 more of these
(CASE WHEN cas.description LIKE '%Desc178%' THEN 1 END))V(groupid) --The last one isn't correct, but to show how the `APPLY` ends
Then you have your WHERE, which isn't SARGable due to the LENGTH. LENGTH isn't actually a T-SQL function, so I hope you are actually using SQL Server (if you're not, this is a waste of an answer, as the above is T-SQL specific). Since LEN(NULL) returns NULL, the length test already filters out NULL values, and so does a plain <> '' comparison, so use that instead. Considering you already have <> 'NULL', though, you can combine the two with NOT IN:
WHERE cas.description NOT IN('NULL','')
AND acc.recordid = '03443FGT'
I do, however, suggest against storing the literal string value 'NULL' in your column; you should fix that and actually store NULL, not 'NULL'. The 2 are different values and behave very differently.
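Putting the two pieces together, a minimal sketch of the full T-SQL statement could look like the following. It assumes description is a character column, and it makes two changes the answer above does not: the pattern is wrapped in ';' delimiters so that Desc1 does not also match Desc10 or Desc172, and rows where no code matched are dropped instead of being returned with a NULL groupid. Only the first two mappings from the question are written out.

```sql
SELECT V.groupid,
       cas.id AS caseid,
       CURRENT_TIMESTAMP AS INSERT_DT
FROM dbo.svc_case cas
JOIN dbo.account acc ON acc.id = cas.id
-- One VALUES row per code; each row yields the groupid or NULL.
-- Wrapping the column in ';' makes every code delimiter-bounded (assumption: ';' is the only separator).
CROSS APPLY (VALUES (CASE WHEN ';' + cas.description + ';' LIKE '%;Desc1;%' THEN 63 END),
                    (CASE WHEN ';' + cas.description + ';' LIKE '%;Desc2;%' THEN 44 END)
                    -- the remaining codes follow the same pattern
            ) V(groupid)
WHERE cas.description NOT IN ('NULL', '')
  AND acc.recordid = '03443FGT'
  AND V.groupid IS NOT NULL; -- assumption: unmatched codes should not produce rows
```

If the NULL groupid rows produced by the original UNION ALL query are actually wanted, remove the last predicate.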
Only run the query one time. So no union all, and leave out the CASE. Use a multicast and split it in SSIS.
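As a rough illustration of what that single source query might look like, here is a sketch that reuses the tables and filter from the question. The column list is an assumption, since this answer does not spell it out; the idea is only that each case is read once with its raw description and the split into groups happens in the SSIS data flow.

```sql
-- One pass over the source: no UNION ALL and no CASE.
select
    cas.id            as caseid,
    cas.description   as description,  -- still the ';'-separated list
    current_timestamp as INSERT_DT
from svc_case cas
inner join account acc on acc.id = cas.id
where cas.description <> 'NULL' and LENGTH(cas.description) > 0
  and acc.recordid = '03443FGT';
```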
You can expand the codes and use case to convert to numbers:
select (case when code = 'Desc1' then 63
when code = 'Desc2' then 44
. . .
end) as groupid, -- maps to groupid
cas.id as caseid, -- maps to caseid
current_timestamp as INSERT_DT
from svc_case cas join
account acc
on acc.id = cas.id lateral view
explode(split(cas.description, ';')) codes as code
where acc.recordid = '03443FGT';
I don't know why you have description <> 'NULL'. I am guessing that what you really want is is not null, and that is unnecessary with the lateral view, because explode() produces no rows for a NULL input.
Also, if you have a reference table, with one row per code and groupid, then the code can be further simplified by joining to that.
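A minimal HiveQL sketch of that reference-table variant is below. The table name group_lookup and its columns code and groupid are hypothetical; substitute whatever mapping table actually exists. The explode is done in a CTE over svc_case alone so that the lateral view sits directly after the table it expands, and the joins are applied afterwards.

```sql
-- Step 1: one row per (case, code) by splitting the ';'-separated description.
with exploded as (
    select cas.id     as caseid,
           codes.code as code
    from svc_case cas
    lateral view explode(split(cas.description, ';')) codes as code
)
-- Step 2: look up the groupid for each code instead of a 175-branch CASE.
select ref.groupid       as groupid,
       e.caseid          as caseid,
       current_timestamp as INSERT_DT
from exploded e
join account acc      on acc.id = e.caseid
join group_lookup ref on ref.code = e.code   -- hypothetical mapping table
where acc.recordid = '03443FGT';
```

Because this is an inner join, codes that are missing from the mapping table (including the literal string 'NULL') simply drop out of the result.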