I have a table in Hive and one in SQL Server, with data stored as shown below. I am using SSIS to move this data into SQL Server. The query is taking too long: there are about 175 distinct values in the Description column, which results in 174 UNION ALL statements, and the query times out after about 2 hours.
SQL Error [08S01]: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
Is there a better way to write this query?
Thanks!
Hive:
ID | Description
----+------------------------------
1 | Desc1;Desc2;Desc3;Desc4
2 | Desc1;Desc3;Desc4;Desc5;Desc6
...
230 | Desc8;Desc163;Desc9;Desc2;Desc172
SQL Server:
CaseID | GroupID | Description
-------+---------+--------------
1 | 63 | Desc1
1 | 44 | Desc2
1 | 57 | Desc3
1 | 78 | Desc4
...
2 | 63 | Desc1
2 | 57 | Desc3
Query:
select
case
when cas.description like '%Desc1%' then 63
end as groupid, -- maps to groupid
cas.id as caseid, -- maps to caseid
current_timestamp as INSERT_DT
from
svc_case cas
inner join account acc on acc.id = cas.id
where cas.description <> 'NULL' and LENGTH(cas.description) > 0
and acc.recordid = '03443FGT'
union all
select
case
when cas.description like '%Desc2%' then 44
end as groupid, -- maps to groupid
cas.id as caseid, -- maps to caseid
current_timestamp as INSERT_DT
from
svc_case cas
inner join account acc on acc.id = cas.id
where cas.description <> 'NULL' and LENGTH(cas.description) > 0
and acc.recordid = '03443FGT'
union all
select
case
when cas.description like '%Desc3%' then 57
end as groupid, -- maps to groupid
cas.id as caseid, -- maps to caseid
current_timestamp as INSERT_DT
from
svc_case cas
inner join account acc on acc.id = cas.id
where cas.description <> 'NULL' and LENGTH(cas.description) > 0
and acc.recordid = '03443FGT'
union all
select
case
when cas.description like '%Desc4%' then 78
end as groupid, -- maps to groupid
cas.id as caseid, -- maps to caseid
current_timestamp as INSERT_DT
from
svc_case cas
inner join account acc on acc.id = cas.id
where cas.description <> 'NULL' and LENGTH(cas.description) > 0
and acc.recordid = '03443FGT'
...
select
case
when cas.description like '%Desc175%' then 12
end as groupid, -- maps to groupid
cas.id as caseid, -- maps to caseid
current_timestamp as INSERT_DT
from
svc_case cas
inner join account acc on acc.id = cas.id
where cas.description <> 'NULL' and LENGTH(cas.description) > 0
and acc.recordid = '03443FGT'
This is a stab in the dark, but there are 2 things you can do to improve this query. Firstly, let's address all those UNION ALLs. If I understand your query correctly, you can unpivot your data to achieve the same thing:
SELECT V.groupid,
cas.id AS caseid,
current_timestamp as INSERT_DT
FROM dbo.svc_case cas
JOIN dbo.account acc on acc.id = cas.id
CROSS APPLY (VALUES(CASE WHEN cas.description LIKE '%Desc1%' THEN 63 END),
(CASE WHEN cas.description LIKE '%Desc2%' THEN 44 END),
(CASE WHEN cas.description LIKE '%Desc3%' THEN 57 END),
(CASE WHEN cas.description LIKE '%Desc4%' THEN 78 END),
--I assume there are 174 more of these
(CASE WHEN cas.description LIKE '%Desc178%' THEN 1 END))V(groupid) --The last one isn't correct, but to show how the `APPLY` ends
Then you have your WHERE, which isn't SARGable due to the LENGTH. LENGTH isn't actually a T-SQL function (the T-SQL equivalent is LEN), so I hope you are actually using SQL Server (if you're not, this is a waste of an answer, as the above is T-SQL specific). Since LEN(NULL) returns NULL, you can use <> '' instead. And since you already have <> 'NULL', you can combine the two with NOT IN:
WHERE cas.description NOT IN('NULL','')
AND acc.recordid = '03443FGT'
I do, however, advise against storing the literal string value 'NULL' in your column. You should fix that and actually store NULL, not 'NULL'; the 2 are different values and behave very differently.
Run the query only once: no UNION ALL, and leave out the CASE. Then use a Multicast in SSIS and split the rows there.
You can expand the codes and use case to convert them to numbers:
select (case when code = 'Desc1' then 63
when code = 'Desc2' then 44
. . .
end) as groupid, -- maps to groupid
cas.id as caseid, -- maps to caseid
current_timestamp as INSERT_DT
from svc_case cas join
account acc
on acc.id = cas.id lateral view
explode(split(cas.description, ';')) codes as code
where acc.recordid = '03443FGT';
I don't know why you have description <> 'NULL'. I am guessing that what you really want is an is not null check, and that is unnecessary with the lateral view, because exploding a NULL produces no rows.
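To make the row-per-code behaviour concrete, here is a minimal Python sketch of what the lateral view with explode(split(...)) does to one row. It is only an illustration, not Hive code: the function name explode_codes and the GROUP_IDS dictionary are made up for this example, and only the four code/groupid pairs shown in the question are filled in.

```python
from typing import Iterator, Optional, Tuple

# Hypothetical mapping; only these four pairs appear in the question.
GROUP_IDS = {"Desc1": 63, "Desc2": 44, "Desc3": 57, "Desc4": 78}


def explode_codes(case_id: int, description: Optional[str]) -> Iterator[Tuple[Optional[int], int]]:
    """Yield one (groupid, caseid) pair per ';'-separated code."""
    if description is None:
        return  # a NULL description produces no rows at all
    for code in description.split(";"):
        # Exact comparison on the whole token, like `when code = 'Desc1' then 63`.
        # Codes missing from the mapping give None, as a CASE without ELSE gives NULL.
        yield GROUP_IDS.get(code), case_id


rows = list(explode_codes(1, "Desc1;Desc2;Desc3;Desc4"))
print(rows)  # [(63, 1), (44, 1), (57, 1), (78, 1)]
```

One side effect, assuming the codes really are named Desc1 through Desc175: comparing whole tokens avoids the substring matches of the LIKE version, where '%Desc1%' also matches a description that only contains Desc163.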
Also, if you have a reference table with one row per code and groupid, then the code can be further simplified by joining to that.
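As a rough sketch of the reference-table idea, the snippet below uses Python's built-in sqlite3 module so it can be run anywhere. The table names desc_group and case_code are hypothetical: desc_group stands for the reference table, and case_code stands in for the already-exploded output (one row per case and code). In Hive the same join would sit on top of the exploded rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical reference table: one row per code and groupid.
    CREATE TABLE desc_group (description TEXT PRIMARY KEY, groupid INTEGER);
    INSERT INTO desc_group VALUES
        ('Desc1', 63), ('Desc2', 44), ('Desc3', 57), ('Desc4', 78);

    -- Hypothetical stand-in for the exploded rows: one row per case and code.
    CREATE TABLE case_code (caseid INTEGER, code TEXT);
    INSERT INTO case_code VALUES
        (1, 'Desc1'), (1, 'Desc2'), (1, 'Desc3'), (1, 'Desc4'),
        (2, 'Desc1'), (2, 'Desc3');
""")

# The join replaces the long CASE expression: the mapping lives in data, not in the query.
joined = conn.execute("""
    SELECT cc.caseid, g.groupid, cc.code
    FROM case_code cc
    JOIN desc_group g ON g.description = cc.code
    ORDER BY cc.caseid, g.groupid
""").fetchall()

for row in joined:
    print(row)
```

With this shape, adding a new code means inserting a row into the reference table rather than editing the query.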