SQL Server query for SSIS transformation timing out due to 174 UNION ALL statements

I have a table in Hive and a table in SQL Server, with data stored as shown below. I am using SSIS to move the data from Hive into SQL Server, but the query takes too long. There are about 175 distinct values in the Description column, so the query needs 174 UNION ALL statements, and it times out after about 2 hours with the following error:

SQL Error [08S01]: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out

Is there a better way to write this query?

Thanks!

Hive:

ID  | Description
----+------------------------------
 1  | Desc1;Desc2;Desc3;Desc4
 2  | Desc1;Desc3;Desc4;Desc5;Desc6
 ...
230 | Desc8;Desc163;Desc9;Desc2;Desc172

SQL Server:

CaseID | GroupID | Description
-------+---------+--------------
   1   |    63   | Desc1
   1   |    44   | Desc2
   1   |    57   | Desc3
   1   |    78   | Desc4
   ...
   2   |    78   | Desc1
   2   |    57   | Desc3

Query:

select 
       case 
             when cas.description like '%Desc1%' then 63 
       end as groupid, -- maps to groupid
       cas.id as caseid, -- maps to caseid 
       current_timestamp as INSERT_DT
from 
       svc_case cas
inner join account acc on acc.id = cas.id
where cas.description <> 'NULL' and LENGTH(cas.description) > 0
and acc.recordid = '03443FGT'
union all 
select 
       case 
             when cas.description like '%Desc2%' then 44
       end as groupid, -- maps to groupid
       cas.id as caseid, -- maps to caseid 
       current_timestamp as INSERT_DT
from 
       svc_case cas
inner join account acc on acc.id = cas.id
where cas.description <> 'NULL' and LENGTH(cas.description) > 0
and acc.recordid = '03443FGT'
union all
select 
       case 
             when cas.description like '%Desc3%' then 57 
       end as groupid, -- maps to groupid
       cas.id as caseid, -- maps to caseid 
       current_timestamp as INSERT_DT
from 
       svc_case cas
inner join account acc on acc.id = cas.id
where cas.description <> 'NULL' and LENGTH(cas.description) > 0
and acc.recordid = '03443FGT'
union all
select 
       case 
             when cas.description like '%Desc4%' then 78 
       end as groupid, -- maps to groupid
       cas.id as caseid, -- maps to caseid 
       current_timestamp as INSERT_DT
from 
       svc_case cas
inner join account acc on acc.id = cas.id
where cas.description <> 'NULL' and LENGTH(cas.description) > 0
and acc.recordid = '03443FGT'
...
select 
       case 
             when cas.description like '%Desc175%' then 12 
       end as groupid, -- maps to groupid
       cas.id as caseid, -- maps to caseid 
       current_timestamp as INSERT_DT
from 
       svc_case cas
inner join account acc on acc.id = cas.id
where cas.description <> 'NULL' and LENGTH(cas.description) > 0
and acc.recordid = '03443FGT'

This is a stab in the dark, but there are 2 things you can do to improve this query. First, let's address all those UNION ALLs. If I understand your query correctly, you can unpivot your data to achieve the same thing:

SELECT V.groupid,
       cas.id AS caseid,
       current_timestamp as INSERT_DT
FROM dbo.svc_case cas
     JOIN dbo.account acc on acc.id = cas.id
     CROSS APPLY (VALUES(CASE WHEN cas.description LIKE '%Desc1%' THEN 63 END),
                        (CASE WHEN cas.description LIKE '%Desc2%' THEN 44 END),
                        (CASE WHEN cas.description LIKE '%Desc3%' THEN 57 END),
                        (CASE WHEN cas.description LIKE '%Desc4%' THEN 78 END),
                        --I assume there are 174 more of these
                        (CASE WHEN cas.description LIKE '%Desc178%' THEN 1 END))V(groupid) --The last one isn't correct, but to show how the `APPLY` ends

Then you have your WHERE, which isn't SARGable due to the LENGTH. LENGTH isn't actually a T-SQL function (the T-SQL equivalent is LEN), so I hope you are actually using SQL Server; if you're not, this answer is wasted, as the above is T-SQL specific. Since LEN(NULL) returns NULL, you can replace the length check with <> ''. And since you already have <> 'NULL', you can combine the two with NOT IN:

WHERE cas.description NOT IN('NULL','')
  AND acc.recordid = '03443FGT'

I do, however, advise against storing the literal string 'NULL' in your column. You should fix that and store an actual NULL, not 'NULL'; the 2 are different values and behave very differently.
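As a rough sketch of that cleanup in T-SQL, assuming the description column is nullable and nothing downstream depends on the literal string (neither is stated in the question):

```sql
-- Assumption: dbo.svc_case.description allows NULLs.
-- Replace the literal string 'NULL' with a real NULL.
UPDATE dbo.svc_case
SET description = NULL
WHERE description = 'NULL';
```

After a cleanup like this, the filter could shrink to cas.description <> '', which also excludes real NULLs because a comparison with NULL never evaluates to true.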

Only run the query one time. So no union all, and leave out the CASE. Use a multicast and split it in SSIS.

In Hive, you can explode the codes into one row each and use CASE to convert them to numbers:

select (case when code = 'Desc1' then 63
             when code = 'Desc2' then 44
             . . .
        end) as groupid, -- maps to groupid
       cas.id as caseid, -- maps to caseid 
       current_timestamp as INSERT_DT
from svc_case cas join
     account acc
     on acc.id = cas.id lateral view
     explode(split(cas.description, ';')) codes as code
where acc.recordid = '03443FGT';

I don't know why you have description <> 'NULL'. I am guessing that what you really want is IS NOT NULL, and that check is unnecessary with the lateral view, because exploding a NULL produces no rows.

Also, if you have a reference table with one row per code and groupid, then the query can be simplified further by joining to it.
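A minimal HiveQL sketch of that join might look like the following. The table desc_group_map and its columns code and groupid are hypothetical names used for illustration; the question does not say whether such a table exists.

```sql
-- Hypothetical mapping table, one row per description code:
--   desc_group_map(code STRING, groupid INT)
SELECT m.groupid,                      -- maps to groupid
       c.caseid,                       -- maps to caseid
       current_timestamp AS insert_dt
FROM (
    -- One row per (case, code); rows with a NULL description drop out.
    SELECT cas.id AS caseid, code
    FROM svc_case cas
    LATERAL VIEW explode(split(cas.description, ';')) codes AS code
) c
JOIN account acc ON acc.id = c.caseid
JOIN desc_group_map m ON m.code = c.code
WHERE acc.recordid = '03443FGT';
```

Because this is an inner join, any code without a row in the mapping table (including the literal string 'NULL', unless it is mapped) produces no output, and adding a new code only requires a new row in the mapping table rather than a change to the query.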
