T-SQL UDF vs full expression run-time

I'm trying to make my query more readable by using a UDF in SQL Server, but the run time increases dramatically when I use the function.

Following is the function I'm using:

create function DL.trim_all(@input varchar(max)) 
returns varchar(max)
as begin 
    set @input=replace(replace(replace(@input,' ',''),')',''),'(','')
    return @input
end

Instead of writing:

SELECT
CASE WHEN replace(replace(replace([FULL_NAME_1],' ',''),')',''),'(','')=replace(replace(replace([FULL_NAME_2],' ',''),')',''),'(','') THEN 1 ELSE 0 END AS [name_match],
CASE WHEN replace(replace(replace([ADDRESS_1],' ',''),')',''),'(','')=replace(replace(replace([ADDRESS_2],' ',''),')',''),'(','') THEN 1 ELSE 0 END AS [adrs_match]
.
.
.
FROM
TABLE_1

for 20 different fields.

With the function, the query runs in 12.5 minutes; without it, the same query runs in 45 seconds.

Any ideas?

Taking John's idea one step further: convert the scalar function into an inline table-valued function and use CROSS APPLY to invoke it for each pair of columns. You might get even better performance, at the price of a more cumbersome query:

CREATE function DL.DoesItMatch(@s1 varchar(500),@s2 varchar(500)) 
returns table -- returns a table with a single row and a single column
as return 
  SELECT 
    CASE WHEN replace(replace(replace(@s1,' ',''),')',''),'(','') = 
              replace(replace(replace(@s2,' ',''),')',''),'(','') THEN 1 ELSE 0 END As IsMatch;    

and the query:

SELECT NameMatch.IsMatch AS [name_match],
       AddressMatch.IsMatch AS adrs_match
.
.
.
FROM TABLE_1
CROSS APPLY DL.DoesItMatch(FULL_NAME_1, FULL_NAME_2) As NameMatch
CROSS APPLY DL.DoesItMatch(ADDRESS_1, ADDRESS_2) As AddressMatch

I can't imagine a huge boost, but how about an alternate approach:

create function DL.DoesItMatch(@s1 varchar(500),@s2 varchar(500)) 
returns bit
as begin 
    return CASE WHEN replace(replace(replace(@s1,' ',''),')',''),'(','')=replace(replace(replace(@s2,' ',''),')',''),'(','') THEN 1 ELSE 0 END
end

Then call the function as:

SELECT 
      DL.DoesItMatch([FULL_NAME_1],[FULL_NAME_2])  AS [name_match],
      ...
FROM
TABLE_1

Inlining is always the way to go. Period. Even without considering the parallelism-inhibiting aspects of T-SQL scalar UDFs, inline table-valued functions (ITVFs) are faster, require fewer resources (CPU, memory and IO), and are easier to maintain and easier to troubleshoot/analyze/profile/trace. For fun I put together a performance test comparing Zohar's ITVF to John's scalar UDF. I created 250K rows and tested a basic select against both, then ran another test with an ORDER BY against the heap to force a sort.

Sample data:

-- Sample Data
BEGIN
  SET NOCOUNT ON;
  IF OBJECT_ID('tempdb..#tmp','U') IS NOT NULL DROP TABLE #tmp;
  SELECT TOP (250000) col1 = '('+LEFT(NEWID(),10)+')', col2 = '('+LEFT(NEWID(),10)+')'
  INTO    #tmp
  FROM   sys.all_columns a, sys.all_columns;

  UPDATE #tmp SET col1 = col2 WHERE LEFT(col1,2) = LEFT(col2,2) 
END
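
The test that follows calls DL.DoesItMatch_ITVF, which is not defined anywhere in this thread. Presumably it is Zohar's inline function created under a different name so that it can coexist with John's scalar DL.DoesItMatch. A minimal sketch under that assumption (the _ITVF definition is a guess at what was used, and the DL schema is assumed to exist):

```sql
-- Assumption: this is Zohar's inline function, renamed with an _ITVF suffix
-- so that it does not collide with John's scalar DL.DoesItMatch.
CREATE FUNCTION DL.DoesItMatch_ITVF(@s1 varchar(500), @s2 varchar(500))
RETURNS TABLE -- a single row with a single column
AS RETURN
  SELECT IsMatch =
    CASE WHEN replace(replace(replace(@s1,' ',''),')',''),'(','') =
              replace(replace(replace(@s2,' ',''),')',''),'(','')
         THEN 1 ELSE 0 END;
GO
```

John's scalar DL.DoesItMatch would need to be created as well, exactly as posted in his answer.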

Performance Test:

PRINT 'scalar, no sort'+CHAR(10)+REPLICATE('-',60);
GO
DECLARE @st DATETIME = GETDATE(), @isMatch BIT;
  SELECT @isMatch = DL.DoesItMatch(t.col1,t.col2)
  FROM   #tmp AS t;
PRINT DATEDIFF(MS,@st,GETDATE())
GO 3

PRINT CHAR(10)+'ITVF, no sort'+CHAR(10)+REPLICATE('-',60);
GO
DECLARE @st DATETIME = GETDATE(), @isMatch BIT;
  SELECT      @isMatch = f.isMatch
  FROM        #tmp AS t
  CROSS APPLY DL.DoesItMatch_ITVF(t.col1,t.col2) AS f;
PRINT DATEDIFF(MS,@st,GETDATE())
GO 3    

PRINT CHAR(10)+'scalar, sorted set'+CHAR(10)+REPLICATE('-',60);
GO
DECLARE @st DATETIME = GETDATE(), @isMatch BIT;
  SELECT @isMatch = DL.DoesItMatch(t.col1,t.col2)
  FROM   #tmp AS t
  ORDER BY DL.DoesItMatch(t.col1,t.col2);
PRINT DATEDIFF(MS,@st,GETDATE())
GO 3

PRINT CHAR(10)+'ITVF, sorted set'+CHAR(10)+REPLICATE('-',60);
GO
DECLARE @st DATETIME = GETDATE(), @isMatch BIT;
  SELECT      @isMatch = f.isMatch
  FROM        #tmp AS t
  CROSS APPLY DL.DoesItMatch_ITVF(t.col1,t.col2) AS f
  ORDER BY    f.isMatch;
PRINT DATEDIFF(MS,@st,GETDATE())
GO 3

Test Results:

scalar, no sort
------------------------------------------------------------
Beginning execution loop
844
843
840
Batch execution completed 3 times.

ITVF, no sort
------------------------------------------------------------
Beginning execution loop
270
270
270
Batch execution completed 3 times.

scalar, sorted set
------------------------------------------------------------
Beginning execution loop
937
930
936
Batch execution completed 3 times.

ITVF, sorted set
------------------------------------------------------------
Beginning execution loop
196
190
190
Batch execution completed 3 times.

So, when no parallel plan is needed, the ITVF is about 3X faster (roughly 840 ms vs 270 ms); when a parallel plan is required, it's about 5X faster (roughly 935 ms vs 190 ms). Here are a few other links where I have tested ITVFs against scalar and multi-statement table-valued UDFs:

Set based plan runs slower than scalar valued function with many conditions

SQL Server user defined function to calculate age bracket

Function is slow but query runs fast

Why does SQL Server say this function is nondeterministic?

Grouping based on the match percentage

SQL Server 2008 user defined function to add spaces between each digit

Sql table comma separated values contain any of variable values checking

SQL String manipulation, find all permutations

You could use Scalar UDF inlining in SQL Server 2019. With that, you can keep the UDF you have already written and automatically get performance identical to that of the query without the UDF.

The UDF you have given fits the criteria for inlineability so you are in good shape. Documentation about the UDF inlining feature is here: https://docs.microsoft.com/en-us/sql/relational-databases/user-defined-functions/scalar-udf-inlining?view=azuresqldb-current

Pro tip: I'd suggest that you make a minor modification to your UDF before using Scalar UDF inlining. Turn it into a single-statement scalar UDF by avoiding the local variable. With this, you should be better off than using an inline TVF with CROSS APPLY.
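
As a rough sketch of that modification, the question's DL.trim_all could be reduced to a single RETURN of the expression. This assumes SQL Server 2019 with the database at compatibility level 150, which Scalar UDF inlining requires. The catalog query at the end is an optional check that is not part of the original answer:

```sql
-- Single-statement version of the question's DL.trim_all:
-- no SET on @input, just one RETURN of the nested REPLACE expression.
CREATE OR ALTER FUNCTION DL.trim_all(@input varchar(max))
RETURNS varchar(max)
AS
BEGIN
    RETURN replace(replace(replace(@input,' ',''),')',''),'(','');
END
GO

-- Optional check (SQL Server 2019+): is_inlineable = 1 means the engine
-- considers the function eligible for Scalar UDF inlining.
SELECT OBJECT_NAME(object_id) AS function_name, is_inlineable
FROM   sys.sql_modules
WHERE  object_id = OBJECT_ID('DL.trim_all');
```

The original query can then keep calling DL.trim_all on each column as before. Whether it actually matches the 45-second hand-written version is something to measure on your own data.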
