T-SQL UDF vs full expression run-time

I'm trying to make my query more readable by using a UDF in SQL Server, but the run time increases dramatically when I use the function.

This is the function I'm using:

create function DL.trim_all(@input varchar(max)) 
returns varchar(max)
as begin 
    set @input=replace(replace(replace(@input,' ',''),')',''),'(','')
    return @input
end

I use it instead of writing:

SELECT
CASE WHEN replace(replace(replace([FULL_NAME_1],' ',''),')',''),'(','')=replace(replace(replace([FULL_NAME_2],' ',''),')',''),'(','') THEN 1 ELSE 0 END AS [name_match],
CASE WHEN replace(replace(replace([ADDRESS_1],' ',''),')',''),'(','')=replace(replace(replace([ADDRESS_2],' ',''),')',''),'(','') THEN 1 ELSE 0 END AS [adrs_match]
.
.
.
FROM
TABLE_1

for 20 different fields.

With the function, the query runs for 12.5 minutes; without it, the same query takes 45 seconds.

Any ideas?

Taking John's idea one step further: convert the scalar function into an inline table-valued function and use CROSS APPLY to invoke it for each pair of columns. You might get even better performance, at the price of a more cumbersome query:

CREATE function DL.DoesItMatch(@s1 varchar(500),@s2 varchar(500)) 
returns table -- returns a table with a single row and a single column
as return 
  SELECT 
    CASE WHEN replace(replace(replace(@s1,' ',''),')',''),'(','') = 
              replace(replace(replace(@s2,' ',''),')',''),'(','') THEN 1 ELSE 0 END As IsMatch;    

and the query:

SELECT NameMatch.IsMatch AS [name_match],
       AddressMatch.IsMatch AS adrs_match
.
.
.
FROM TABLE_1
CROSS APPLY DL.DoesItMatch(FULL_NAME_1, FULL_NAME_2) As NameMatch
CROSS APPLY DL.DoesItMatch(ADDRESS_1, ADDRESS_2) As AddressMatch

I can't imagine a huge boost, but how about an alternate approach:

create function DL.DoesItMatch(@s1 varchar(500),@s2 varchar(500)) 
returns bit
as begin 
    return CASE WHEN replace(replace(replace(@s1,' ',''),')',''),'(','')=replace(replace(replace(@s2,' ',''),')',''),'(','') THEN 1 ELSE 0 END
end

Then call the function as:

SELECT 
      DL.DoesItMatch([FULL_NAME_1],[FULL_NAME_2])  AS [name_match],
      ...
FROM
TABLE_1

Inlining is always the way to go. Period. Even without considering the parallelism-inhibiting aspects of T-SQL scalar UDFs, ITVFs are faster, require fewer resources (CPU, memory and IO), and are easier to maintain and to troubleshoot/analyze/profile/trace. For fun I put together a performance test comparing Zohar's ITVF to John's scalar UDF. I created 250K rows, tested a basic SELECT against both, then ran another test with an ORDER BY against the heap to force a sort.

Sample data:

-- Sample Data
BEGIN
  SET NOCOUNT ON;
  IF OBJECT_ID('tempdb..#tmp','U') IS NOT NULL DROP TABLE #tmp;
  SELECT TOP (250000) col1 = '('+LEFT(NEWID(),10)+')', col2 = '('+LEFT(NEWID(),10)+')'
  INTO    #tmp
  FROM   sys.all_columns a, sys.all_columns;

  UPDATE #tmp SET col1 = col2 WHERE LEFT(col1,2) = LEFT(col2,2) 
END
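
The test script below calls the inline function as DL.DoesItMatch_ITVF, a name that is not defined anywhere in this thread. Presumably it is Zohar's function created under a different name so that it can coexist with John's scalar DL.DoesItMatch. A minimal sketch under that assumption (the function name and the isMatch column alias are taken from the test script, not from Zohar's answer):

```sql
-- Assumption: same body as Zohar's inline function, created under the name
-- the test script calls so it can coexist with the scalar DL.DoesItMatch.
CREATE FUNCTION DL.DoesItMatch_ITVF(@s1 varchar(500), @s2 varchar(500))
RETURNS TABLE -- a single row with a single column
AS RETURN
  SELECT CASE WHEN replace(replace(replace(@s1,' ',''),')',''),'(','') =
                   replace(replace(replace(@s2,' ',''),')',''),'(','')
              THEN 1 ELSE 0 END AS isMatch;
```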

Performance Test:

PRINT 'scalar, no sort'+CHAR(10)+REPLICATE('-',60);
GO
DECLARE @st DATETIME = GETDATE(), @isMatch BIT;
  SELECT @isMatch = DL.DoesItMatch(t.col1,t.col2)
  FROM   #tmp AS t;
PRINT DATEDIFF(MS,@st,GETDATE())
GO 3

PRINT CHAR(10)+'ITVF, no sort'+CHAR(10)+REPLICATE('-',60);
GO
DECLARE @st DATETIME = GETDATE(), @isMatch BIT;
  SELECT      @isMatch = f.isMatch
  FROM        #tmp AS t
  CROSS APPLY DL.DoesItMatch_ITVF(t.col1,t.col2) AS f;
PRINT DATEDIFF(MS,@st,GETDATE())
GO 3    

PRINT CHAR(10)+'scalar, sorted set'+CHAR(10)+REPLICATE('-',60);
GO
DECLARE @st DATETIME = GETDATE(), @isMatch BIT;
  SELECT @isMatch = DL.DoesItMatch(t.col1,t.col2)
  FROM   #tmp AS t
  ORDER BY DL.DoesItMatch(t.col1,t.col2);
PRINT DATEDIFF(MS,@st,GETDATE())
GO 3

PRINT CHAR(10)+'ITVF, sorted set'+CHAR(10)+REPLICATE('-',60);
GO
DECLARE @st DATETIME = GETDATE(), @isMatch BIT;
  SELECT      @isMatch = f.isMatch
  FROM        #tmp AS t
  CROSS APPLY DL.DoesItMatch_ITVF(t.col1,t.col2) AS f
  ORDER BY    f.isMatch;
PRINT DATEDIFF(MS,@st,GETDATE())
GO 3

Test Results (elapsed milliseconds per run):

scalar, no sort
------------------------------------------------------------
Beginning execution loop
844
843
840
Batch execution completed 3 times.

ITVF, no sort
------------------------------------------------------------
Beginning execution loop
270
270
270
Batch execution completed 3 times.

scalar, sorted set
------------------------------------------------------------
Beginning execution loop
937
930
936
Batch execution completed 3 times.

ITVF, sorted set
------------------------------------------------------------
Beginning execution loop
196
190
190
Batch execution completed 3 times.

So, when no parallel plan is needed, the ITVF is about 3X faster; when a parallel plan is required, it's about 5X faster. Here are a few other links where I have tested ITVFs against scalar and multi-statement table-valued UDFs:

Set based plan runs slower than scalar valued function with many conditions

SQL Server user defined function to calculate age bracket

Function is slow but query runs fast

Why does SQL Server say this function is nondeterministic?

Grouping based on the match percentage

SQL Server 2008 user defined function to add spaces between each digit

Sql table comma separated values contain any of variable values checking

SQL String manipulation, find all permutations

You could use Scalar UDF inlining in SQL Server 2019. With that, you can keep the UDF as you have written it and automatically get the same performance as the query without the UDF.

The UDF you have given fits the criteria for inlineability, so you are in good shape. Documentation about the UDF inlining feature is here: https://docs.microsoft.com/en-us/sql/relational-databases/user-defined-functions/scalar-udf-inlining?view=azuresqldb-current
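
One way to verify this on your own server is to query the catalog. The sketch below assumes SQL Server 2019 or later and that the function from the question exists as DL.trim_all in the current database; the column aliases are only illustrative:

```sql
-- Scalar UDF inlining requires database compatibility level 150 or higher.
SELECT name, compatibility_level
FROM   sys.databases
WHERE  name = DB_NAME();

-- is_inlineable = 1 means the function is eligible for inlining.
SELECT OBJECT_SCHEMA_NAME(m.object_id) AS function_schema,
       OBJECT_NAME(m.object_id)        AS function_name,
       m.is_inlineable
FROM   sys.sql_modules AS m
WHERE  m.object_id = OBJECT_ID(N'DL.trim_all');

-- Inlining is on by default at level 150; this database-scoped setting toggles it.
ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = ON;
```

Note that is_inlineable = 1 only indicates eligibility; the optimizer may still decide not to inline a function in a particular query.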

Pro tip: I'd suggest that you make a minor modification to your UDF before using Scalar UDF inlining. Make it a single-statement scalar UDF by avoiding the local variable. With this, you should be better off than using an inline TVF with CROSS APPLY.
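
A sketch of what that single-statement version might look like, assuming the function from the question is replaced in place (CREATE OR ALTER is available from SQL Server 2016 SP1 onward):

```sql
CREATE OR ALTER FUNCTION DL.trim_all(@input varchar(max))
RETURNS varchar(max)
AS
BEGIN
    -- Single RETURN statement: no SET and no reassignment of @input.
    RETURN replace(replace(replace(@input, ' ', ''), ')', ''), '(', '');
END
```

Existing queries that call DL.trim_all do not need to change, since the signature and the result are the same.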
