User-defined function to replace WHERE col IN (...)

Question

I have created a user-defined function to improve the performance of queries containing 'WHERE col IN (...)', like this one:

SELECT myCol1, myCol2
FROM myTable
WHERE myCol3 IN (100, 200, 300, ..., 4900, 5000);

The queries are generated by a web application and are in some cases much more complex. The function definition looks like this:

CREATE FUNCTION [dbo].[udf_CSVtoIntTable]
(
  @CSV VARCHAR(MAX),
  @Delimiter CHAR(1) = ','
)
RETURNS 
@Result TABLE 
(
    [Value] INT
)
AS
BEGIN

  -- INT rather than SMALLINT: SMALLINT positions overflow once the list exceeds 32,767 characters
  DECLARE @CurrStartPos INT;
  SET @CurrStartPos = 1;
  DECLARE @CurrEndPos INT;
  SET @CurrEndPos = 1;
  DECLARE @TotalLength INT;

  -- Remove spaces, tabs, line feeds and carriage returns
  SET @CSV = REPLACE(@CSV, ' ', '');
  SET @CSV = REPLACE(@CSV, CHAR(9), '');
  SET @CSV = REPLACE(@CSV, CHAR(10), '');
  SET @CSV = REPLACE(@CSV, CHAR(13), '');

  -- Add extra delimiter if needed
  IF NOT RIGHT(@CSV, 1) = @Delimiter
    SET @CSV = @CSV + @Delimiter;

  -- Get total string length 
  SET @TotalLength = LEN(@CSV);

  WHILE @CurrStartPos < @TotalLength
  BEGIN

    SET @CurrEndPos = CHARINDEX(@Delimiter, @CSV, @CurrStartPos);

    INSERT INTO @Result
    VALUES (CAST(SUBSTRING(@CSV, @CurrStartPos, @CurrEndPos - @CurrStartPos) AS INT));

    SET @CurrStartPos = @CurrEndPos + 1;

  END

    RETURN 

END

The function is intended to be used like this (or as an INNER JOIN):

SELECT myCol1, myCol2
FROM myTable
WHERE myCol3 IN (
    SELECT [Value] 
    FROM dbo.udf_CSVtoIntTable('100, 200, 300, ..., 4900, 5000', ','));

Does anyone have ideas for optimizing my function, or other ways to improve performance in my case? Are there any drawbacks that I have missed?

I am using MS SQL Server 2005 Std and the .NET 2.0 Framework.

Answer 1

I'm not sure about the performance gain, but I would use it as an inner join and stay away from the inner SELECT statement.
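A minimal sketch of the join form, assuming the question's myTable and dbo.udf_CSVtoIntTable already exist (the short literal list stands in for the real one). Unlike IN, a join returns one row per matching list value, so a value repeated in the CSV string would duplicate result rows; SELECT DISTINCT over the function output keeps the IN semantics.

```sql
-- Join form of the question's query; assumes myTable and dbo.udf_CSVtoIntTable exist.
SELECT t.myCol1, t.myCol2
FROM myTable AS t
INNER JOIN (
    -- DISTINCT preserves IN semantics if the CSV string repeats a value
    SELECT DISTINCT [Value]
    FROM dbo.udf_CSVtoIntTable('100, 200, 300', ',')
) AS v
    ON t.myCol3 = v.[Value];
```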

Answer 2

Using a UDF in a WHERE clause or (worse) a subquery is asking for trouble. The optimizer sometimes gets it right, but often gets it wrong and evaluates the function once for every row in your query, which you don't want.

If your parameters are static (they appear to be) and you can issue a multistatement batch, I'd load the results of your UDF into a table variable, then join against the table variable to do your filtering. This should work more reliably.
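A sketch of that approach as a single batch, again assuming the question's myTable and dbo.udf_CSVtoIntTable. The PRIMARY KEY gives the join an indexed, duplicate-free set; DISTINCT is there so a repeated value in the string does not violate the key. Table variables carry no statistics, so for very long lists a #temp table may be worth trying instead.

```sql
-- Parse the list once into an indexed table variable.
DECLARE @Ids TABLE ([Value] INT NOT NULL PRIMARY KEY);

INSERT INTO @Ids ([Value])
SELECT DISTINCT [Value]
FROM dbo.udf_CSVtoIntTable('100, 200, 300', ',');

-- Filter with a join instead of calling the UDF inside WHERE.
SELECT t.myCol1, t.myCol2
FROM myTable AS t
INNER JOIN @Ids AS i
    ON t.myCol3 = i.[Value];
```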

Answer 3

That loop will kill performance!

Create a table like this:

CREATE TABLE Numbers
(
    Number  int   not null primary key
)

Populate it with rows containing the values 1 to 8000 or so, and use this function:

CREATE FUNCTION [dbo].[FN_ListAllToNumTable]
(
     @SplitOn  char(1)       --REQUIRED, the character to split the @List string on
    ,@List     varchar(8000) --REQUIRED, the list to split apart
)
RETURNS
@ParsedList table
(
    RowNumber int
   ,ListValue varchar(500)
)
AS
BEGIN

/*
DESCRIPTION: Takes the given @List string and splits it apart based on the given @SplitOn character.
             A table is returned, one row per split item, with columns named "RowNumber" and "ListValue".
             This function works for fixed- or variable-length items.
             Empty and null items will be included in the result set.

PARAMETERS:
    @SplitOn   char(1)       --REQUIRED, the character to split the @List string on
    @List      varchar(8000) --REQUIRED, the list to split apart


RETURN VALUES:
  a table, one row per item in the list, with the columns "RowNumber" and "ListValue"

TEST WITH:
----------
SELECT * FROM dbo.FN_ListAllToNumTable(',','1,12,123,1234,54321,6,A,*,|||,,,,B')

DECLARE @InputList  varchar(200)
SET @InputList='17;184;75;495'
SELECT
    'well formed list',LEFT(@InputList,40) AS InputList,h.Name
    FROM Employee  h
        INNER JOIN dbo.FN_ListAllToNumTable(';',@InputList) dt ON h.EmployeeID=dt.ListValue
    WHERE dt.ListValue IS NOT NULL

SET @InputList='17;;;184;75;495;;;'
SELECT
    'poorly formed list join',LEFT(@InputList,40) AS InputList,h.Name
    FROM Employee  h
        INNER JOIN dbo.FN_ListAllToNumTable(';',@InputList) dt ON h.EmployeeID=dt.ListValue

SELECT
    'poorly formed list',LEFT(@InputList,40) AS InputList, ListValue
    FROM dbo.FN_ListAllToNumTable(';',@InputList)

**/



/*this will return empty rows, and row numbers*/
INSERT INTO @ParsedList
        (RowNumber,ListValue)
    SELECT
        ROW_NUMBER() OVER(ORDER BY number) AS RowNumber
            ,LTRIM(RTRIM(SUBSTRING(ListValue, number+1, CHARINDEX(@SplitOn, ListValue, number+1)-number - 1))) AS ListValue
        FROM (
                 SELECT @SplitOn + @List + @SplitOn AS ListValue
             ) AS InnerQuery
            INNER JOIN Numbers n ON n.Number < LEN(InnerQuery.ListValue)
        WHERE SUBSTRING(ListValue, number, 1) = @SplitOn

RETURN

END /*Function FN_ListAllToNumTable*/
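The function above expects Numbers to hold the values 1 to 8000. One SQL Server 2005-compatible way to fill it is sketched below; it assumes the table was created as shown and is empty, and uses sys.all_objects cross-joined with itself purely as a convenient source of enough rows.

```sql
-- Fill Numbers with 1..8000. ROW_NUMBER() over a cross join yields consecutive
-- integers starting at 1; the outer filter keeps exactly the first 8000.
INSERT INTO Numbers (Number)
SELECT n
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY a.object_id) AS n
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
) AS x
WHERE n <= 8000;
```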

I have other versions that do not return empty or null rows, ones that return just the item and not the row number, etc. Look in the header comment to see how to use this as part of a JOIN, which is much faster than using it in a WHERE clause.
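Those other versions are not shown. As a hypothetical illustration of such a variant tailored to the question (the name dbo.FN_ListToIntTable is made up), the sketch below keeps the same Numbers-table technique but skips empty items and returns only an int value, so it can be joined directly to an INT column. It assumes Numbers holds 1 to 8000 and that every non-empty item is a valid integer.

```sql
CREATE FUNCTION dbo.FN_ListToIntTable   -- hypothetical name, for illustration only
(
     @SplitOn  char(1)        --REQUIRED, the character to split the @List string on
    ,@List     varchar(8000)  --REQUIRED, the list to split apart
)
RETURNS @ParsedList table
(
    ListValue int
)
AS
BEGIN
    INSERT INTO @ParsedList (ListValue)
        SELECT CAST(Item AS int)
        FROM (
                 -- one trimmed row per item, computed as in FN_ListAllToNumTable
                 SELECT LTRIM(RTRIM(SUBSTRING(ListValue, n.Number + 1,
                            CHARINDEX(@SplitOn, ListValue, n.Number + 1) - n.Number - 1))) AS Item
                 FROM (SELECT @SplitOn + @List + @SplitOn AS ListValue) AS InnerQuery
                     INNER JOIN Numbers n ON n.Number < LEN(InnerQuery.ListValue)
                 WHERE SUBSTRING(ListValue, n.Number, 1) = @SplitOn
             ) AS Items
        WHERE Item <> ''   -- skip empty items such as the gap in '1,,2'
    RETURN
END
```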

Answer 4

The CLR solution did not give me good performance, so I will use a recursive query. Here is the definition of the function I will use (mostly based on Erland Sommarskog's examples):

CREATE FUNCTION [dbo].[priudf_CSVtoIntTable]
(
  @CSV VARCHAR(MAX),
  @Delimiter CHAR(1) = ','
)
RETURNS 
@Result TABLE 
(
    [Value] INT
)
AS
BEGIN

  -- Remove spaces, tabs, line feeds and carriage returns
  SET @CSV = REPLACE(@CSV, ' ', '');
  SET @CSV = REPLACE(@CSV, CHAR(9), '');
  SET @CSV = REPLACE(@CSV, CHAR(10), '');
  SET @CSV = REPLACE(@CSV, CHAR(13), '');

  WITH csvtbl(start, stop) AS 
  (
    SELECT  start = CONVERT(BIGINT, 1),
            stop = CHARINDEX(@Delimiter, @CSV + @Delimiter)
    UNION ALL
    SELECT  start = stop + 1,
            stop = CHARINDEX(@Delimiter, @CSV + @Delimiter, stop + 1)
    FROM csvtbl
    WHERE stop > 0
  )
  INSERT INTO @Result
  SELECT CAST(SUBSTRING(@CSV, start, CASE WHEN stop > 0 THEN stop - start ELSE 0 END) AS INT) AS [Value]
  FROM   csvtbl
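  WHERE  stop > start          -- skips the final row (stop = 0) and empty items such as ",,"
  OPTION (MAXRECURSION 1000);  -- allows about 1,000 items; MAXRECURSION 0 removes the limit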

  RETURN 
END

Answer 5

Thanks for the input. I have to admit that I did some poor research before I started my work. I found that Erland Sommarskog has written a lot about this problem on his web page; after your responses and after reading his page, I decided to try writing a CLR function to solve this.

I tried a recursive query, which resulted in good performance, but I will try a CLR function anyway.

5 answers

Answer 1
1 vote, 2009-03-26 12:32:24

Answer 2
1 vote, 2009-03-26 12:53:58

Answer 3
1 vote, 2009-03-26 13:54:17

Answer 4
1 vote, accepted, 2009-04-16 10:34:33

Answer 5
0 votes, 2009-03-27 11:13:54