简体   繁体   中英

User defined function replacing WHERE col IN(…)

I have created a user defined function to gain performance with queries containing 'WHERE col IN (...)' like this case:

SELECT myCol1, myCol2
FROM myTable
WHERE myCol3 IN (100, 200, 300, ..., 4900, 5000);

The queries are generated from an web application and are in some cases much more complex. The function definition looks like this:

CREATE FUNCTION [dbo].[udf_CSVtoIntTable]
(
  @CSV VARCHAR(MAX),
  @Delimiter CHAR(1) = ','
)
RETURNS 
@Result TABLE 
(
    [Value] INT
)
AS
BEGIN

  DECLARE @CurrStartPos SMALLINT;
  SET @CurrStartPos = 1;
  DECLARE @CurrEndPos SMALLINT;
  SET @CurrEndPos = 1;
  DECLARE @TotalLength SMALLINT;

  -- Remove space, tab, linefeed, carrier return
  SET @CSV = REPLACE(@CSV, ' ', '');
  SET @CSV = REPLACE(@CSV, CHAR(9), '');
  SET @CSV = REPLACE(@CSV, CHAR(10), '');
  SET @CSV = REPLACE(@CSV, CHAR(13), '');

  -- Add extra delimiter if needed
  IF NOT RIGHT(@CSV, 1) = @Delimiter
    SET @CSV = @CSV + @Delimiter;

  -- Get total string length 
  SET @TotalLength = LEN(@CSV);

  WHILE @CurrStartPos < @TotalLength
  BEGIN

    SET @CurrEndPos = CHARINDEX(@Delimiter, @CSV, @CurrStartPos);

    INSERT INTO @Result
    VALUES (CAST(SUBSTRING(@CSV, @CurrStartPos, @CurrEndPos - @CurrStartPos) AS INT));

    SET @CurrStartPos = @CurrEndPos + 1;

  END

    RETURN 

END

The function is intended to be used like this (or as an INNER JOIN):

SELECT myCol1, myCol2
FROM myTable
WHERE myCol3 IN (
    SELECT [Value] 
    FROM dbo.udf_CSVtoIntTable('100, 200, 300, ..., 4900, 5000', ',');

Do anyone have some optimiztion idears of my function or other ways to improve performance in my case? Is there any drawbacks that I have missed?

I am using MS SQL Server 2005 Std and .NET 2.0 framework.

我不确定性能提升,但我会将它用作内连接并远离内部select语句。

Using a UDF in a WHERE clause or (worse) a subquery is asking for trouble. The optimizer sometimes gets it right, but often gets it wrong and evaluates the function once for every row in your query, which you don't want.

If your parameters are static (they appear to be) and you can issue a multistatement batch, I'd load the results of your UDF into a table variable, then use a join against the table variable to do your filtering. This should work more reliably.

that loop will kill performance!

create a table like this:

CREATE TABLE Numbers
(
    Number  int   not null primary key
)

that has rows containing values 1 to 8000 or so and use this function:

CREATE FUNCTION [dbo].[FN_ListAllToNumberTable]
(
     @SplitOn  char(1)       --REQUIRED, the character to split the @List string on
    ,@List     varchar(8000) --REQUIRED, the list to split apart
)
RETURNS
@ParsedList table
(
    RowNumber int
   ,ListValue varchar(500)
)
AS
BEGIN

/*
DESCRIPTION: Takes the given @List string and splits it apart based on the given @SplitOn character.
             A table is returned, one row per split item, with a columns named "RowNumber" and "ListValue".
             This function workes for fixed or variable lenght items.
             Empty and null items will be included in the results set.

PARAMETERS:
    @List      varchar(8000) --REQUIRED, the list to split apart
    @SplitOn   char(1)       --OPTIONAL, the character to split the @List string on, defaults to a comma ","


RETURN VALUES:
  a table, one row per item in the list, with a column name "ListValue"

TEST WITH:
----------
SELECT * FROM dbo.FN_ListAllToNumTable(',','1,12,123,1234,54321,6,A,*,|||,,,,B')

DECLARE @InputList  varchar(200)
SET @InputList='17;184;75;495'
SELECT
    'well formed list',LEFT(@InputList,40) AS InputList,h.Name
    FROM Employee  h
        INNER JOIN dbo.FN_ListAllToNumTable(';',@InputList) dt ON h.EmployeeID=dt.ListValue
    WHERE dt.ListValue IS NOT NULL

SET @InputList='17;;;184;75;495;;;'
SELECT
    'poorly formed list join',LEFT(@InputList,40) AS InputList,h.Name
    FROM Employee  h
        INNER JOIN dbo.FN_ListAllToNumTable(';',@InputList) dt ON h.EmployeeID=dt.ListValue

SELECT
    'poorly formed list',LEFT(@InputList,40) AS InputList, ListValue
    FROM dbo.FN_ListAllToNumTable(';',@InputList)

**/



/*this will return empty rows, and row numbers*/
INSERT INTO @ParsedList
        (RowNumber,ListValue)
    SELECT
        ROW_NUMBER() OVER(ORDER BY number) AS RowNumber
            ,LTRIM(RTRIM(SUBSTRING(ListValue, number+1, CHARINDEX(@SplitOn, ListValue, number+1)-number - 1))) AS ListValue
        FROM (
                 SELECT @SplitOn + @List + @SplitOn AS ListValue
             ) AS InnerQuery
            INNER JOIN Numbers n ON n.Number < LEN(InnerQuery.ListValue)
        WHERE SUBSTRING(ListValue, number, 1) = @SplitOn

RETURN

END /*Function FN_ListAllToNumTable*/

I have other versions that do not return empty or null rows, ones that return just the item and not the row number, etc. Look in the header comment to see how to use this as part of a JOIN, which is much faster than in a where clause.

The CLR solution did not give me an good performance so I will use a recursive query. So here is the definition of the SP I will use (mostly based on Erland Sommarskogs examples):

CREATE FUNCTION [dbo].[priudf_CSVtoIntTable]
(
  @CSV VARCHAR(MAX),
  @Delimiter CHAR(1) = ','
)
RETURNS 
@Result TABLE 
(
    [Value] INT
)
AS
BEGIN

  -- Remove space, tab, linefeed, carrier return
  SET @CSV = REPLACE(@CSV, ' ', '');
  SET @CSV = REPLACE(@CSV, CHAR(9), '');
  SET @CSV = REPLACE(@CSV, CHAR(10), '');
  SET @CSV = REPLACE(@CSV, CHAR(13), '');

  WITH csvtbl(start, stop) AS 
  (
    SELECT  start = CONVERT(BIGINT, 1),
            stop = CHARINDEX(@Delimiter, @CSV + @Delimiter)
    UNION ALL
    SELECT  start = stop + 1,
            stop = CHARINDEX(@Delimiter, @CSV + @Delimiter, stop + 1)
    FROM csvtbl
    WHERE stop > 0
  )
  INSERT INTO @Result
  SELECT CAST(SUBSTRING(@CSV, start, CASE WHEN stop > 0 THEN stop - start ELSE 0 END) AS INT) AS [Value]
  FROM   csvtbl
  WHERE  stop > 0
  OPTION (MAXRECURSION 1000)

  RETURN 
END

Thank for the input, I have to admit that I have made som bad research before I started my work. I found that Erland Sommarskog has written a lot of this problem on his webpage, after your responeses and after reading his page I decided that I will try to make a CLR to solve this.

I tried a recursive query, this resulted in good performance but I will try CLR function anyway.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM