CLR SQL Server performance

We are using CLR functions in our ETL processes to keep specific data-conversion and data-checking logic centralized. These functions are rather basic, require no data access, and are deterministic, therefore allowing parallelism.

For instance:

[SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic = true, SystemDataAccess = SystemDataAccessKind.None, IsPrecise = true)]
public static bool check_smallint(string input)
{
    string teststring;
    try
    {
        teststring = input.Trim(' ').Replace('-', '0');
        if (teststring.Length == 0)
        {
            teststring = "0";
        }
        Convert.ToInt16(teststring);
    }
    catch (NullReferenceException)
    {
        return true;
    }
    catch (FormatException)
    {
        return false;
    }
    catch (OverflowException)
    {
        return false;
    }
    return true;
}

This works fine except for performance. Queries have slowed down considerably, which is causing trouble when processing large datasets (millions of rows and more).

Until now we have found no one who really understands the SQL CLR architecture, but one suggestion we received is that it might be caused by the overhead of creating a new connection or allocating memory for every function-call. So a solution could be connection / memory pooling.

Please don't suggest different solutions, such as inline SQL or a completely different approach; we are already considering them. Standard SQL functions are in many cases not an option because they lack error raising.

PS. We are using SQL Server 2008 R2.

"...by the overhead of creating a new connection or allocating memory for every function-call. So a solution could be connection / memory pooling."

That's not something you have to worry about on the C# side. You're not allocating memory in any way that could be pooled (of course you allocate the strings and other objects your function needs, but nothing you can pool or reuse). The connection isn't something you have to worry about either.

"This works fine except for performance."

Your code is doing something incredibly... EXCEPTIONALLY... slow: throwing exceptions instead of performing checks. An exception is an expensive operation and should be used to handle exceptional situations (just 100 to 200 records with a null or invalid value are enough to slow down a query over 1,000,000 records). A wrong input format or a null value in a database column isn't exceptional. (This programming style, exceptions instead of checks, is allowed and even encouraged in other languages such as Python. I'd generally avoid it in C#, and it's certainly not appropriate here, where performance is an issue.)

public static bool check_smallint(string input)
{
    if (String.IsNullOrWhiteSpace(input))
        return true;

    short value;
    return Int16.TryParse(input, out value);
}

Note that String.IsNullOrWhiteSpace(input) returns true for null inputs and for strings made only of whitespace (replacing your Trim() and NullReferenceException handling). Everything else (a FormatException for input text that is not an integer, or an OverflowException for a number that is too large) is handled by Int16.TryParse(). The code is shorter (and slightly faster) for valid inputs, and many times faster for invalid ones.
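To make the claim above easier to check on your own machine, here is a minimal console sketch that runs both versions over a few sample inputs and times them. The class name SmallintCheckDemo, the helper Time, the sample strings and the iteration count are all made up for illustration; this is not a rigorous benchmark and it runs outside SQL Server, so treat the timings as a rough indication only.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;
using System.Threading;

public static class SmallintCheckDemo
{
    // Exception-based check, same logic as the function in the question.
    public static bool CheckWithExceptions(string input)
    {
        try
        {
            string teststring = input.Trim(' ').Replace('-', '0');
            if (teststring.Length == 0)
                teststring = "0";
            Convert.ToInt16(teststring);
        }
        catch (NullReferenceException) { return true; }
        catch (FormatException) { return false; }
        catch (OverflowException) { return false; }
        return true;
    }

    // Check-based version, same body as the rewritten check_smallint.
    public static bool CheckWithTryParse(string input)
    {
        if (String.IsNullOrWhiteSpace(input))
            return true;
        short value;
        return Int16.TryParse(input, out value);
    }

    // Milliseconds taken by `iterations` calls of `check` on one input.
    public static long Time(Func<string, bool> check, string input, int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            check(input);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        // Pin the culture so parsing does not depend on regional settings.
        Thread.CurrentThread.CurrentCulture = CultureInfo.InvariantCulture;

        const int iterations = 20000; // hypothetical call count per sample
        string[] samples = { null, "   ", "123", "-", "40000", "abc" };
        foreach (string s in samples)
        {
            string shown = s == null ? "(null)" : "\"" + s + "\"";
            Console.WriteLine(
                "{0,-8} exceptions: {1,-5} {2,6} ms | TryParse: {3,-5} {4,6} ms",
                shown,
                CheckWithExceptions(s), Time(CheckWithExceptions, s, iterations),
                CheckWithTryParse(s), Time(CheckWithTryParse, s, iterations));
        }
    }
}
```

Two things are worth checking against your own data. First, the original function replaces every '-' with '0' before converting, so a lone "-" is accepted there but rejected by Int16.TryParse(); if "-" is used as an empty-value placeholder in your sources, it needs an explicit check. Second, String.IsNullOrWhiteSpace() was added in .NET Framework 4, while SQL Server 2008 R2 hosts CLR 2.0, so on that version a close equivalent such as input == null || input.Trim().Length == 0 would be needed instead.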

I am making this a separate answer instead of a comment on @Adriano's answer so that it is less likely to be missed (since not everyone reads all of the comments).


In addition to changing the approach as suggested by @Adriano, you should really be using the appropriate datatypes, found in the System.Data.SqlTypes namespace, for all input/output parameters and return values. There are some important differences and benefits to using them, such as all of them having an .IsNull property. The full list of differences is too much information to put here, but I documented it in the following article: Stairway to SQLCLR Level 5: Development (Using .NET within SQL Server).

Adapting @Adriano's code to use the proper types would give you the following:

public static SqlBoolean check_smallint(SqlString input)
{
    if (input.IsNull)
        return true;

    if (input.Value.Trim() == String.Empty)
        return true;

    short value;
    return Int16.TryParse(input.Value, out value);
}
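As a rough illustration of why the .IsNull test comes first, the sketch below calls the same function from a small console program. The wrapper class SqlTypesDemo and the sample values are hypothetical, and the [SqlFunction] attribute is left out so the snippet can run outside SQL Server.

```csharp
using System;
using System.Data.SqlTypes;

public static class SqlTypesDemo
{
    // Same body as the SqlTypes version above.
    public static SqlBoolean check_smallint(SqlString input)
    {
        if (input.IsNull)
            return true;

        if (input.Value.Trim() == String.Empty)
            return true;

        short value;
        return Int16.TryParse(input.Value, out value);
    }

    public static void Main()
    {
        // A database NULL arrives as SqlString.Null, not as a null reference.
        Console.WriteLine(check_smallint(SqlString.Null).Value);
        Console.WriteLine(check_smallint(new SqlString("123")).Value);
        Console.WriteLine(check_smallint(new SqlString("abc")).Value);

        // Reading .Value on a NULL SqlString throws, hence the IsNull test first.
        try
        {
            string s = SqlString.Null.Value;
            Console.WriteLine(s);
        }
        catch (SqlNullValueException)
        {
            Console.WriteLine("SqlString.Null.Value throws SqlNullValueException");
        }
    }
}
```

Because a NULL is represented by a struct value rather than a null reference, there is no NullReferenceException to catch here; the null case is an ordinary branch.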
