
STRING_AGG not behaving as expected

I have the following query:

WITH cteCountryLanguageMapping AS (
    SELECT * FROM (
        VALUES
            ('Spain', 'English'),
            ('Spain', 'Spanish'),
            ('Sweden', 'English'),
            ('Switzerland', 'English'),
            ('Switzerland', 'French'),
            ('Switzerland', 'German'),
            ('Switzerland', 'Italian')
    ) x ([Country], [Language])
)
SELECT
    [Country],
    CASE COUNT([Language])
        WHEN 1 THEN MAX([Language])
        WHEN 2 THEN STRING_AGG([Language], ' and ')
        ELSE STRING_AGG([Language], ', ')
    END AS [Languages],
    COUNT([Language]) AS [LanguageCount]
FROM cteCountryLanguageMapping
GROUP BY [Country]

I was expecting the value in the Languages column for Switzerland to be comma-separated, i.e.:

  | Country     | Languages                                 | LanguageCount
--+-------------+-------------------------------------------+--------------
1 | Spain       | Spanish and English                       | 2
2 | Sweden      | English                                   | 1
3 | Switzerland | French, German, Italian, English          | 4

Instead I am getting the output below (the 4 values are separated by ' and '):

  | Country     | Languages                                 | LanguageCount
--+-------------+-------------------------------------------+--------------
1 | Spain       | Spanish and English                       | 2
2 | Sweden      | English                                   | 1
3 | Switzerland | French and German and Italian and English | 4

What am I missing?


Here is another example:

SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG(z, '-') AS STRING_AGG_MINUS
FROM (
    VALUES
        (1, 'a'),
        (1, 'b')
) x (y, z)
GROUP BY y

  | y | STRING_AGG_PLUS | STRING_AGG_MINUS
--+---+-----------------+-----------------
1 | 1 | a+b             | a+b

Is this a bug in SQL Server?

Yes, this is a Bug (tm), present in all versions of SQL Server 2017 (as of writing). It's fixed in Azure SQL Server and 2019 RC1.

Specifically, the part of the optimizer that performs common subexpression elimination (ensuring that expressions are not calculated more often than necessary) improperly considers all expressions of the form STRING_AGG(x, <separator>) identical as long as x matches, no matter what <separator> is, and unifies them with the first calculated expression in the query.
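If you want to check whether a particular instance shows this behaviour, something along these lines should do. It is only a sketch based on the description above: the inline VALUES table and the column names z and separator_check are made up for the example, and I have not run it against every build.

```sql
-- Report the build, so the result can be matched to a version.
SELECT @@VERSION AS server_version;

-- Hypothetical self-check: the same column is aggregated with two different
-- separators. If the optimizer merges the two STRING_AGG calls into one,
-- both sides of the comparison are the same string.
SELECT
    CASE
        WHEN STRING_AGG(z, '+') = STRING_AGG(z, '-')
            THEN 'separator ignored (looks affected)'
        ELSE 'separators honored'
    END AS separator_check
FROM (
    VALUES
        ('a'),
        ('b')
) x (z);
```

With two rows and two different separators, the two aggregates can only be equal if one of the separators was ignored.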

One workaround is to make sure x does not match by performing some sort of (near-)identity transformation on it. Since we're dealing with strings, concatenating an empty one will do:

SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG('' + z, '-') AS STRING_AGG_MINUS
FROM (
    VALUES
        (1, 'a'),
        (1, 'b')
) x (y, z)
GROUP BY y
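Applied to the query from the question, the same trick would look roughly like this. This is a sketch rather than a tested fix: only the ELSE branch is changed, on the assumption that giving it a different first argument is enough to stop the optimizer from merging it with the ' and ' aggregate.

```sql
WITH cteCountryLanguageMapping AS (
    SELECT * FROM (
        VALUES
            ('Spain', 'English'),
            ('Spain', 'Spanish'),
            ('Sweden', 'English'),
            ('Switzerland', 'English'),
            ('Switzerland', 'French'),
            ('Switzerland', 'German'),
            ('Switzerland', 'Italian')
    ) x ([Country], [Language])
)
SELECT
    [Country],
    CASE COUNT([Language])
        WHEN 1 THEN MAX([Language])
        WHEN 2 THEN STRING_AGG([Language], ' and ')
        -- '' + [Language] yields the same strings, but it is a different
        -- expression from [Language], so it should not be unified with the
        -- STRING_AGG call in the branch above.
        ELSE STRING_AGG('' + [Language], ', ')
    END AS [Languages],
    COUNT([Language]) AS [LanguageCount]
FROM cteCountryLanguageMapping
GROUP BY [Country];
```

Note that STRING_AGG does not guarantee the order of the concatenated values unless you add a WITHIN GROUP (ORDER BY ...) clause, so the languages may come out in a different order than in the question.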

Don't repeat yourself*. You are repeating yourself by using MAX(...), STRING_AGG(..., ', ') and STRING_AGG(..., ' and '). You could simply rewrite your query like this and might end up with a better plan:

WITH cteCountryLanguageMapping AS (
    SELECT * FROM (
        VALUES
            ('Spain', 'English'),
            ('Spain', 'Spanish'),
            ('Sweden', 'English'),
            ('Switzerland', 'English'),
            ('Switzerland', 'French'),
            ('Switzerland', 'German'),
            ('Switzerland', 'Italian')
    ) x (Country, Language)
), results AS (
    SELECT
        Country,
        COUNT(Language) AS LanguageCount,
        STRING_AGG(Language, ', ') AS Languages
    FROM cteCountryLanguageMapping
    GROUP BY Country
)
SELECT Country, LanguageCount, CASE LanguageCount
    WHEN 2 THEN REPLACE(Languages, ', ', ' and ')
    ELSE Languages
END AS Languages_Fixed
FROM results

Result:

| Country     | LanguageCount | Languages_Fixed                  |
|-------------|---------------|----------------------------------|
| Spain       | 2             | Spanish and English              |
| Sweden      | 1             | English                          |
| Switzerland | 4             | French, German, Italian, English |

DB Fiddle

* I did not want to repeat others as well by saying that it is a bug.
