
Using LISTAGG function in SQL causes error: Result size exceeds LISTAGG limit

I was trying to use the LISTAGG function in SQL and I am facing the following error:

Invalid operation: Result size exceeds LISTAGG limit Details:
----------- error: Result size exceeds LISTAGG limit code: 8 ... -----------

How do I get rid of this error?

Please refer to the LISTAGG function documentation at https://docs.aws.amazon.com/redshift/latest/dg/r_LISTAGG.html

The return data type is VARCHAR(MAX), which holds at most 65535 bytes (64K - 1).

The error you described is mentioned explicitly in the official documentation.

You can consider using LISTAGG with DISTINCT, as follows, to reduce the number of items to be concatenated:

select listagg(distinct sellerid, ', ') within group (order by sellerid) from sales
where eventid = 4337;

This is the reason for the issue we encountered.

[image]


Here is the SQL query I tried to execute:

SELECT DISTINCT "year_level", LISTAGG("value", ', ')  WITHIN GROUP (ORDER BY "year_level") OVER (PARTITION BY "year_level")
FROM "school_1__acara_db"."acara_data_set";

and this is the error I got:

ERROR: Result size exceeds LISTAGG limit Detail: ----------------------------------------------- error: Result size exceeds LISTAGG limit code: 8001 context: LISTAGG limit: 65535 query: 4360256 location: string_ops.cpp:116 process: query1_127_4360256 [pid=1793] -----------------------------------------------



Let's break this issue into small segments. As the error says, I've exceeded the maximum limit of LISTAGG. The SQL query below shows the total size in bytes of the "value" column for each "year_level", so we can see which groups exceed the limit.

SELECT "year_level", SUM(OCTET_LENGTH("value")) as total_bytes
FROM "school_1__acara_db"."acara_data_set"
GROUP BY "year_level"
ORDER BY total_bytes;

This is the output.

[image: query output listing total_bytes for each year_level]

OCTET_LENGTH returns the length of the string in bytes (octets).
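Because the limit is counted in bytes, it can help to see how byte length differs from character length. The following is a minimal Python sketch, assuming UTF-8 encoding; the helper name octet_length and the sample strings are illustrative and not part of the original data.

```python
def octet_length(text: str) -> int:
    """Number of bytes in the UTF-8 encoding of text (assumed encoding)."""
    return len(text.encode("utf-8"))


# Hypothetical sample strings: plain ASCII, an accented letter, two CJK characters.
samples = ["Year 7", "caf\u00e9", "\u6570\u636e"]

for s in samples:
    # Character count and byte count only match for pure ASCII text.
    print(repr(s), "characters:", len(s), "bytes:", octet_length(s))
```

For ASCII-only values the two counts are the same, so the distinction only matters when the column holds multibyte characters.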

As you can see, the values for Primary Ungraded total 50329 bytes and those for Secondary Ungraded total 61178 bytes. Neither exceeds the VARCHAR(MAX) limit of 65535 bytes. So can I at least get LISTAGG values for these two records? This is the query I'm going to execute:

SELECT DISTINCT "year_level", LISTAGG("value", ', ')  WITHIN GROUP (ORDER BY "year_level") OVER (PARTITION BY "year_level")
FROM "school_1__acara_db"."acara_data_set"
WHERE "year_level" IN ('Primary Ungraded', 'Secondary Ungraded');

I got the same error, Result size exceeds LISTAGG limit Detail: ----------- . As the result above shows, neither group exceeds the VARCHAR(MAX) limit of 65535 bytes. But why? Let's count the "value" entries for each "year_level" with the query below.

SELECT "year_level", COUNT("value") as total_counts
FROM "school_1__acara_db"."acara_data_set"
GROUP BY "year_level"
ORDER BY total_counts;

This is the output.

[image: query output listing total_counts for each year_level]

Before going further, let's see how LISTAGG works.


In Redshift, LISTAGG can be used as an aggregate function or a window function. It transforms data from multiple rows into a single list of values separated by a specified delimiter. In the example below, the delimiter is ', ' (a comma followed by a space).

The following image is taken from the Oracle's Listagg Function - Uses and Duplicate Removal article. It covers Oracle, but it conveys the basic concept of the LISTAGG function.

[image: Example #1 from the Oracle article, showing rows merged into a single delimited list]

That is how the data is merged with the delimiter.


Our query also uses ', ' (a comma followed by a space) as the delimiter. Let's take the first record in the image below as an example.

[image: query output; the first record is "Primary Ungraded"]

"Primary Ungraded" has 3412 records totaling 50329 bytes. This means we are going to merge 3412 records into a single column. Merged directly, the result would be 50329 bytes. But we are not merging directly; we are merging with the delimiter, and each delimiter adds its own bytes to the result. There are 3411 delimiters between the 3412 records.

delimiter_count = no_of_records - 1

If this is unclear, check the Example #1 image to see how the data is merged. According to this, if we run our last failed query without a delimiter, it should work.

SELECT DISTINCT "year_level", LISTAGG("value", '')  WITHIN GROUP (ORDER BY "year_level") OVER (PARTITION BY "year_level")
FROM "school_1__acara_db"."acara_data_set"
WHERE "year_level" IN ('Primary Ungraded', 'Secondary Ungraded')


Yes, it works fine. But this won't work for the other records, because even without a delimiter they already exceed the VARCHAR(MAX) limit of 65535 bytes.
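The delimiter arithmetic above can be sketched in Python. This is only an estimate under two assumptions that are not confirmed by the source: that the limit applies to the value bytes plus the delimiter bytes, and that a result of exactly 65535 bytes is still allowed. The function names are illustrative.

```python
LISTAGG_LIMIT = 65535  # bytes; from "LISTAGG limit: 65535" in the error context


def listagg_size(value_bytes: int, row_count: int, delimiter: str = ", ") -> int:
    """Estimated size in bytes of the merged result: values plus delimiters."""
    if row_count <= 0:
        return 0
    delimiter_count = row_count - 1  # one delimiter between each pair of rows
    return value_bytes + delimiter_count * len(delimiter.encode("utf-8"))


def max_delimiters(value_bytes: int, delimiter: str = ", ") -> int:
    """How many delimiters still fit in the headroom left under the limit."""
    per_delimiter = len(delimiter.encode("utf-8"))
    if per_delimiter == 0:
        raise ValueError("an empty delimiter adds no bytes")
    return max(LISTAGG_LIMIT - value_bytes, 0) // per_delimiter


# Primary Ungraded: 3412 records, 50329 bytes (figures from the text above).
print(listagg_size(50329, 3412))      # with ', ' as the delimiter
print(listagg_size(50329, 3412, ""))  # with an empty delimiter

# Secondary Ungraded: 61178 bytes; its record count is not stated in the text.
print(max_delimiters(61178))
```

By this estimate, Primary Ungraded comes to 57151 bytes with the ', ' delimiter, which would still fit. Secondary Ungraded has only 4357 bytes of headroom, so it would overflow once it has more than 2178 two-byte delimiters, that is, 2180 or more records. If the estimate holds, that partition is the one that makes the two-group query fail.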


A lot of people suggest using the DISTINCT keyword with the LISTAGG function. Both the aggregate function and the window function support DISTINCT as an optional keyword.

Aggregate function

LISTAGG( [DISTINCT] aggregate_expression [, 'delimiter' ] ) 
[ WITHIN GROUP (ORDER BY order_list) ]   


Window function

LISTAGG( [DISTINCT] expression [, 'delimiter' ] ) 
[ WITHIN GROUP (ORDER BY order_list) ] 
OVER ( [PARTITION BY partition_expression] )     


Even with duplicates removed, my data exceeds the VARCHAR(MAX) limit of 65535 bytes, so I can't use DISTINCT in my case.


Can't we slice the data into smaller parts? Yes, we can do it with the query below. Don't be confused by it. This solution was found by one of my teammates, Isuru.

SELECT year_level, num_of_parts, listagg(value,',') AS listagg_data FROM (
    -- integer division assigns each row to a part of roughly 60000 bytes
    SELECT year_level, value, total_bytes / 60000 AS num_of_parts FROM (
        -- running total of value bytes within each year_level, ordered by value
        SELECT year_level, value, SUM(OCTET_LENGTH(value)) OVER (PARTITION BY year_level ORDER BY value ROWS UNBOUNDED PRECEDING) AS total_bytes
            FROM "school_1__acara_db"."acara_data_set"
        )
)
GROUP BY year_level, num_of_parts
ORDER BY year_level, num_of_parts;

This is the output.

[image: query output listing year_level, num_of_parts, and listagg_data]

Using this you can grab all the information you want. Here we slice by total_bytes in steps of 60000, and the num_of_parts column shows which part each piece belongs to. 'Primary Ungraded' was not sliced at all, while 'Secondary Ungraded' was sliced into 2 parts, consistent with the byte totals we investigated earlier. The other groups are likewise sliced into multiple parts of about 60000 bytes each.
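To see how the running total and integer division assign rows to parts, here is a small Python simulation of the same idea. It is a sketch, not the Redshift implementation: the function names are illustrative, the input is assumed to be already ordered by value, and the tiny chunk size in the demo is chosen only to keep the example readable.

```python
from itertools import accumulate


def assign_parts(values, chunk_bytes=60000):
    """Pair each value with its part number, mimicking total_bytes / 60000."""
    sizes = [len(v.encode("utf-8")) for v in values]
    # accumulate() yields the running byte total, including the current row.
    return [(v, total // chunk_bytes) for v, total in zip(values, accumulate(sizes))]


def part_sizes(assigned, delimiter=","):
    """Size in bytes of each part once its values are joined with the delimiter."""
    parts = {}
    for value, part in assigned:
        parts.setdefault(part, []).append(value)
    return {p: len(delimiter.join(vs).encode("utf-8")) for p, vs in parts.items()}


# Hypothetical demo data with a chunk size of 5 bytes instead of 60000.
demo = assign_parts(["aa", "bbb", "cccc", "dd"], chunk_bytes=5)
print(demo)
print(part_sizes(demo))
```

In the demo, one part ends up larger than the chunk size, because a row can straddle a boundary and the delimiters add bytes that the running total does not count. This is presumably why the query slices at 60000 rather than at 65535. With many short values or a longer delimiter, a smaller chunk size may be needed.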

The issue we encountered is a database limitation. The LISTAGG values can then be merged into a single value programmatically, outside the database. At the moment I don't see any other solution, and I couldn't find a proper one anywhere on the internet.
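One way to do that merge on the client side is sketched below in Python. It assumes the rows have already been fetched as (year_level, num_of_parts, listagg_data) tuples, matching the column order of the slicing query. The function name and the sample rows are hypothetical.

```python
from collections import defaultdict


def merge_chunks(rows, delimiter=","):
    """Join the listagg_data pieces of each year_level in part order."""
    grouped = defaultdict(list)
    for year_level, part, data in rows:
        grouped[year_level].append((part, data))
    return {
        level: delimiter.join(data for _, data in sorted(parts, key=lambda p: p[0]))
        for level, parts in grouped.items()
    }


# Hypothetical rows, deliberately out of order to show that parts are sorted.
rows = [
    ("Secondary Ungraded", 1, "d,e"),
    ("Primary Ungraded", 0, "a,b"),
    ("Secondary Ungraded", 0, "a,b,c"),
]

for level, merged in merge_chunks(rows).items():
    print(level, "->", merged)
```

The delimiter passed to merge_chunks should match the one used in the listagg call, so that the joined string looks the same as a single LISTAGG result would.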
