Using the LISTAGG function in SQL causes error: Result size exceeds LISTAGG limit
I was trying to use the LISTAGG function in SQL and I'm facing the following error:
Invalid operation: Result size exceeds LISTAGG limit Details:
----------- error: Result size exceeds LISTAGG limit code: 8 ...
How do I get rid of this error?
Please refer to the LISTAGG function documentation at https://docs.aws.amazon.com/redshift/latest/dg/r_LISTAGG.html
The return data type is VARCHAR(MAX), which in Redshift is limited to 65535 bytes (64K).
The error you describe is mentioned explicitly in the official documentation.
You could use the LISTAGG() function with DISTINCT, as follows, to reduce the number of items to be concatenated:
select listagg(distinct sellerid, ', ') within group (order by sellerid) from sales
where eventid = 4337;
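One rough way to see how much DISTINCT can save is to measure the concatenated length with and without it. The Python sketch below does that for made-up data. `listagg_bytes` is a hypothetical helper, not a Redshift function. It assumes UTF-8 text with one delimiter between each pair of values.

```python
def listagg_bytes(values, delimiter=", ", distinct=False):
    """Byte length of a LISTAGG-style result (hypothetical helper)."""
    items = [str(v) for v in values]
    if distinct:
        # DISTINCT keeps one copy of each value; sorting stands in for
        # WITHIN GROUP (ORDER BY ...).
        items = sorted(set(items))
    # Values joined with one delimiter between each pair, measured in bytes.
    return len(delimiter.join(items).encode("utf-8"))


# Made-up seller IDs with many repeats.
sellers = [1, 2, 2, 3, 3, 3] * 1000

print(listagg_bytes(sellers))                 # 17998: every occurrence is concatenated
print(listagg_bytes(sellers, distinct=True))  # 7: only "1, 2, 3"
```

DISTINCT only helps when the values actually repeat. If they are mostly unique, the result stays about the same size.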
Here is the reason for the issue we encountered.
Here is the SQL query I tried to execute:
SELECT DISTINCT "year_level", LISTAGG("value", ', ') WITHIN GROUP (ORDER BY "year_level") OVER (PARTITION BY "year_level")
FROM "school_1__acara_db"."acara_data_set";
And this is the error I got:
ERROR: Result size exceeds LISTAGG limit Detail: ----------------------------------------------- error: Result size exceeds LISTAGG limit code: 8001 context: LISTAGG limit: 65535 query: 4360256 location: string_ops.cpp:116 process: query1_127_4360256 [pid=1793] -----------------------------------------------
Let's break this issue into smaller pieces. As the documentation says, I've exceeded the maximum size of a LISTAGG result. With the query below, we can find out, for each "year_level", how many bytes the "value" column adds up to, and therefore which groups exceed the limit.
SELECT "year_level", SUM(OCTET_LENGTH("value")) as total_bytes
FROM "school_1__acara_db"."acara_data_set"
GROUP BY "year_level"
ORDER BY total_bytes;
This is the output.
OCTET_LENGTH returns the length of the string in bytes (octets).
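To make the byte-total query above concrete, here is a small Python model of it using made-up rows. `total_bytes_by_group` is a hypothetical helper. It uses `len(value.encode("utf-8"))` as a stand-in for OCTET_LENGTH, on the assumption that the column holds UTF-8 text.

```python
from collections import defaultdict


def total_bytes_by_group(rows):
    """Mimic SUM(OCTET_LENGTH("value")) ... GROUP BY "year_level" (hypothetical helper)."""
    totals = defaultdict(int)
    for year_level, value in rows:
        # OCTET_LENGTH counts bytes, not characters (UTF-8 assumed here).
        totals[year_level] += len(value.encode("utf-8"))
    # ORDER BY total_bytes
    return sorted(totals.items(), key=lambda item: item[1])


rows = [
    ("Year 1", "abc"),
    ("Year 1", "de"),
    ("Year 2", "é"),  # 1 character but 2 bytes in UTF-8
]
print(total_bytes_by_group(rows))  # [('Year 2', 2), ('Year 1', 5)]
```

Counting bytes rather than characters matters because the 65535 limit is a byte limit, and non-ASCII characters take more than one byte each.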
As you can see, the values related to "Primary Ungraded" total 50329 bytes and those related to "Secondary Ungraded" total 61178 bytes. Neither exceeds the VARCHAR(MAX) limit of 65535 bytes.
限制。 At least can I get LISTAGG
values for the above two records?至少我可以获得以上两条记录的
LISTAGG
值吗? This is the query I'm going to execute,这是我要执行的查询,
SELECT DISTINCT "year_level", LISTAGG("value", ', ') WITHIN GROUP (ORDER BY "year_level") OVER (PARTITION BY "year_level")
FROM "school_1__acara_db"."acara_data_set"
WHERE "year_level" IN ('Primary Ungraded', 'Secondary Ungraded');
I got the same error: Result size exceeds LISTAGG limit Detail: -----------. As we saw in the result above, neither group exceeds the VARCHAR(MAX) limit of 65535 bytes. So why does it fail? Let's count the "value" rows for each "year_level" with the query below.
SELECT "year_level", COUNT("value") as total_counts
FROM "school_1__acara_db"."acara_data_set"
GROUP BY "year_level"
ORDER BY total_counts;
This is the output.
Before going further, let's look at how LISTAGG works.
In Redshift, LISTAGG can be used as an aggregate function or a window function. It transforms data from multiple rows into a single list of values separated by a specified delimiter. In the example below, the delimiter is ', ' (a comma followed by a space).
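A rough Python model of the two forms may help. `listagg_aggregate` collapses each group to one row. `listagg_window` attaches the same list to every row of its partition, which is why the queries in this post need SELECT DISTINCT to get one row per "year_level". Both helpers are hypothetical illustrations, not Redshift APIs. Sorting the values stands in for WITHIN GROUP (ORDER BY ...).

```python
from collections import defaultdict


def listagg_aggregate(rows, delimiter=", "):
    """Aggregate form: one output row per group, values joined in sorted order."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: delimiter.join(sorted(values)) for key, values in groups.items()}


def listagg_window(rows, delimiter=", "):
    """Window form, OVER (PARTITION BY key): every input row keeps its own output row."""
    per_group = listagg_aggregate(rows, delimiter)
    return [(key, per_group[key]) for key, _ in rows]


rows = [("A", "x"), ("A", "y"), ("B", "z")]
print(listagg_aggregate(rows))  # {'A': 'x, y', 'B': 'z'}
print(listagg_window(rows))     # [('A', 'x, y'), ('A', 'x, y'), ('B', 'z')]
```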
The following image is taken from the article "Oracle's Listagg Function - Uses and Duplicate Removal". It is about Oracle, but it shows the basic concepts of the LISTAGG function.
That is how the data is merged with the delimiter.
Our query also uses ', ' (a comma followed by a space) as the delimiter. Let's take the first record in the image below as an example. "Primary Ungraded" has 3412 records totaling 50329 bytes, so we are going to merge 3412 records into a single column. If we merged them directly, the result would be 50329 bytes. But we are not merging them directly; we are merging them with the delimiter. So there are 3411 delimiters between the 3412 records, and each ', ' delimiter adds 2 more bytes.
delimiter_count = no_of_records - 1
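To put numbers on the formula, here is a small Python sketch. The helpers `listagg_result_bytes` and `max_rows_within_limit` are hypothetical, and the sketch assumes a result of exactly 65535 bytes still fits. Plugging in the "Primary Ungraded" figures from this post (3412 rows, 50329 bytes) with the 2-byte ', ' delimiter gives 57151 bytes. That is still under 65535, so by this arithmetic the group that pushes the two-group query over the limit is more likely "Secondary Ungraded". Its row count appears only in the output image, so the sketch instead computes how many rows its 61178 bytes could be spread over before the delimiters overflow.

```python
LISTAGG_LIMIT = 65535  # bytes, as reported in the error context ("LISTAGG limit: 65535")


def listagg_result_bytes(value_bytes, row_count, delimiter=", "):
    """Bytes of a LISTAGG result: all value bytes plus (row_count - 1) delimiters."""
    delimiter_count = max(row_count - 1, 0)  # delimiter_count = no_of_records - 1
    return value_bytes + delimiter_count * len(delimiter.encode("utf-8"))


def max_rows_within_limit(value_bytes, delimiter=", ", limit=LISTAGG_LIMIT):
    """Largest row count whose delimiters still fit beside value_bytes."""
    delimiter_bytes = len(delimiter.encode("utf-8"))
    if delimiter_bytes == 0:
        raise ValueError("with no delimiter, only value_bytes matters")
    spare = limit - value_bytes
    return max(spare // delimiter_bytes + 1, 0)


# "Primary Ungraded": 3412 rows, 50329 bytes of values.
print(listagg_result_bytes(50329, 3412))      # 57151
print(listagg_result_bytes(50329, 3412, ""))  # 50329, no delimiter bytes
# "Secondary Ungraded": 61178 bytes of values.
print(max_rows_within_limit(61178))           # 2179
```

In other words, if "Secondary Ungraded" has 2180 or more rows, the ', ' delimiters alone push it past the limit, while dropping the delimiter brings it back to its raw 61178 bytes.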
If this isn't clear, check the Example #1 image to see how the data is merged. According to this, if we run our last failed query without a delimiter, it should work.
SELECT DISTINCT "year_level", LISTAGG("value", '') WITHIN GROUP (ORDER BY "year_level") OVER (PARTITION BY "year_level")
FROM "school_1__acara_db"."acara_data_set"
WHERE "year_level" IN ('Primary Ungraded', 'Secondary Ungraded');
Yes, it works. But this won't work for the other groups, because even without any delimiter their values already exceed the VARCHAR(MAX) limit of 65535 bytes.
Many people suggest using the DISTINCT keyword with the LISTAGG function. Both the aggregate function and the window function accept DISTINCT as an optional keyword:
LISTAGG( [DISTINCT] aggregate_expression [, 'delimiter' ] )
[ WITHIN GROUP (ORDER BY order_list) ]
LISTAGG( [DISTINCT] expression [, 'delimiter' ] )
[ WITHIN GROUP (ORDER BY order_list) ]
OVER ( [PARTITION BY partition_expression] )
Even without duplicates, some of my data exceeds the VARCHAR(MAX) limit of 65535 bytes, so DISTINCT doesn't help in my case.
Can't we slice the data into smaller parts? Yes, we can, with the query below. This solution was found by one of my teammates, Isuru.
SELECT year_level, num_of_parts, LISTAGG(value, ',') WITHIN GROUP (ORDER BY value) AS listagg_data FROM (
  -- Integer division assigns each row to a roughly 60000-byte part.
  SELECT year_level, value, total_bytes / 60000 AS num_of_parts FROM (
    -- Running byte total of "value" within each year_level.
    SELECT year_level, value, SUM(OCTET_LENGTH(value)) OVER (PARTITION BY year_level ORDER BY value ROWS UNBOUNDED PRECEDING) AS total_bytes
    FROM "school_1__acara_db"."acara_data_set"
  ) AS running_totals
) AS parts
GROUP BY year_level, num_of_parts
ORDER BY year_level, num_of_parts;
This is the output.
Using this, you can retrieve all the data you need. Here we slice the running total_bytes into 60000-byte chunks. The num_of_parts column, which is really a 0-based part index, shows which piece each row landed in. 'Primary Ungraded' fits into a single part. As our earlier investigation suggests, 'Secondary Ungraded' is split into 2 parts. The other groups are likewise split into as many 60000-byte parts as they need.
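The slicing logic can be modeled in a few lines of Python to show how rows are assigned to parts. This sketch mirrors the query: it keeps a running byte total per group, ordered by value, and integer-divides it by a chunk size (60000 in the query). `assign_parts` and `listagg_parts` are hypothetical helpers. The sketch assumes UTF-8 text, and Python's sort order stands in for the database's ORDER BY value. Because a part starts wherever the running total crosses a boundary, one part can hold somewhat more than the chunk size: up to roughly one extra value, plus the delimiters. That is a reason to keep the chunk size below 65535 rather than at it.

```python
from collections import defaultdict
from itertools import groupby


def assign_parts(rows, chunk_bytes=60000):
    """Mimic total_bytes / 60000 AS num_of_parts (hypothetical helper).

    rows: iterable of (year_level, value). Returns {(year_level, part): [values]}.
    """
    parts = defaultdict(list)
    ordered = sorted(rows)  # PARTITION BY year_level ORDER BY value
    for year_level, group in groupby(ordered, key=lambda row: row[0]):
        running_total = 0
        for _, value in group:
            # ROWS UNBOUNDED PRECEDING: the running total includes the current row.
            running_total += len(value.encode("utf-8"))
            parts[(year_level, running_total // chunk_bytes)].append(value)
    return dict(parts)


def listagg_parts(rows, chunk_bytes=60000, delimiter=","):
    """One concatenated string per (year_level, part), like the outer GROUP BY."""
    return {key: delimiter.join(values)
            for key, values in sorted(assign_parts(rows, chunk_bytes).items())}


rows = [("A", "aaaa"), ("A", "bbbb"), ("A", "cccc"), ("B", "dd")]
print(listagg_parts(rows, chunk_bytes=10))
# {('A', 0): 'aaaa,bbbb', ('A', 1): 'cccc', ('B', 0): 'dd'}
```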
The issue we encountered is a database limitation, so the LISTAGG parts have to be merged back into a single value programmatically, outside the database. At the moment I don't see any other solution, and I couldn't find a proper one anywhere online.