Using LISTAGG function in SQL causes error: Result size exceeds LISTAGG limit

Question

I was trying to use the LISTAGG function in SQL and I am facing the following error:

Invalid operation: Result size exceeds LISTAGG limit Details:
----------- error: Result size exceeds LISTAGG limit code: 8 ...

How do I get rid of this error?

Answer 1

Please refer to the LISTAGG function documentation at https://docs.aws.amazon.com/redshift/latest/dg/r_LISTAGG.html

The return data type is VARCHAR(MAX), which holds at most 65535 bytes (64K - 1).

The error you describe is the one the official documentation mentions for a result that is larger than this maximum.

You can use LISTAGG with DISTINCT, as follows, to reduce the number of items to be concatenated:

select listagg(distinct sellerid, ', ') within group (order by sellerid) from sales
where eventid = 4337;

Answer 2

Here is the cause of the issue as we encountered it.

This is the SQL query I tried to execute:

SELECT DISTINCT "year_level", LISTAGG("value", ', ')  WITHIN GROUP (ORDER BY "year_level") OVER (PARTITION BY "year_level")
FROM "school_1__acara_db"."acara_data_set";

And this is the error I got:

ERROR: Result size exceeds LISTAGG limit Detail: ----------------------------------------------- error: Result size exceeds LISTAGG limit code: 8001 context: LISTAGG limit: 65535 query: 4360256 location: string_ops.cpp:116 process: query1_127_4360256 [pid=1793] -----------------------------------------------

Let's break this issue into smaller pieces. As the error says, I've exceeded the maximum size of a LISTAGG result. The query below shows the total size in bytes of the "value" column for each "year_level", so we can see which groups exceed the limit.

SELECT "year_level", SUM(OCTET_LENGTH("value")) as total_bytes
FROM "school_1__acara_db"."acara_data_set"
GROUP BY "year_level"
ORDER BY total_bytes;

This is the output.

OCTET_LENGTH returns the length of the string in bytes (octets).

As you can see, the values for Primary Ungraded total 50329 bytes and those for Secondary Ungraded total 61178 bytes. Neither exceeds the VARCHAR(MAX) limit of 65535 bytes. Can I at least get the LISTAGG values for these two records? This is the query I'm going to execute:

SELECT DISTINCT "year_level", LISTAGG("value", ', ')  WITHIN GROUP (ORDER BY "year_level") OVER (PARTITION BY "year_level")
FROM "school_1__acara_db"."acara_data_set"
WHERE "year_level" IN ('Primary Ungraded', 'Secondary Ungraded');

I got the same error: Result size exceeds LISTAGG limit. As the result above shows, neither group exceeds the VARCHAR(MAX) limit of 65535 bytes. So why does it fail? Let's count the "value" entries for each "year_level" with the query below.

SELECT "year_level", COUNT("value") as total_counts
FROM "school_1__acara_db"."acara_data_set"
GROUP BY "year_level"
ORDER BY total_counts;

This is the output.

Before going further, let's see how LISTAGG works.

In Redshift, LISTAGG can be used as an aggregate function or a window function. It transforms data from multiple rows into a single list of values separated by a specified delimiter. In the example below, the delimiter is ', ' (a comma followed by a space).

The following image is taken from the article Oracle's Listagg Function - Uses and Duplicate Removal. It covers Oracle, but it illustrates the basic concept of the LISTAGG function.

That is how the data is merged with the delimiter.
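To make the merge concrete, here is a rough Python sketch of what LISTAGG does for each group. It is only an illustration, not Redshift's implementation; the listagg helper and the sample rows are made up for this example.

```python
def listagg(rows, delimiter=", "):
    """Join the values of each key into one string, ordered by value.

    Mimics: SELECT key, LISTAGG(value, ', ') WITHIN GROUP (ORDER BY value)
            FROM rows GROUP BY key
    """
    groups = {}
    for key, value in rows:
        groups.setdefault(key, []).append(value)
    return {key: delimiter.join(sorted(values)) for key, values in groups.items()}


# Hypothetical sample rows: (year_level, value)
sample_rows = [
    ("Year 1", "reading"),
    ("Year 1", "maths"),
    ("Year 2", "science"),
]
merged = listagg(sample_rows)
print(merged)
```

Each group collapses to a single string, and the delimiter appears only between values, never after the last one.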

Our query also uses ', ' (a comma followed by a space) as the delimiter. Let's take the first record in the image below as an example.

"Primary Ungraded" has 3412 records totalling 50329 bytes. So we are going to merge 3412 records into a single column value. If we merged the values directly, the result would be 50329 bytes. But we are not merging directly; we are merging with a delimiter, so there are 3411 delimiters between the 3412 records.

delimiter_count = no_of_records - 1
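The formula above can be turned into a quick size estimate. The sketch below assumes the LISTAGG result size is simply the value bytes plus one delimiter between each pair of values, and that the 65535 figure from the error message is counted in bytes; the function names are hypothetical.

```python
LISTAGG_LIMIT = 65535  # bytes, taken from the error message ("LISTAGG limit: 65535")


def listagg_bytes(value_bytes, row_count, delimiter=", "):
    """Estimated LISTAGG result size: value bytes plus (row_count - 1) delimiters."""
    if row_count <= 0:
        return 0
    delimiter_bytes = len(delimiter.encode("utf-8"))
    return value_bytes + delimiter_bytes * (row_count - 1)


def max_rows_that_fit(value_bytes, delimiter=", ", limit=LISTAGG_LIMIT):
    """Largest row count whose delimiters still fit next to value_bytes.

    Assumes a non-empty delimiter; returns 0 if the values alone exceed the limit.
    """
    delimiter_bytes = len(delimiter.encode("utf-8"))
    if value_bytes > limit:
        return 0
    return (limit - value_bytes) // delimiter_bytes + 1


# Figures quoted in the answer for "Primary Ungraded": 50329 bytes in 3412 rows.
print(listagg_bytes(50329, 3412))      # 57151
# "Secondary Ungraded" has 61178 bytes of values.
print(max_rows_that_fit(61178))        # 2179
```

By this estimate, "Primary Ungraded" needs 50329 + 2 * 3411 = 57151 bytes with the ', ' delimiter, and "Secondary Ungraded" would go over 65535 bytes once it has 2180 rows or more. The row count for "Secondary Ungraded" is not shown in this text, so treat that threshold as a way to check your own data rather than as a confirmed diagnosis.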

If this is unclear, check the Example #1 image to see how the data is merged. According to this, our last failed query should work if we run it without a delimiter.

SELECT DISTINCT "year_level", LISTAGG("value", '')  WITHIN GROUP (ORDER BY "year_level") OVER (PARTITION BY "year_level")
FROM "school_1__acara_db"."acara_data_set"
WHERE "year_level" IN ('Primary Ungraded', 'Secondary Ungraded')

Yes, it works. But this won't work for the other records, because even without a delimiter they already exceed the VARCHAR(MAX) limit of 65535 bytes.

Many people suggest using the DISTINCT keyword with the LISTAGG function. Both the aggregate function and the window function support DISTINCT as an optional keyword.

Aggregate function

LISTAGG( [DISTINCT] aggregate_expression [, 'delimiter' ] ) 
[ WITHIN GROUP (ORDER BY order_list) ]

Window function

LISTAGG( [DISTINCT] expression [, 'delimiter' ] ) 
[ WITHIN GROUP (ORDER BY order_list) ] 
OVER ( [PARTITION BY partition_expression] )

Even after removing duplicates, my data exceeds the VARCHAR(MAX) limit of 65535 bytes, so DISTINCT doesn't help in my case.

Can't we slice the data into smaller parts? Yes, we can, with the query below. It was found by one of my teammates, Isuru.

SELECT year_level, num_of_parts, listagg(value,',') AS listagg_data FROM (
    -- Integer division: every 60000 bytes of the running total starts a new part.
    SELECT year_level, value, total_bytes / 60000 AS num_of_parts FROM (
        -- Running total of value bytes within each year_level.
        SELECT year_level, value, SUM(OCTET_LENGTH(value)) OVER (PARTITION BY year_level ORDER BY value ROWS UNBOUNDED PRECEDING) AS total_bytes
            FROM "school_1__acara_db"."acara_data_set"
        ) AS running_totals
) AS parts
GROUP BY year_level, num_of_parts
ORDER BY year_level, num_of_parts;

This is the output.

Using this, you can retrieve all the data you want. Here we divide the running total_bytes by 60000, and the num_of_parts column holds the index of the part each row falls into (0, 1, 2, ...). 'Primary Ungraded' was not split at all, and 'Secondary Ungraded' was split into 2 parts, as expected from the byte totals we found earlier. The other year levels are likewise split into multiple parts of roughly 60000 bytes each.
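The bucketing logic of that query can be modelled in a few lines of Python. This is a sketch of the idea only: split_into_parts is a hypothetical helper, and Python's string ordering is assumed to be close enough to the database's ORDER BY for illustration.

```python
def split_into_parts(values, chunk_bytes=60000):
    """Assign each value to a part using running_total // chunk_bytes.

    Mirrors SUM(OCTET_LENGTH(value)) OVER (... ORDER BY value
    ROWS UNBOUNDED PRECEDING) / 60000 for a single year_level.
    """
    parts = {}
    running_total = 0
    for value in sorted(values):
        running_total += len(value.encode("utf-8"))  # includes the current row
        part_no = running_total // chunk_bytes
        parts.setdefault(part_no, []).append(value)
    return parts


# Tiny chunk size so the split is visible with short strings.
print(split_into_parts(["cc", "aa", "bb"], chunk_bytes=4))
# {0: ['aa'], 1: ['bb', 'cc']}
```

Note that the 60000 budget counts value bytes only. A value whose running total crosses a boundary lands in the later part, and the ',' delimiters come on top, so the gap between 60000 and 65535 acts as headroom rather than a guarantee. If the values are long or very numerous, a smaller divisor may be needed.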

The issue we encountered is a database limitation, so the remaining step is to merge the LISTAGG parts into a single value programmatically. At the moment I don't see any other solution, and I couldn't find a proper one anywhere on the internet.
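As one way to do that programmatic merge, here is a minimal Python sketch. It assumes the rows of the slicing query have already been fetched as (year_level, num_of_parts, listagg_data) tuples; merge_parts and the sample rows are hypothetical, and how you fetch the rows depends on your database driver.

```python
from collections import defaultdict


def merge_parts(rows, delimiter=","):
    """Stitch the per-part LISTAGG strings back together for each year_level."""
    grouped = defaultdict(list)
    for year_level, part_no, data in rows:
        grouped[year_level].append((part_no, data))
    return {
        level: delimiter.join(data for _, data in sorted(parts, key=lambda p: p[0]))
        for level, parts in grouped.items()
    }


# Hypothetical rows, deliberately out of order.
rows = [
    ("Secondary Ungraded", 1, "c,d"),
    ("Primary Ungraded", 0, "x,y"),
    ("Secondary Ungraded", 0, "a,b"),
]
print(merge_parts(rows))
# {'Secondary Ungraded': 'a,b,c,d', 'Primary Ungraded': 'x,y'}
```

The parts are joined with the same delimiter the query used, in num_of_parts order, so the result matches what a single LISTAGG would have produced if the limit did not apply.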

solution1
3 2020-10-30 16:46:54

solution2
0 2021-08-18 11:59:42
