Redshift: Result size exceeds LISTAGG limit on svl_statementtext

Question

I'm trying to reconstruct my query history from svl_statementtext using listagg, but I get this error:

Result size exceeds LISTAGG limit (limit: 65535)

However, I cannot see how or where I have exceeded the limit.

My failing query:

SELECT pid,xid, min(starttime) AS starttime, 
  pg_catalog.listagg(
        CASE WHEN (len(rtrim(("text")::text)) = 0) THEN ("text")::text ELSE rtrim(("text")::text) END
        , ''::text
    ) WITHIN GROUP(ORDER BY "sequence") 
    AS query_statement 
FROM svl_statementtext
GROUP BY pid,xid
HAVING  min(starttime) >= '2022-06-27 10:00:00';

After the failure, I checked whether I could find where the excessive size was coming from:

SELECT pid,xid, min(starttime) AS starttime, 
  SUM(OCTET_LENGTH(
      CASE WHEN (len(rtrim(("text")::text)) = 0) THEN ("text")::text ELSE rtrim(("text")::text) END
  )) as total_bytes
FROM svl_statementtext
GROUP BY pid,xid
HAVING  min(starttime) >= '2022-06-27 10:00:00'
ORDER BY total_bytes desc;

However, the largest size that this query reports is 2962. So how/why is listagg complaining about 65535?

I have seen some other posts that mention using listaggdistinct, or handling the case where the value being aggregated is null, but none of them seem to change my problem.

Any guidance appreciated :)

Answer 1

The longest string that Redshift can hold is 64K bytes (65535). Listagg() is likely generating a string longer than this. The "text" column in svl_statementtext is 200 characters, so a statement with more than roughly 320 segments (65535 / 200 ≈ 327) can overflow this string size.
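One way to check whether any single statement is long enough to hit this limit is to total the segment sizes per statement, without running listagg at all. The query below is only a sketch: grouping by starttime to separate statements and the alias names are assumptions, and OCTET_LENGTH is applied to the raw "text" column, so the totals may differ slightly from what listagg would build after trimming.

```sql
-- Sketch: list statements whose combined segment text may exceed 65535 bytes.
-- Grouping by starttime (assumed to identify one statement) keeps
-- statements in the same transaction apart.
SELECT xid,
       pid,
       starttime,
       COUNT(*)                   AS segments,     -- number of 200-character pieces
       SUM(OCTET_LENGTH("text"))  AS total_bytes   -- untrimmed size in bytes
FROM svl_statementtext
GROUP BY xid, pid, starttime
HAVING SUM(OCTET_LENGTH("text")) > 65535
ORDER BY total_bytes DESC;
```

Any rows returned are statements that cannot be reassembled into a single string without truncation.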

The other issue I see is that your query will combine multiple statements into one string. You are only grouping by xid and pid, which gives you all the statements of a transaction. Add starttime to your GROUP BY list and this will break different statements into different results.

Also remember that xid and pid values repeat every few days, so having a date range limit can help prevent a lot of confusion.

You need to add

where sequence < 320

to your query and also group by starttime.
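Applied to the query from the question, those two changes might look like the sketch below. Moving the starttime filter from HAVING into WHERE is an assumption on my part (with starttime in the GROUP BY list, the two are equivalent here); the rest is the original query with the casts dropped for readability.

```sql
-- Sketch: the question's query with the segment cap and per-statement grouping.
SELECT pid,
       xid,
       starttime,
       LISTAGG(
           CASE WHEN LEN(RTRIM("text")) = 0 THEN "text" ELSE RTRIM("text") END,
           ''
       ) WITHIN GROUP (ORDER BY "sequence") AS query_statement
FROM svl_statementtext
WHERE "sequence" < 320                       -- at most 320 segments * 200 = 64000 bytes
  AND starttime >= '2022-06-27 10:00:00'     -- assumed: filter rows before aggregating
GROUP BY pid, xid, starttime;                -- one row per statement, not per transaction
```

Statements longer than 320 segments are truncated rather than failing the whole query.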

Here's a query I have used to put together statements in Redshift:

select xid, pid, starttime,
       max(datediff('sec', starttime, endtime)) as runtime,
       type,
       listagg(regexp_replace(text,'\\\\n*',' ')) WITHIN GROUP (ORDER BY sequence) || ';' as querytext
from svl_statementtext
where pid = (SELECT pg_backend_pid()) --current session
    and sequence < 320
    and starttime > getdate() - interval '24 hours'
group by starttime, 1, 2, "type"
order by starttime, 1 asc, "type" desc;
