We are streaming real-time data to Redshift. The bottleneck is the number of table loads that can run concurrently; at present we run more than 1,000 loads every 15 minutes.
We want to reduce this number based on how frequently users actually query these tables. How can we get this information in Redshift?
The view below, open-sourced by awslabs (one of the admin views in the amazon-redshift-utils repository), can be used to find the most frequently scanned tables.
-- Note: stl_scan retains only a few days of log history,
-- so the counts reflect recent activity only.
CREATE OR REPLACE VIEW admin.v_get_table_scan_frequency
AS
SELECT t.database,
       t.schema AS schemaname,
       t.table_id,
       t."table" AS tablename,
       t.size,
       t.sortkey1,
       NVL(s.num_qs, 0) AS num_qs
FROM svv_table_info t
LEFT JOIN (SELECT tbl,
                  perm_table_name,
                  COUNT(DISTINCT query) AS num_qs
           FROM stl_scan s
           -- userid > 1 excludes internal system queries
           WHERE s.userid > 1
             AND s.perm_table_name NOT IN ('Internal Worktable', 'S3')
           GROUP BY tbl, perm_table_name) s
       ON s.tbl = t.table_id
      AND t."schema" NOT IN ('pg_internal')
ORDER BY 7 DESC;
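Once the view exists, tables that are loaded every 15 minutes but are rarely or never scanned by users are the first candidates for a slower refresh schedule. A minimal sketch (the threshold is illustrative; raise it to also catch rarely used tables):

```sql
-- Tables with no user scans at all in the retained stl_scan history:
-- candidates to drop from the 15-minute load cycle
SELECT database, schemaname, tablename, size
FROM admin.v_get_table_scan_frequency
WHERE num_qs = 0
ORDER BY size DESC;
```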
\d admin.v_get_table_scan_frequency
Column | Type | Modifiers
------------+--------+-----------
database | text |
schemaname | text |
table_id | oid |
tablename | text |
size | bigint |
sortkey1 | text |
num_qs | bigint |
select * from admin.v_get_table_scan_frequency order by num_qs desc;
database | schemaname | table_id | tablename | size | sortkey1 | num_qs
-----------------+------------+----------+------------------------------------------+-------+---------------+--------
db | product | 1 | table1 | 92 | AUTO(SORTKEY) | 13448
db | product | 2 | table2 | 180 | AUTO(SORTKEY) | 13389
Keeping a time series of this query's output in Prometheus can reveal the scan rate and frequency trend over time for each table. Based on that, we can decide how frequently each table's data needs to be refreshed in Redshift.
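If you would rather keep the history inside Redshift than in Prometheus, a sketch along these lines captures periodic snapshots of the view (the admin.table_scan_history table name and the schedule are hypothetical):

```sql
-- Hypothetical history table: one row per table per capture
CREATE TABLE IF NOT EXISTS admin.table_scan_history (
    captured_at TIMESTAMP,
    table_id    BIGINT,
    tablename   VARCHAR(128),
    num_qs      BIGINT
);

-- Run on a schedule (e.g. hourly) to build the time series,
-- since stl_scan itself only retains a few days of history
INSERT INTO admin.table_scan_history
SELECT GETDATE(), table_id, tablename, num_qs
FROM admin.v_get_table_scan_frequency;
```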