
How to find the most accessed table in Redshift?

We are streaming real-time data to Redshift. The bottleneck is the number of table loads that can run concurrently. At present we run more than 1,000 loads every 15 minutes.

We want to reduce this number based on how frequently users actually query these tables. How can we get this information in Redshift?

This view, open-sourced by awslabs, can be used to find the most frequently queried tables.

Create view

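-- Lists every table in svv_table_info with the number of distinct
-- queries that scanned it, as recorded in stl_scan.
CREATE OR REPLACE VIEW admin.v_get_table_scan_frequency
AS
SELECT 
    database, 
    schema AS schemaname, 
    table_id, 
    "table" AS tablename, 
    size, 
    sortkey1, 
    NVL(s.num_qs,0) num_qs  -- 0 when no scan was recorded for the table
FROM svv_table_info t
LEFT JOIN (SELECT
    tbl, perm_table_name,
    COUNT(DISTINCT query) num_qs
FROM
    stl_scan s
WHERE 
    s.userid > 1  -- skip scans run by the system user (userid 1)
    -- skip scans of internal worktables and S3 (external) data
    AND s.perm_table_name NOT IN ('Internal Worktable','S3')
GROUP BY 
    tbl, perm_table_name) s ON s.tbl = t.table_id
AND t."schema" NOT IN ('pg_internal')
ORDER BY 7 desc;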

Table

\d admin.v_get_table_scan_frequency
   Column   |  Type  | Modifiers
------------+--------+-----------
 database   | text   |
 schemaname | text   |
 table_id   | oid    |
 tablename  | text   |
 size       | bigint |
 sortkey1   | text   |
 num_qs     | bigint |

Query

select * from admin.v_get_table_scan_frequency order by num_qs desc;

Result

 database | schemaname | table_id | tablename | size | sortkey1      | num_qs
----------+------------+----------+-----------+------+---------------+--------
 db       | product    |        1 | table1    |   92 | AUTO(SORTKEY) |  13448
 db       | product    |        2 | table2    |  180 | AUTO(SORTKEY) |  13389
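
Rows like those above could feed a simple load planner. The following is only a sketch of that idea in Python: the tier thresholds, the intervals, and the names `TIERS`, `load_interval_minutes`, and `plan_loads` are all hypothetical and would need tuning against a real workload.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical tiers: (minimum num_qs, load interval in minutes).
# These numbers are illustrative assumptions, not from the awslabs view.
TIERS: Tuple[Tuple[int, int], ...] = ((10_000, 15), (100, 60), (1, 360))
UNUSED_INTERVAL_MIN = 1440  # assumed fallback: tables with no scans load daily


def load_interval_minutes(num_qs: int) -> int:
    """Return the load interval for a table given its scan count."""
    for min_queries, interval in TIERS:
        if num_qs >= min_queries:
            return interval
    return UNUSED_INTERVAL_MIN


def plan_loads(rows: Iterable[Tuple[str, str, int]]) -> Dict[str, int]:
    """Map 'schema.table' to an interval for (schemaname, tablename, num_qs) rows."""
    return {
        f"{schema}.{table}": load_interval_minutes(num_qs)
        for schema, table, num_qs in rows
    }


if __name__ == "__main__":
    # table1 comes from the result above; table3 is a made-up unused table.
    print(plan_loads([("product", "table1", 13448), ("product", "table3", 0)]))
```

With these assumed tiers, heavily scanned tables keep the 15-minute cadence while tables nobody queries drop to one load a day, which is where most of the reduction in concurrent loads would come from.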

Keeping a time series of this query's results in Prometheus can help reveal the rate and frequency trend over time for each table. Based on that, we can decide how frequently to refresh data in Redshift.
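
One way to get the view's rows into Prometheus is to render them in the text exposition format and let Prometheus scrape the result, for example from a small HTTP endpoint or a node_exporter textfile. The sketch below assumes the rows have already been fetched; the metric name `redshift_table_scan_queries` and the function names are hypothetical. The value is exposed as a gauge rather than a counter, on the assumption that it can fall as older entries age out of stl_scan, which keeps only a limited window of history.

```python
from typing import Iterable, Tuple

METRIC = "redshift_table_scan_queries"  # hypothetical metric name


def escape_label(value: str) -> str:
    """Escape backslash, double quote, and newline in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_exposition(rows: Iterable[Tuple[str, str, str, int]]) -> str:
    """Render (database, schemaname, tablename, num_qs) rows as gauge samples."""
    lines = [
        f"# HELP {METRIC} Distinct queries that scanned the table.",
        f"# TYPE {METRIC} gauge",
    ]
    for database, schema, table, num_qs in rows:
        labels = (
            f'database="{escape_label(database)}",'
            f'schema="{escape_label(schema)}",'
            f'table="{escape_label(table)}"'
        )
        lines.append(f"{METRIC}{{{labels}}} {num_qs}")
    # The exposition format expects a trailing newline after the last sample.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_exposition([("db", "product", "table1", 13448)]), end="")
```

Each scrape then records one sample per table, so per-table trends can be charted from the stored series.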
