
How to get the record count of all tables in AWS Athena

I am looking for a way to find the record count of every table (across all table schemas) in my AWS Athena. I have tried the following query, but it looks like the information schema doesn't provide the record count. Can someone help?


SELECT t.table_schema, t.table_name, t.table_rows
FROM   "information_schema"."schemata" s
INNER JOIN "information_schema"."tables" t on s.schema_name = t.table_schema
INNER JOIN "information_schema"."columns" c on c.table_name = t.table_name AND c.table_schema = t.table_schema
WHERE c.table_catalog = 'awsdatacatalog'

"but it looks like information schema doesn't provide the record count"

I would argue this is for fairly obvious reasons. First, a record count is not part of schema information. Second, for pragmatic performance reasons: to provide record counts, Athena/Presto/Trino would need to process all the data files/sources.

AFAIK Presto/Trino does not support any kind of procedural query execution (something like PL/SQL combined with a way to execute SQL from a string), so the only option is to build the query, via SQL or some other language, and then execute it. Something to start with:

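-- Collect every table as a quoted "schema"."table" name.
with tables(full_name) as(
    SELECT '"' || t.table_schema || '"."' || t.table_name || '"' as full_name
    FROM "information_schema"."tables" t
    WHERE t.table_schema <> 'information_schema' -- skip the metadata schema itself
)

-- Produce a single string: one count(*) select per table, joined with "union all".
-- Copy that string out and run it as a second query to get the counts.
select array_join(array_agg('select ''' || full_name || ''' as table_name, count(*) as rows_count from ' || full_name), ' union all ')
from tables
group by true;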
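
For the "some other language" route, here is a minimal Python sketch of the same query generation. It assumes you already have the list of (schema, table) pairs, for example from a query against information_schema.tables. The names quote_ident and build_count_query are made up for this illustration, and the sample schema and tables are placeholders.

```python
from typing import Iterable, Tuple


def quote_ident(name: str) -> str:
    """Wrap an identifier in double quotes, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


def build_count_query(tables: Iterable[Tuple[str, str]]) -> str:
    """Return one UNION ALL query that counts the rows of each (schema, table) pair."""
    parts = []
    for schema, table in tables:
        full_name = f"{quote_ident(schema)}.{quote_ident(table)}"
        # The label is a SQL string literal, so single quotes must be doubled.
        label = f"{schema}.{table}".replace("'", "''")
        parts.append(
            f"select '{label}' as table_name, count(*) as rows_count from {full_name}"
        )
    return "\nunion all\n".join(parts)


if __name__ == "__main__":
    # Placeholder schema and table names, for illustration only.
    print(build_count_query([("sales", "orders"), ("sales", "customers")]))
```

The returned string still has to be submitted to Athena as a separate query; this sketch only builds the text.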

Alternatively, you can define a custom Athena function via Lambda that dynamically builds and executes the corresponding SQL statement.

You can do this as a two-step process:
1. Dynamically build the SQL for getting the counts using the query below.
2. Run the output of that SQL to generate the counts.

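-- Step 1: build one "select ... count(*)" statement per table in the schema.
-- The generated SQL uses unqualified table names, so run it with that schema as the current database.
with tname_vw(i) as (
    SELECT concat(
            'select ''',
            table_name,
            ''' as table_name, count(*) as row_count from ',
            table_name
        )
    FROM information_schema.tables
    WHERE table_schema = 'schema_name' -- replace with your schema
)
-- Join the statements with "union all"; a plain "union" would add a needless de-duplication pass.
select array_join(array_agg(i), ' union all ') as result
from tname_vw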
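
The same two steps can also be driven from client code. The sketch below is only an illustration under assumptions: run_query is a hypothetical callable that you supply, which submits a SQL string to Athena and returns the result rows as tuples, and count_all_tables is a name invented here. Unlike the SQL above, this variant lists the tables first and assembles the UNION ALL text in Python.

```python
from typing import Callable, List, Sequence, Tuple


def count_all_tables(
    run_query: Callable[[str], Sequence[Tuple]], schema: str
) -> List[Tuple[str, int]]:
    """List the tables of one schema, then count the rows of each in a single query."""
    # Step 1: list the tables in the schema.
    schema_literal = schema.replace("'", "''")
    listing_sql = (
        "select table_name from information_schema.tables "
        f"where table_schema = '{schema_literal}'"
    )
    table_names = [row[0] for row in run_query(listing_sql)]
    if not table_names:
        return []

    # Build one count(*) select per table and join them with "union all".
    selects = []
    for name in table_names:
        label = name.replace("'", "''")
        ident = '"' + schema.replace('"', '""') + '"."' + name.replace('"', '""') + '"'
        selects.append(
            f"select '{label}' as table_name, count(*) as row_count from {ident}"
        )
    count_sql = " union all ".join(selects)

    # Step 2: run the generated SQL and convert the counts to integers.
    return [(row[0], int(row[1])) for row in run_query(count_sql)]
```

One way to provide run_query would be a small wrapper around the boto3 Athena client (start_query_execution followed by get_query_results), but that wrapper is left out here because it depends on your output location and workgroup settings.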
