简体   繁体   中英

How to get aggregate row count for sharded tables in citus postgres

Requirement is to get sum of reltuples for all distributed tables in Citus Postgres database with a certain format which is explained below.

When i run the below command, the query propagates to workers and depending on the shard counts on each worker, results are fetched for each table:

SELECT result from run_command_on_workers( $cmd$ select json_agg(json_build_object(reltuples, relname)) from pg_class c join pg_catalog.pg_namespace n on n.oid=c.relnamespace where n.nspname not in ('citus', 'pg_toast', 'pg_catalog')

For ex, the results consists of <row_count, table_name_and_shardid> {10, table_A_shardid0}, {20,table_A_shardid1}, {15, table_B_shardid0} Is it possible to combine counts from table_A_shardid0 and table_A_shardid1 so that the result is {30, table_A}

You can use run_command_on_shards to query on all shards of a table. This way you can get all reltuple values from shards and combine them later in the coordinator to get your desired format. You can learn more about the command here at the Citus Official Documentation

The catalog table pg_dist_partition stores metadata about which tables in the database are distributed.

Using what I have explained above, the query can look something like this:

WITH shards_reltuples AS (
  SELECT logicalrelid,
  (run_command_on_shards(logicalrelid,
     $$ SELECT reltuples FROM pg_class WHERE oid = '%s'::regclass::oid $$ 
  )).result::real AS reltuples
  FROM pg_dist_partition)

SELECT logicalrelid, sum(reltuples) AS reltuples
FROM shards_reltuples GROUP BY logicalrelid;

You can elaborate on top of it to get your desired json format.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM