Citus data - How to query data from a single shard in a query?

Original post: 2022-09-21 15:58:54 · Tags: postgresql, citus

We are evaluating Citus for the large-scale data use cases in our organization. As part of that evaluation, I am trying to find out whether the following can be achieved with Citus:

We want to create a distributed table (customers) with customer_id as the shard/distribution key (customer_id is a UUID generated by the application).
While we can use regular SQL queries for all CRUD operations on these entities, we also need to query the table periodically (a periodic task): select multiple rows based on some filter criteria, fetch the result set into the application, update a few columns, and write the rows back (a read-and-update operation).
Our application is a horizontally scalable microservice with multiple instances of the service running in parallel.
So we want to split the periodic task into multiple sub-tasks that run on different instances of the service and execute in parallel.

So I am looking for a way for a sub-task to query results from one specific shard, so that each sub-task is responsible for fetching and updating the data on a single shard only. This would let us run the periodic task in parallel without worrying about conflicts, because each sub-task operates on its own shard.

I have not been able to find anything in the documentation on how to achieve this. Is this possible with Citus?

1 answer

By default, Citus distributes data across the shards using the hash value of the distribution column, which is customer_id in your case.

To achieve what you describe, you might need to store a (customer_id, shard_id) mapping in your application, assign sub-tasks to shards, and have each sub-task send its queries based on this mapping.
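The bookkeeping for such a mapping could look roughly like the sketch below. It assumes the application has already obtained a customer_id to shard_id mapping somehow (for example by calling Citus's get_shard_id_for_distribution_column function once per customer_id). The function names group_by_shard and assign_shards, and the shard ids in the usage lines, are made up for illustration.

```python
from collections import defaultdict


def group_by_shard(customer_to_shard):
    """Invert a {customer_id: shard_id} mapping into {shard_id: [customer_id, ...]}."""
    groups = defaultdict(list)
    for customer_id, shard_id in customer_to_shard.items():
        groups[shard_id].append(customer_id)
    return dict(groups)


def assign_shards(shard_ids, num_instances):
    """Spread shard ids over service instances round-robin.

    Every shard goes to exactly one instance, so two sub-tasks never
    work on the same shard at the same time.
    """
    assignment = {instance: [] for instance in range(num_instances)}
    for position, shard_id in enumerate(sorted(shard_ids)):
        assignment[position % num_instances].append(shard_id)
    return assignment


# Hypothetical mapping; real shard ids come from the Citus metadata.
mapping = {"cust-a": 102008, "cust-b": 102009, "cust-c": 102008}
by_shard = group_by_shard(mapping)
plan = assign_shards(by_shard.keys(), num_instances=2)
```

Each instance would then restrict its periodic query to the customer_ids of the shards it was assigned. The mapping has to be refreshed whenever shards are split, moved, or rebalanced.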

One hacky solution you might consider: add a dummy column (I will call it shard_id) and make it the distribution column, so that your application knows which rows should be fetched and updated by which sub-task. In other words, each sub-task fetches and updates the rows with one particular value of the shard_id column, and all of those rows are located on the same shard because they share the same distribution column value. This way you control which customer_ids end up grouped together and which are kept apart, by assigning them the shard_id you want.
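On the application side, the dummy-column idea might be sketched as follows. Everything here is an assumption made for illustration: the bucket count NUM_BUCKETS, the helper names bucket_for and subtask_query, deriving shard_id from the UUID's integer value, and the %s placeholder style (as used by psycopg-like drivers).

```python
import uuid

NUM_BUCKETS = 32  # hypothetical number of distinct shard_id values


def bucket_for(customer_id, num_buckets=NUM_BUCKETS):
    """Derive a stable shard_id value from a customer_id UUID string.

    The same customer_id always yields the same bucket, so the value can
    be stored in the shard_id column when the row is inserted.
    """
    return uuid.UUID(customer_id).int % num_buckets


def subtask_query(bucket):
    """Build the parameterized query one sub-task would run for its bucket."""
    sql = "SELECT customer_id FROM customers WHERE shard_id = %s"
    return sql, (bucket,)


# A sub-task responsible for bucket 3 would execute this statement.
sql, params = subtask_query(3)
```

Because the filter is on the distribution column, each such query should be routable to a single shard. Note that several shard_id values can still hash to the same physical shard; the sub-tasks stay conflict-free because they partition the rows by shard_id value, not because every value gets a shard of its own.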

I would also suggest taking a look at "tenant isolation", which is mentioned in the latest blog post: https://www.citusdata.com/blog/2022/09/19/citus-11-1-shards-postgres-tables-without-interruption/#isolate-tenant It isolates a tenant (in your case, all data with the same customer_id) into a single shard of its own. It may be useful to you at some point.

How to shard from existing data in a table in Postgresql

Single query to collect data from multiple tables

Find data from multiple tables in a single query

How to select following data in one single query

Query on how to select data based on single condition

How to query data from `PostgresSql`?

How to JOIN data from query with list of data?

Query against worker nodes omitting the coordinator in Citus

Get Statistical data from table in PostgreSQL using single query

How to select data in postgres with count and limit in a single query


The technical post pages of this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.


solution1 0 2022-09-22 12:01:21