
How to shard from existing data in a table in Postgresql

I have a large table, inter, which contains 50 billion rows. Each row consists of two columns, both of which are effectively foreign keys to the IDs of two other tables (the relationship is only logical; no foreign key constraints are set in the database).

My table structure is as follows:

create table test_1(
    id integer primary key,
    content varchar(300),
    content_len integer
);

create index test_1_id_len on test_1(id, content_len);

--this has 1.5 billion rows.
-- example row1: 1, 'alskfnla', 8
-- example row2: 2, 'asdgaagder', 10
-- example row3: 3, 'dsafnlakdsvn', 12

create table test_2(
    id integer primary key,
    split_str char(3)
);

--this has 60,000 rows.
-- example row1: 1, 'abc'
-- example row2: 2, 'abb'

create table inter(
    id_1 integer,    -- id of test_1
    id_2 integer     -- id of test_2 
);

create index test_index_1 on inter(id_1);
create index test_index_2 on inter(id_2);
create index test_index_1_2 on inter(id_1, id_2);

--this has 50 billion rows.
-- example row1: 1, 2
-- example row2: 1, 3
-- example row3: 1, 4

I also need to run queries like this:

select * 
from inter 
  inner join test_1 on(test_1.id = inter.id_1) 
where id_2 in (1,2,3,4,5,67,8,9,10) 
  and test_1.content_len = 30 
order by id_2;

The reason I want to shard the table is that I could not create indexes on the two columns (the transaction had still not finished after one week, and it used up all of the virtual memory).

So I am considering sharding the table by one of the columns. This column has around 60,000 distinct values, from 1 to 60,000, and I would like to split the table into 60,000 subtables. I have searched around, but most articles do this with a trigger, which does not apply in my case because the data are already in the table. Does anyone know how to do this? Thanks a lot!

ENV: redhat, RAM 180GB, postgresql 11.0

You don't want to shard the table, but partition it.

60000 partitions is too many. Use list partitioning to split the table into something like at most 600 partitions. Make sure to upgrade to PostgreSQL v12 so that you can benefit from the latest performance improvements for partitioned tables.
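As a rough sketch of what that could look like, the following creates a list-partitioned copy of the table and generates 600 partitions of 100 id_2 values each. The names inter_part and inter_part_0 through inter_part_599 are hypothetical, and grouping contiguous blocks of 100 values is just one possible assumption; pick a grouping that matches how your queries filter on id_2.

```sql
-- Sketch only: inter_part and the partition names are made up for illustration.
-- Parent table, list-partitioned on id_2 (the column with about 60,000 values).
CREATE TABLE inter_part (
    id_1 integer,    -- id of test_1
    id_2 integer     -- id of test_2
) PARTITION BY LIST (id_2);

-- Generate 600 partitions; partition p holds id_2 values p*100+1 .. p*100+100.
DO $$
DECLARE
    p    integer;
    vals text;
BEGIN
    FOR p IN 0..599 LOOP
        -- Build the comma-separated value list for this partition.
        SELECT string_agg(g.v::text, ', ' ORDER BY g.v)
          INTO vals
          FROM generate_series(p * 100 + 1, p * 100 + 100) AS g(v);

        EXECUTE format(
            'CREATE TABLE %I PARTITION OF inter_part FOR VALUES IN (%s)',
            'inter_part_' || p, vals);
    END LOOP;
END
$$;
```

With this layout, a condition such as id_2 IN (1, 2, 3) only has to touch the partitions that contain those values. Indexes can then be built one partition at a time after the data is loaded, which keeps each index build much smaller than one on the full 50 billion rows.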

The hard part will be moving the data without excessive downtime. Perhaps you can use triggers to capture changes while you run INSERT INTO ... SELECT and catch up later.
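A minimal sketch of that idea follows, assuming a partitioned target table named inter_part already exists. The change-log table inter_changes, the function inter_capture and the trigger name are all hypothetical. The final catch-up step is deliberately left out: a row written after the trigger is installed but before its batch is copied ends up both in the copy and in the log, so the replay has to account for that overlap.

```sql
-- Sketch only: all object names below are made up for illustration.
-- 1. Log table that records changes made to inter while the copy is running.
CREATE TABLE inter_changes (
    change_id bigserial PRIMARY KEY,
    op        char(1) NOT NULL,   -- 'I' = row added, 'D' = row removed
    id_1      integer,
    id_2      integer
);

-- 2. Trigger function: an UPDATE is logged as a removal plus an addition.
CREATE FUNCTION inter_capture() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    IF TG_OP IN ('DELETE', 'UPDATE') THEN
        INSERT INTO inter_changes (op, id_1, id_2)
        VALUES ('D', OLD.id_1, OLD.id_2);
    END IF;
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        INSERT INTO inter_changes (op, id_1, id_2)
        VALUES ('I', NEW.id_1, NEW.id_2);
    END IF;
    RETURN NULL;  -- the return value is ignored for AFTER row triggers
END
$$;

CREATE TRIGGER inter_capture_trg
    AFTER INSERT OR UPDATE OR DELETE ON inter
    FOR EACH ROW EXECUTE PROCEDURE inter_capture();

-- 3. Copy the existing rows one block of id_2 values at a time,
--    so that each transaction stays reasonably small.
INSERT INTO inter_part (id_1, id_2)
SELECT id_1, id_2
FROM inter
WHERE id_2 BETWEEN 1 AND 100;
-- ...repeat for the remaining id_2 ranges, then replay inter_changes
-- in change_id order during a short maintenance window.
```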

