How to shard existing data in a table in PostgreSQL
I have a large table inter, which contains 50 billion rows. Each row consists of two columns, and both are effectively foreign keys referencing the IDs of two other tables (the relationship is only logical; no foreign key constraints are defined in the database).
My table structure looks like this:
create table test_1(
id integer primary key,
content varchar(300),
content_len integer
);
create index test_1_id_len on test_1(id, content_len);
--this has 1.5 billion rows.
-- example row1: 1, 'alskfnla', 8
-- example row2: 2, 'asdgaagder', 10
-- example row3: 3, 'dsafnlakdsvn', 12
create table test_2(
id integer primary key,
split_str char(3)
);
--this has 60,000 rows.
-- example row1: 1, 'abc'
-- example row2: 2, 'abb'
create table inter(
id_1 integer, -- id of test_1
id_2 integer -- id of test_2
);
create index test_index_1 on inter(id_1);
create index test_index_2 on inter(id_2);
create index test_index_1_2 on inter(id_1, id_2);
--this has 50 billion rows.
-- example row1: 1, 2
-- example row2: 1, 3
-- example row3: 1, 4
Further, I need to run queries like this:
select *
from inter
inner join test_1 on(test_1.id = inter.id_1)
where id_2 in (1,2,3,4,5,67,8,9,10)
and test_1.content_len = 30
order by id_2;
The reason I want to shard the table is that I could not create indexes on the two columns (the transaction had not finished after a week, and it used up all the virtual memory).
So I am considering sharding the table by one of the columns. This column has around 60,000 distinct values, from 1 to 60,000, and I would like to split the table into 60,000 subtables. I did some searching, but most articles do this with a trigger, which does not work in my case because the data are already in the table. Does anyone know how to do that? Thanks a lot!
ENV: Red Hat, RAM 180 GB, PostgreSQL 11.0
You don't want to shard the table; you want to partition it.
60,000 partitions is too many. Use list partitioning to split the table into at most around 600 partitions. Make sure to upgrade to PostgreSQL v12 so that you can benefit from the performance improvements it brings for tables with many partitions.
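A minimal sketch of what such a list-partitioned layout could look like, written as a small Python script that prints the PostgreSQL DDL (typing 600 partition definitions by hand is error-prone). It assumes the partition key is id_2 (values 1 to 60,000, as described in the question) and splits it into 600 lists of 100 consecutive values. The table names inter_part and inter_part_000 through inter_part_599 are hypothetical.

```python
"""Print PostgreSQL DDL for a LIST-partitioned copy of `inter` (sketch)."""
from __future__ import annotations

# Assumptions (hypothetical, not from the original post):
#  - the partition key is id_2, whose values run from 1 to 60,000;
#  - the new partitioned table is named inter_part, and its partitions
#    are named inter_part_000 ... inter_part_599.
MIN_ID2 = 1
MAX_ID2 = 60_000
NUM_PARTITIONS = 600


def partition_bounds(lo: int, hi: int, parts: int) -> list[tuple[int, int]]:
    """Split the inclusive range [lo, hi] into `parts` contiguous chunks."""
    size, extra = divmod(hi - lo + 1, parts)
    bounds = []
    start = lo
    for i in range(parts):
        # The first `extra` chunks get one more value so every id_2 is covered.
        end = start + size + (1 if i < extra else 0) - 1
        bounds.append((start, end))
        start = end + 1
    return bounds


def partition_ddl(parent: str = "inter_part") -> list[str]:
    """Return CREATE TABLE statements for the parent and all its partitions."""
    stmts = [
        f"CREATE TABLE {parent} (id_1 integer, id_2 integer) "
        "PARTITION BY LIST (id_2);"
    ]
    bounds = partition_bounds(MIN_ID2, MAX_ID2, NUM_PARTITIONS)
    for i, (start, end) in enumerate(bounds):
        # Each partition lists its id_2 values explicitly (list partitioning).
        values = ", ".join(str(v) for v in range(start, end + 1))
        stmts.append(
            f"CREATE TABLE {parent}_{i:03d} PARTITION OF {parent} "
            f"FOR VALUES IN ({values});"
        )
    return stmts


if __name__ == "__main__":
    for stmt in partition_ddl():
        print(stmt)
```

The output can be piped into psql. Assuming rows are spread evenly over id_2, each partition would hold about 50,000,000,000 / 600 ≈ 83 million rows. Because each list here is a contiguous block of values, PARTITION BY RANGE (id_2) with bounds such as FOR VALUES FROM (1) TO (101) would describe the same split more compactly; the sketch keeps list partitioning as the answer suggests. Rows whose id_2 falls outside 1 to 60,000 would be rejected unless a DEFAULT partition is added.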
The hard part will be moving the data without excessive downtime. Perhaps you can use triggers to capture changes while you run INSERT INTO ... SELECT, and catch up later.
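A hedged sketch of the bulk-copy step: rather than one huge INSERT INTO ... SELECT over 50 billion rows, the script below prints one statement per block of 100 id_2 values, so each batch lands in a single partition if the partitions are split on the same 100-value boundaries (an assumption; adjust step to match the boundaries you actually use). The target name inter_part is hypothetical, and the change-capture trigger on inter is assumed to exist separately; it is not shown here.

```python
"""Print batched INSERT ... SELECT statements to copy `inter` (sketch)."""
from __future__ import annotations

# Assumptions (hypothetical): the partitioned target table inter_part already
# exists, and a separate trigger on `inter` records changes made while these
# batches run, so they can be replayed afterwards.


def copy_batches(lo: int = 1, hi: int = 60_000, step: int = 100,
                 source: str = "inter", target: str = "inter_part") -> list[str]:
    """Return one INSERT ... SELECT per id_2 block of `step` values."""
    stmts = []
    for start in range(lo, hi + 1, step):
        end = min(start + step - 1, hi)  # last block may be shorter
        stmts.append(
            f"INSERT INTO {target} (id_1, id_2) "
            f"SELECT id_1, id_2 FROM {source} "
            f"WHERE id_2 BETWEEN {start} AND {end};"
        )
    return stmts


if __name__ == "__main__":
    for stmt in copy_batches():
        print(stmt)
```

When the statements are run one by one in psql's default autocommit mode, each batch commits on its own, so a failure only requires rerunning that batch instead of the whole copy.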