Sharding an existing PostgreSQL database with PostgresXL

We want to shard our PostgreSQL database because of high disk load. First, we looked at the django-sharding library, but:

  1. It would require a lot of rewriting in our backend.
  2. Migrating all tables to 64-bit primary keys is hard work on 300-400 GB tables.
  3. Generating IDs with the Postgres-specific algorithm makes it impossible to move data from shard to shard. On top of that, we have a large database with old IDs, and updating all of them is a big problem too.
  4. Generating IDs with special tables forces us to run a special SELECT query against the main database every time we insert data. We have a high write load, so that's not good.
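
To illustrate point 3, here is a minimal sketch of one common way to build 64-bit shard-aware IDs: pack a timestamp, a shard number, and a per-shard sequence into one integer. The bit widths, the epoch, and the names `make_id` and `shard_of` are assumptions made for this example, not necessarily what django-sharding does. Because the shard number is baked into the ID, a row moved to another shard would carry an ID that points at the wrong shard.

```python
import time
from typing import Optional

# All constants are illustrative assumptions, not values taken from any library.
EPOCH_MS = 1_400_000_000_000  # hypothetical custom epoch, in milliseconds
SHARD_BITS = 13               # room for 8192 logical shards
SEQ_BITS = 10                 # 1024 IDs per shard per millisecond


def make_id(shard_id: int, seq: int, now_ms: Optional[int] = None) -> int:
    """Pack (timestamp, shard, sequence) into a single integer ID."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id out of range")
    timestamp = now_ms - EPOCH_MS
    sequence = seq % (1 << SEQ_BITS)
    return (timestamp << (SHARD_BITS + SEQ_BITS)) | (shard_id << SEQ_BITS) | sequence


def shard_of(row_id: int) -> int:
    """Recover the shard number that was embedded in the ID."""
    return (row_id >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)
```

With this layout, `shard_of(make_id(5, 1))` always gives back shard 5, no matter where the row physically lives afterwards.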

Considering all this, we decided to look at Postgres database sharding solutions. We found two options: Citus and PostgresXL. Citus would make us change our data format too much and rewrite a large part of the backend at the same time, so we are about to try PostgresXL as a more transparent solution. But reading the docs, I can't understand some things and would be grateful for recommendations:

  1. Are there any other sharding solutions besides Citus and PostgresXL? It would be good not to change much in our database when migrating.
  2. Some questions about PostgresXL:
    • Do I understand correctly that it's not a Postgres extension but a standalone fork? So I should build all its parts from source and then move the data over in some way?
    • How compatible are Postgres and PostgresXL versions? We have PostgreSQL 9.4. I don't see such a version in PostgresXL (9.2 or 9.5, nothing in between?). So can I use, for example, streaming replication for the migration?
    • Either way, what is the best way to migrate the data? If I have a 2 TB database with heavy writes, can I migrate it somehow without stopping it for a long period of time?

Thanks.

First off, to save yourself a LOT of headache, have you looked at services like Amazon's Aurora, DynamoDB, Redshift, etc.? They are VERY cost-effective at scale, as well as optimized and managed for you.

Actually, Amazon's plain Postgres databases can handle MASSIVE amounts of reads or writes. We can go to 2,000-6,000 IOPS on reads and another 2,000-6,000 IOPS on writes without issue. I would really look into this as an option. Azure, Oracle, and Google also have competing services.

Also be aware that Postgres-XL, beyond all reason, has no HA support. If you lose a single node, you lose everything. The nodes cannot fail over.

it's a standalone fork?

Yes. They are very different apps and are developed separately from each other.

How are Postgres and PostgresXL versions compatible?

They aren't compatible. You cannot just migrate Postgres to Postgres-XL. They work VERY differently.

Generating ids with Postgres Specific algorithm makes it impossible to move data from shard to shard

I'm not following this, but with sharding you are not supposed to move data from one shard to another. The key being used generally needs to be something specific and unique to split/segregate your data on, like a date, a "type" field, or some other (hopefully ordered) field(s)/column(s). This breaks things up, but it has obvious pain-in-the-a$$ limitations.
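
As a rough illustration of what "splitting on a key" means, here is a minimal sketch of hash-based routing in Python. The function name `shard_for` and the choice of SHA-256 are assumptions for the example; real systems such as Citus or Postgres-XL use their own distribution logic.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a shard-key value to a shard number in [0, num_shards)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # A stable hash is used on purpose: Python's built-in hash() for strings
    # is randomized per process, so it would route differently on each run.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

The same key always lands on the same shard, which is why the key should never change after a row is written. Note also that changing `num_shards` remaps most keys, which is one of the limitations mentioned above.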

Are there any other sharding workarounds except for Citus and PostgresXL? It would be good not to change much in our database on migrating.

There are tons of options, but right off the bat: going from a standard RDBMS to a NoSQL or MPP database is going to be a major migration and a lot of effort, with a LOT of limitations no matter what you do.

Next, Postgres-XL and Citus are MPP (massively parallel processing) clustering apps, not sharding solutions specifically. Sharding is part of what they can do, but it is not their focus.

Other options for MPP:

Pgpool-II -- (not great for heavy writes)

HAProxy -- (have not done it, but have read about it. Lots of work to set up and maintain.)

MySQL Cluster -- (huge pain to use the OSS version, and major $$$ for the commercial version)

Greenplum

Teradata

Vertica

what is the best solution to migrate data?

You are very unlikely to find a simple migration path for this kind of switch. Expect to export the data yourself from the existing RDBMS and import it into the new DB, and you will likely have to write something yourself to get it the way you want it.
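
As a starting point for that kind of home-grown export, here is a minimal Python sketch that splits a table into primary-key ranges and builds one PostgreSQL `COPY ... TO STDOUT` statement per range, so a large table can be exported in pieces. It assumes an integer primary key column named `id` and a trusted table name; the helper names `id_ranges` and `copy_statements` are made up for this example, and actually running the statements (and catching up on writes made during the export) is left out.

```python
from typing import Iterator, Tuple


def id_ranges(min_id: int, max_id: int, chunk: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open [start, end) ranges covering min_id..max_id inclusive."""
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    start = min_id
    while start <= max_id:
        end = min(start + chunk, max_id + 1)
        yield start, end
        start = end


def copy_statements(table: str, min_id: int, max_id: int, chunk: int) -> Iterator[str]:
    """Build one COPY statement per ID range.

    The table name is interpolated directly, so it must come from trusted
    code, never from user input.
    """
    for start, end in id_ranges(min_id, max_id, chunk):
        yield (
            f"COPY (SELECT * FROM {table} "
            f"WHERE id >= {start} AND id < {end}) TO STDOUT"
        )
```

Each statement can then be run against the old database and its output fed into the new one, one chunk at a time, instead of dumping a 2 TB database in a single pass.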
