
php: Creating automatic database sharding logic?

I just came up with the following idea, but I lack the knowledge to say whether it is applicable to a production application.

We have a web application, built on PHP/MySQL to make it easy. One table in the database is prone to grow large - easily a couple million records - so table sharding might be an option here.

Here's how I imagine the process working:

A cached file contains a list of the available tables in the database. Each table holds a maximum of a million rows; when that limit is reached, a new table is created and the cached list is regenerated.
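A minimal sketch of what that routing could look like in PHP (the cache path, table naming scheme, and connection details are assumptions for illustration, not part of the original design):

    <?php
    // Hypothetical shard router: all writes go to the newest table in the
    // cached list (e.g. ["records_1", "records_2", ...]).
    $db     = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $tables = json_decode(file_get_contents('/tmp/shard_tables.json'), true);
    $table  = end($tables); // the newest table receives all inserts

    $stmt = $db->prepare("INSERT INTO `$table` (data) VALUES (:data)");
    $stmt->execute([':data' => 'some payload']);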

Obviously it wouldn't be a good idea to check the number of rows on every write to the table, so this could be done at a set interval - weekly, or daily - depending on how quickly each million rows is created.
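The periodic check could be a cron job along these lines (again a sketch; the one-million threshold and file names just mirror the routing sketch above):

    <?php
    // Hypothetical maintenance job: if the newest shard is full, create the
    // next table with the same schema and rewrite the cached list.
    $db     = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $tables = json_decode(file_get_contents('/tmp/shard_tables.json'), true);
    $latest = end($tables);

    $count = (int) $db->query("SELECT COUNT(*) FROM `$latest`")->fetchColumn();
    if ($count >= 1000000) {
        $next = 'records_' . (count($tables) + 1);
        $db->exec("CREATE TABLE `$next` LIKE `$latest`"); // same schema, empty
        $tables[] = $next;
        file_put_contents('/tmp/shard_tables.json', json_encode($tables));
    }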

Would this be a good way to deal with a large amount of data and keep index sizes fairly low?

Thanks

If you are planning ahead for the possibility of enormous growth (a game gone viral, for instance), you can follow in the steps of those before you and go NoSQL:

Couchbase - powers Zynga (and is a personal favorite)
Apache Cassandra - powers Twitter
MongoDB - powers Craigslist

But you're building a site in PHP/MySQL to "make it easy", so don't re-invent the wheel on an extremely big problem.

Don't mess with the data. Go for a proven solution - MySQL included.

You should use horizontal partitioning: partition the table by number of records - say a million records per partition. That way MySQL will handle the partitioning internally, and instead of one big index, the indexes will be partitioned as well. For example:
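A sketch of a range-partitioned table (the table and column names are made up for illustration):

    <?php
    // Let MySQL split the data and indexes by ranges of the primary key,
    // instead of managing separate tables by hand.
    $db = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $db->exec("
        CREATE TABLE records (
            id   INT NOT NULL AUTO_INCREMENT,
            data VARCHAR(255),
            PRIMARY KEY (id)
        )
        PARTITION BY RANGE (id) (
            PARTITION p0 VALUES LESS THAN (1000000),
            PARTITION p1 VALUES LESS THAN (2000000),
            PARTITION p2 VALUES LESS THAN MAXVALUE
        )
    ");

Queries that filter on the partitioning key only touch the relevant partitions, which is what keeps the per-partition indexes small.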

Read more here: http://dev.mysql.com/tech-resources/articles/performance-partitioning.html

In all honesty, I don't think that would be a great idea. You should look into archiving old data, or going to a NoSQL solution like Mongo.

The performance of indexes does not degrade linearly with the size of the table; tables have to be seriously massive before that becomes an issue. If you are seeing performance problems, I'd start running MySQL EXPLAINs and making sure all your queries are doing the fewest row scans they can. You might be surprised at what the actual bottleneck ends up being.
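For instance, a quick way to check a suspect query from PHP (the connection details and the query itself are placeholders):

    <?php
    // Run EXPLAIN on a slow query: 'key' shows which index MySQL chose,
    // 'rows' estimates how many rows it will examine.
    $db   = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $plan = $db->query("EXPLAIN SELECT * FROM records WHERE user_id = 42")
               ->fetchAll(PDO::FETCH_ASSOC);
    foreach ($plan as $row) {
        echo $row['table'], ': key=', $row['key'], ', rows=', $row['rows'], "\n";
    }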

So, basically, if you need the data, I wouldn't go messing around with it. On the other hand, if it's something like session data, just delete the rows that are too old.
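For session-style data, that cleanup is a one-liner on a cron schedule (the table and column names are assumed):

    <?php
    // Prune rows older than 30 days; far cheaper than restructuring tables.
    $db = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $db->exec("DELETE FROM sessions WHERE created_at < NOW() - INTERVAL 30 DAY");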
