简体   繁体   中英

Best way to shard Mysql database

I have a huge number of users so I am needed to shard the databases in n shards. So to proceed with this I have below options-

  1. Divide my data in n shards basis userId modulus n operation. ie if I have 10 shards userId 1999 will be sent to 1999%10=9th shard
    Problem- The problem with this approach is if the number of shard increases in future reference to previous will not be maintained.

  2. I can maintain a table with UserId and ShardId
    Problem- If my users increase in future to billions I'll need this mapping table to be shared which doesn't seem to be good solution.

  3. I can maintain static mapping in code like 0-10000 in Shard 1 and more on.
    Problem-

    • With the increase in shards and Users Code needed to be changed more often.
    • If any specific User in shard has huge data It'd get difficult to separate out the shard.

So, these are the three ways I could have found but all having some problem. What would be an alternate or better approach to shard the MySQL tables which can compensate with increased number of shards and users in future.

I prefer a hybrid of 1 and 2:

  1. Hash the UserId into, say, 4096 values.
  2. Look up that number in a 'dictionary' that has shard numbers in it.

If a shard gets too full, migrate all the users with some hash number to another shard.

If you add a shard, migrate a few hash numbers to it - preferable from busy shards.

This forces you to write a script for moving users, and make it robust. Once you have that, a lot of other admin tasks become 'simple':

  • Retire a machine
  • Upgrade the OS (one by one across shards)
  • Upgrade whatever software is on the machines
  • Migrate a hash number that is bulky but not busy to a old, slow, shard that has a big disk. Similarly migrate small and busy to a shard with more cores and faster disks.

Each shard could be an HA cluster (Galera, Group replication, etc) of servers for both reliability and read-scaling. (Sharding gives you write-scaling.

There would need to be a way to distribute the dictionary to all clients "promptly".

All of this works well if you have, say, each hash in 3 different shards for HA. Each of the 3 would be at geographic locations for robustness. The dictionary would have 4 columns to say where the copies are. The 4th would be used during migrations.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM