
Scalability of Using MySQL as a Key/Value Database

I'd like to know the performance impact of using MySQL as a key/value database compared with, say, Redis, MongoDB or CouchDB. I have used both Redis and CouchDB in the past, so I'm very familiar with their use cases, and I know that it's generally better to store key/value pairs in a NoSQL store than in MySQL.

But here's the situation:

  • The bulk of our applications already have lots of MySQL tables.
  • We host everything on Heroku (which only offers MongoDB and MySQL, and basically allows one database type per app).
  • We don't want to use multiple different databases in this case.

So basically, I'm looking for some information on the scalability of a key/value table in MySQL, perhaps at three different, arbitrary tiers:

  • 1000 writes per day
  • 1000 writes per hour
  • 1000 writes per second
  • 1000 reads per hour
  • 1000 reads per second
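To put these tiers on a common scale, here is a quick conversion to average operations per second. It is plain arithmetic on the numbers in the list, nothing MySQL-specific, and it assumes load is spread evenly over each period.

```python
# Convert each tier from the question into an average operations-per-second rate.
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR

tiers = {
    "1000 writes per day": 1000 / SECONDS_PER_DAY,
    "1000 writes per hour": 1000 / SECONDS_PER_HOUR,
    "1000 writes per second": 1000 / 1,
    "1000 reads per hour": 1000 / SECONDS_PER_HOUR,
    "1000 reads per second": 1000 / 1,
}

for name, rate in tiers.items():
    # Averages only: bursty traffic can peak far above these figures.
    print(f"{name:>24}: {rate:10.4f} ops/s")
```

The per-day and per-hour tiers average out to well under one operation per second; only the per-second tiers put sustained load on the server. Traffic for something like an analytics tracker is bursty, so peaks can be much higher than these averages.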

A practical example is building something like MixPanel's Real-time Web Analytics Tracker, which would require very frequent writes, depending on traffic.

WordPress and other popular software use this all the time: a Post has a "Meta" model, which is just key/value, so you can add arbitrary, searchable properties to an object.
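A minimal sketch of such a meta (key/value) table. It uses Python's built-in sqlite3 module purely as a stand-in so the example runs self-contained; the table and column names are hypothetical (loosely modelled on the post/meta idea, not copied from WordPress), and on MySQL you would write the equivalent DDL with MySQL types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a MySQL database
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- One row per (object, property): arbitrary properties without schema changes.
    CREATE TABLE post_meta (
        post_id    INTEGER NOT NULL REFERENCES post(id),
        meta_key   TEXT    NOT NULL,
        meta_value TEXT,
        PRIMARY KEY (post_id, meta_key)
    );
    -- Secondary index so "find objects by property" does not scan the whole table.
    CREATE INDEX idx_meta_key_value ON post_meta (meta_key, meta_value);
""")

conn.execute("INSERT INTO post (id, title) VALUES (1, 'Hello'), (2, 'World')")
conn.executemany(
    "INSERT INTO post_meta (post_id, meta_key, meta_value) VALUES (?, ?, ?)",
    [(1, "color", "red"), (1, "mood", "happy"), (2, "color", "blue")],
)

def get_meta(post_id):
    """Return all properties of one object as a dict."""
    rows = conn.execute(
        "SELECT meta_key, meta_value FROM post_meta WHERE post_id = ? ORDER BY meta_key",
        (post_id,),
    )
    return dict(rows.fetchall())

def find_posts(key, value):
    """Search objects by an arbitrary property."""
    rows = conn.execute(
        "SELECT p.title FROM post p JOIN post_meta m ON m.post_id = p.id "
        "WHERE m.meta_key = ? AND m.meta_value = ? ORDER BY p.id",
        (key, value),
    )
    return [title for (title,) in rows]

print(get_meta(1))                  # {'color': 'red', 'mood': 'happy'}
print(find_posts("color", "blue"))  # ['World']
```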

Another option is to store a serialized hash in a blob, but that seems worse.
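For contrast, a sketch of the serialized-hash-in-a-blob option, again with sqlite3 as a self-contained stand-in and JSON as an assumed serialization format. Every property update becomes a read-modify-write of the whole blob, and the database cannot index individual properties, which is likely why this option seems worse.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a MySQL database
conn.execute("CREATE TABLE post_props (post_id INTEGER PRIMARY KEY, props BLOB)")

def set_prop(post_id, key, value):
    """Update one property: the whole serialized hash is read, changed and rewritten.

    Without a transaction and row locking, two concurrent writers can lose updates.
    """
    row = conn.execute(
        "SELECT props FROM post_props WHERE post_id = ?", (post_id,)
    ).fetchone()
    props = json.loads(row[0]) if row else {}
    props[key] = value
    conn.execute(
        "INSERT OR REPLACE INTO post_props (post_id, props) VALUES (?, ?)",
        (post_id, json.dumps(props)),
    )

def find_posts(key, value):
    """Searching by a property means deserializing every row in application code."""
    rows = conn.execute("SELECT post_id, props FROM post_props ORDER BY post_id")
    return [pid for pid, blob in rows if json.loads(blob).get(key) == value]

set_prop(1, "color", "red")
set_prop(1, "mood", "happy")
set_prop(2, "color", "blue")
print(find_posts("color", "red"))  # [1]
```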

What is your take?

There is no doubt that a NoSQL solution is going to be faster, since it is simpler.
NoSQL and relational databases do not compete with each other; they are different tools that solve different problems.
That being said, for 1000 writes per day or per hour, MySQL will have no problem.
For 1000 per second you will need some fancy hardware to get there. For the NoSQL solution you will probably still need some distributed file system.

It also depends on what you are storing.

I'd say that you'll have to run your own benchmark, because only you know the following important aspects:

  • the size of the data to be stored in this KV table
  • the level of parallelism you want to achieve
  • the number of existing queries reaching your MySQL instance

I'd also say that, depending on the durability requirements for this data, you'll want to test multiple engines: InnoDB and MyISAM.
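One way to set up that comparison is to generate the same key/value table once per engine under test, so identical queries can be pointed at each. The table name and the column types and sizes below are assumptions for illustration; `ENGINE=` is the standard MySQL table option for choosing the storage engine.

```python
# Generate one CREATE TABLE statement per storage engine under test, so the
# benchmark can run the same schema and queries against each engine.
ENGINES = ["InnoDB", "MyISAM"]

def kv_table_ddl(engine, table_prefix="kv_bench"):
    """Return MySQL DDL for a simple key/value table using the given engine.

    VARBINARY(255) keys and MEDIUMBLOB values are assumptions; size them to
    match your real data.
    """
    table = f"{table_prefix}_{engine.lower()}"
    return (
        f"CREATE TABLE {table} (\n"
        f"  k VARBINARY(255) NOT NULL PRIMARY KEY,\n"
        f"  v MEDIUMBLOB\n"
        f") ENGINE={engine};"
    )

for engine in ENGINES:
    print(kv_table_ddl(engine))
```

You would run each statement against a test MySQL instance and then drive the same workload at each table. InnoDB is transactional and crash-safe while MyISAM is not, which is where the durability requirement comes into the choice.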

While I do expect some NoSQL solutions to be faster, given your constraints you may find that MySQL performs well enough for your requirements.

SQL databases are increasingly used as a persistence layer, with computations and delivery cached in key/value stores.
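The pattern described here is commonly called cache-aside: reads check the key/value cache first and fall back to the SQL store, while writes go to the SQL store and invalidate the cached entry. A minimal sketch, where a plain dict stands in for the key/value cache and sqlite3 for the SQL database; in a real deployment these would be separate services with their own client libraries.

```python
import sqlite3

class CacheAsideStore:
    """Reads: cache first, then the database. Writes: database, then invalidate."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")  # stand-in for the SQL persistence layer
        self.db.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
        self.cache = {}                         # stand-in for a key/value cache
        self.db_reads = 0                       # counts cache misses, for illustration

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        row = self.db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        if row is None:
            return None
        self.cache[key] = row[0]                # populate the cache on a miss
        return row[0]

    def put(self, key, value):
        with self.db:                           # commit the write to the database first
            self.db.execute(
                "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
            )
        self.cache.pop(key, None)               # then drop any stale cached copy

store = CacheAsideStore()
store.put("user:1", "alice")
print(store.get("user:1"), store.get("user:1"), store.db_reads)  # alice alice 1
```

Invalidating rather than updating the cache on write keeps the database as the single source of truth; the next read repopulates the cache.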

With this in mind, these folks ran quite a test here:

  • InnoDB inserts 43,000 records per second AT ITS PEAK*;
  • TokuDB inserts 34,000 records per second AT ITS PEAK*;
  • This KV store inserts 100 million records per second (2,000+ times more).

To answer your question, a key/value store is likely to outdo MySQL by several orders of magnitude:

Processing 100,000,000 items:

kv_add()....time:....978.32 ms
kv_get().....time:....297.07 ms
kv_free()....time:........0.00 ms

OK, your test was 1,000 ops per second, but it can't hurt to be able to do 1,000 times more!
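Converting the quoted timings to rates shows how the "100 million per second" and "2,000+ times" figures relate to each other. This is simple arithmetic on the numbers above, not a new measurement.

```python
items = 100_000_000
add_seconds = 978.32 / 1000   # kv_add() time from the listing above
get_seconds = 297.07 / 1000   # kv_get() time from the listing above
innodb_peak = 43_000          # InnoDB inserts/s at its peak, as quoted

add_rate = items / add_seconds
get_rate = items / get_seconds

print(f"inserts/s: {add_rate:,.0f}")
print(f"lookups/s: {get_rate:,.0f}")
print(f"insert speed-up vs InnoDB peak: {add_rate / innodb_peak:,.0f}x")
```

That works out to roughly 102 million inserts per second and 337 million lookups per second, about 2,400 times the quoted InnoDB peak, in line with the "2,000+ times" claim.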

See this for further details (they also compare it with Tokyo Cabinet).

Check out the series of blog posts here, where the author runs tests comparing MongoDB and MySQL performance and fights his way through the MySQL performance-tuning mess. MongoDB was doing ~100K row reads per second and MySQL in client/server mode was doing 43K max, but with the embedded library he managed to get it up to 172K row reads per second.

It sounds a little complicated to get that high on a single node, so YMMV.

The writes-per-second question is a little harder, but this might still give you some ideas on configurations to try.

You should first implement it in the simplest way, then compare. Always test things. This means:

  • Create a schema that's representative of your use case.
  • Create queries representative of your use case.
  • Create significant amounts of dummy data representative of your use case.
  • Benchmark it in a variety of loops, including both random and sequential access.
  • Ensure you use concurrency (run many processes that randomly hammer the server with all kinds of queries representative of your use cases).

Once you have that, measure and test. There are different ways to go about it. Some tests can be simple but less realistic. Measure throughput and latency.
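A minimal harness for the measure-throughput-and-latency step, assuming a store object with hypothetical `get`/`put` methods. A locked in-process dict stands in for the database so the sketch runs anywhere; for a real test you would wrap your MySQL client behind the same two methods. Threads keep the sketch short; the many separate processes suggested above avoid Python's GIL when the client side is CPU-bound.

```python
import random
import statistics
import threading
import time

class DictStore:
    """Hypothetical stand-in; replace with a wrapper around your MySQL client."""
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

def run_benchmark(store, n_keys=10_000, ops_per_worker=5_000, workers=4,
                  read_ratio=0.9, sequential=False, seed=42):
    """Hammer the store from several threads; return throughput and latency stats."""
    for k in range(n_keys):                       # preload representative dummy data
        store.put(f"key:{k}", "x" * 100)
    latencies = []
    lat_lock = threading.Lock()

    def worker(worker_id):
        rng = random.Random(seed + worker_id)
        local = []
        for i in range(ops_per_worker):
            if sequential:
                k = (worker_id * ops_per_worker + i) % n_keys
            else:
                k = rng.randrange(n_keys)
            start = time.perf_counter()
            if rng.random() < read_ratio:         # mixed read/write workload
                store.get(f"key:{k}")
            else:
                store.put(f"key:{k}", "y" * 100)
            local.append(time.perf_counter() - start)
        with lat_lock:
            latencies.extend(local)

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(workers)]
    t0 = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - t0

    latencies.sort()
    return {
        "ops": len(latencies),
        "throughput_ops_s": len(latencies) / elapsed,
        "p50_ms": statistics.median(latencies) * 1000,
        "p99_ms": latencies[int(0.99 * (len(latencies) - 1))] * 1000,
    }

for mode in (False, True):
    result = run_benchmark(DictStore(), sequential=mode)
    print("sequential" if mode else "random", result)
```

Reporting a tail percentile (p99) alongside the median matters under concurrency, because contention tends to show up in the tail long before it moves the average.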

Then try to optimise it.

MySQL has one particular limitation for KV: the standard engines with persistence use indexes optimised for range lookups (B-trees), not for pure key/value lookups, which might introduce some overhead. That said, it's also difficult to make hash indexes work with persistent storage because of rehashing. MEMORY tables support a hash index.
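The trade-off can be illustrated without MySQL. A hash table (here a Python dict) gives constant-time point lookups but has no notion of key order, while an ordered structure (here a sorted list searched with `bisect`, a simplified stand-in for a B-tree) supports both point lookups and range scans, at logarithmic cost per lookup. This is an analogy for the two index types, not how InnoDB or the MEMORY engine are implemented.

```python
import bisect

pairs = [("apple", 1), ("banana", 2), ("cherry", 3), ("date", 4), ("elder", 5)]

# Hash index analogue: O(1) average point lookup, but a range query would
# require scanning and sorting every key.
hash_index = dict(pairs)

# Ordered index analogue: keys kept sorted, O(log n) lookup via binary search.
sorted_pairs = sorted(pairs)
sorted_keys = [k for k, _ in sorted_pairs]
sorted_vals = [v for _, v in sorted_pairs]

def ordered_get(key):
    """Point lookup by binary search over the sorted keys."""
    i = bisect.bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return sorted_vals[i]
    return None

def ordered_range(lo, hi):
    """Range scan: all (key, value) pairs with lo <= key < hi, in key order."""
    i = bisect.bisect_left(sorted_keys, lo)
    j = bisect.bisect_left(sorted_keys, hi)
    return list(zip(sorted_keys[i:j], sorted_vals[i:j]))

print(hash_index["cherry"])     # 3
print(ordered_get("cherry"))    # 3
print(ordered_range("b", "d"))  # [('banana', 2), ('cherry', 3)]
```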

Many people associate certain things with slowness, such as SQL, relational models, JOINs, ACID, etc.

When using an ACID-capable relational database, you don't necessarily have to use ACID or relations.

While joins have a reputation for being slow, this is usually down to misconceptions about joins. Often people simply write bad queries. This is made more difficult because SQL is declarative: the database can get things wrong, especially with JOINs, where there are often multiple ways to perform the join. What people are actually getting out of NoSQL in this case is imperative control. "NoDeclarative" would be a more accurate name, as that's the problem a lot of people are having with SQL. Quite often people simply lack indexes. That's not an argument in favour of joins, but rather an illustration of where people can get it wrong on speed.
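A quick way to spot a missing index is to ask the planner how it will run a query before and after adding one. The sketch uses SQLite's `EXPLAIN QUERY PLAN` through Python's sqlite3 module as a self-contained stand-in; on MySQL the equivalent check is `EXPLAIN SELECT ...`, whose output format differs. The table and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a MySQL database
conn.execute("CREATE TABLE meta (post_id INTEGER, meta_key TEXT, meta_value TEXT)")
conn.executemany(
    "INSERT INTO meta VALUES (?, ?, ?)",
    [(i, f"key{i % 50}", f"value{i}") for i in range(10_000)],
)

QUERY = "SELECT meta_value FROM meta WHERE meta_key = ?"

def plan(sql, params):
    """Return the planner's description of how it would execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the readable detail

before = plan(QUERY, ("key7",))
conn.execute("CREATE INDEX idx_meta_key ON meta (meta_key)")
after = plan(QUERY, ("key7",))
print("before:", before)  # a full table SCAN
print("after: ", after)   # a SEARCH using idx_meta_key
```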

Traditional databases can be extremely fast if you do certain special things, such as ignoring data integrity or handling it elsewhere. You don't have to wait for the hard drive to flush writes, you don't have to enforce relations, you don't have to enforce unique constraints, and you don't have to use transactions; but if you do trade safety for speed, then you need to know what you're doing.
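One common way of trading per-write safety for speed is committing in batches rather than once per row. Each commit is a point where a durable engine typically waits for data to reach disk, so fewer commits means fewer waits, at the cost of losing the whole uncommitted batch on a crash. The sketch uses sqlite3 with temporary on-disk files as a self-contained stand-in; on MySQL/InnoDB the related knobs include autocommit and `innodb_flush_log_at_trx_commit`, and actual timings depend heavily on the system.

```python
import os
import sqlite3
import tempfile
import time

def insert_rows(path, n_rows, batch_size):
    """Insert n_rows key/value rows, committing once every batch_size rows."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
    conn.commit()
    start = time.perf_counter()
    for first in range(0, n_rows, batch_size):
        last = min(first + batch_size, n_rows)
        rows = [(f"key:{i}", "x" * 64) for i in range(first, last)]
        conn.executemany("INSERT OR REPLACE INTO kv VALUES (?, ?)", rows)
        conn.commit()  # one commit (and, in a durable engine, one flush) per batch
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]
    conn.close()
    return elapsed, count

with tempfile.TemporaryDirectory() as tmp:
    for batch in (1, 100, 1000):
        path = os.path.join(tmp, f"bench_{batch}.db")
        elapsed, count = insert_rows(path, 1000, batch)
        print(f"batch_size={batch:>5}: {count} rows in {elapsed:.3f} s")
```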

NoSQL solutions, by comparison, tend first and foremost to be designed to support various modes of scaling out of the box. The performance of an individual node might not be quite what you expect. NoSQL solutions also struggle as general-purpose tools, with many having quite unusual performance characteristics or limited feature sets.
