简体繁体 English

对于简单的操作，Cassandra比Mysql慢得多？

[英]Cassandra is much slower than Mysql for simple operations?

原文 2012-01-13 12:04:07 2 4 php/ mysql/ performance/ cassandra

I see a lot of statements like: "Cassandra very fast on writes", "Cassandra has reads really slower than writes, but much faster than Mysql" 我看到很多声明如下：“Cassandra写入速度非常快”，“Cassandra的读取速度比写入慢，但比Mysql快得多”

On my windows7 system: I installed Mysql of default configuration. 在我的windows7系统上：我安装了默认配置的Mysql。 I installed PHP5 of default configuration. 我安装了PHP5的默认配置。 I installed Casssandra of default configuration. 我安装了默认配置的Casssandra。

Making simple write test on mysql: "INSERT INTO wp_test ( id , title ) VALUES ('id01','test')" gives me result: 0.0002(s) For 1000 inserts: 0.1106(s) 在mysql上进行简单的写测试：“INSERT INTO wp_test （ id ， title ）VALUES（'id01'，'test'）”给我结果：0.0002（s）1000个插入：0.1106（s）

Making simple same write test on Cassandra: $column_faily->insert('id01',array('title'=>'test')) gives me result of: 0.005(s) For 1000 inserts: 1.047(s) 在Cassandra上进行简单的相同写入测试：$ column_faily-> insert（'id01'，array（'title'=>'test'））给出结果：0.005（s）对于1000个插入：1.047（s）

For reads tests i also got that Cassandra is much slower than mysql. 对于读取测试，我也得知Cassandra比mysql慢得多。

So the question, does this sounds correct that i have 5ms for one write operation on Cassadra? 所以问题是，这听起来是否正确，我在Cassadra上进行一次写操作需要5ms？ Or something is wrong and should be at least 0.5ms. 或者出了点问题，应该至少0.5ms。

4 个解决方案

When people say "Cassandra is faster than MySQL", they mean when you are dealing with terabytes of data and many simultaneous users. 当人们说“Cassandra比MySQL更快”时，它们意味着当你处理数TB的数据和许多同时用户时。 Cassandra (and many distributed NoSQL databases) is optimized for hundreds of simultaneous readers and writers on many nodes, as opposed to MySQL (and other relational DBs) which are optimized to be really fast on a single node, but tend to fall to pieces when you try to scale them across multiple nodes. Cassandra（以及许多分布式NoSQL数据库）针对许多节点上的数百个同时读取器和写入器进行了优化，而不是MySQL（和其他关系数据库），它们被优化为在单个节点上非常快速，但往往会崩溃。您尝试跨多个节点缩放它们。 There is a generalization of this trade-off by the way- the absolute fastest disk I/O is plain old UNIX flat files, and many latency-sensitive financial applications use them for that reason. 顺便说一下这种权衡的概括 - 绝对最快的磁盘I / O是普通的旧UNIX平面文件，并且许多对延迟敏感的金融应用程序使用它们就是出于这个原因。

If you are building the next Facebook, you want something like Cassandra because a single MySQL box is never going to stand up to the punishment of thousands of simultaneous reads and writes, whereas with Cassandra you can scale out to hundreds of data nodes and handle that load easily. 如果你正在构建下一个Facebook，你需要像Cassandra这样的东西，因为单个MySQL盒子永远不会经受数千个同时读写的惩罚，而使用Cassandra你可以扩展到数百个数据节点并处理容易装 See scaling up vs. scaling out . 请参阅扩展与扩展。

Another use case is when you need to apply a lot of batch processing power to terabytes or petabytes of data. 另一个用例是当您需要将大量批处理能力应用于太字节或数PB的数据时。 Cassandra or HBase are great because they are integrated with MapReduce, allowing you to run your processing on the data nodes . Cassandra或HBase很棒，因为它们与MapReduce集成，允许您在数据节点上运行处理。 With MySQL, you'd need to extract the data and spray it out across a grid of processing nodes, which would consume a lot of network bandwidth and entail a lot of unneeded complication. 使用MySQL，您需要提取数据并将其喷射到处理节点网格中，这将消耗大量网络带宽并导致许多不必要的复杂化。

Cassandra benefits greatly from parallelisation and batching. Cassandra从并行化和批处理中获益匪浅。 Try doing 1 million inserts on each of 100 threads (each with their own connection & in batches of 100) and see which ones is faster. 尝试在100个线程中的每个线程上进行100万次插入（每个线程都有自己的连接并且批量为100个）并查看哪些线程更快。

Finally, Cassandra insert performance should be relatively stable (maintaining high throughput for a very long time). 最后，Cassandra插入性能应该相对稳定（长时间保持高吞吐量）。 With MySQL, you will find that it tails off rather dramatically once the btrees used for the indexes grow too large memory. 使用MySQL，你会发现，一旦用于索引的btree增加了太大的内存，它就会大大减少。

Many user space factors can impact write performance. 许多用户空间因素会影响写入性能。 Such as: 如：

Dozens of settings in each of the database server's configuration. 每个数据库服务器的配置中有数十个设置。
The table structure and settings. 表格结构和设置。
The connection settings. 连接设置。
The query settings. 查询设置。

Are you swallowing warnings or exceptions? 您是否在吞咽警告或例外？ The MySQL sample would on face value be expected to produce a duplicate key error. MySQL样本在面值上会产生重复的键错误。 It could be failing while doing nothing at all. 一无所取，它可能会失败。 What Cassandra might do in the same case isn't something I'm familiar with. Cassandra在同一案件中可能做的事情并不是我所熟悉的。

My limited experience of Cassandra tell me one thing about inserts, while performance of everything else degrades as data grows, inserts appear to maintain the same speed. 我对Cassandra的有限经验告诉我关于插入的一件事，而其他一切的性能随着数据的增长而降低，插入看起来保持相同的速度。 How fast it is compared to MySQL however isn't something I've tested. 然而，它与MySQL的速度有多快，并不是我测试过的。

It might not be so much that inserts are fast but rather tries to be never slow. 它可能不是那么多，插入是快速的，而是试图永远不会慢。 If you want a more meaningful test you need to incorporate concurrency and more variations on scenario such as large data sets, various batch sizes, etc. More complex tests might test latency for availability of data post insert and read speed over time. 如果您需要更有意义的测试，则需要在场景中加入并发性和更多变体，例如大型数据集，各种批量大小等。更复杂的测试可能会测试插入后数据可用性的延迟和读取速度。

It would not surprise me if Cassandra's first port of call for inserting data is to put it on a queue or to simply append. 如果Cassandra用于插入数据的第一个调用端口是将其放入队列或简单地追加，我不会感到惊讶。 This is configurable if you look at consistency level. 如果您查看一致性级别，这是可配置的。 MySQL similarly allows you to balance performance and reliability/availability though each will have variations on what they allow and don't allow. MySQL同样允许您平衡性能和可靠性/可用性，尽管每个都会对它们允许和不允许的内容有所不同。

Outside of that unless you get into the internals it may be hard to tell why one performs better than the other. 除此之外，除非你进入内部，否则可能很难说为什么一个表现优于另一个。

I did some benchmarks of a use case I had for Cassandra a while ago. 我做了一些我曾经为Cassandra用过的用例的基准测试。 For the benchmark it would insert tens of thousands of rows first. 对于基准测试，它将首先插入数万行。 I had to make the script sleep for a few seconds because otherwise queries run after the fact would not see the data and the results would be inconsistent between implementations I was testing. 我不得不让脚本休眠几秒钟，因为否则查询会在事实看不到数据后运行，结果在我测试的实现之间会不一致。

If you really want fast inserts, append to a file on ramdisk. 如果你真的想要快速插入，请附加到ramdisk上的文件中。

It's likely that the maturity of the MySQL drivers, especially the improved MySQL drivers in PHP 5.3, is having some impact on the tests. MySQL驱动程序的成熟度，特别是PHP 5.3中改进的MySQL驱动程序，可能会对测试产生一些影响。 It's also entirely possible that the simplicity of the data in your query is impacting the results - maybe on 100 value inserts, Cassandra becomes faster. 查询中数据的简单性也完全有可能影响结果 - 可能是100个值插入，Cassandra变得更快。

Try the same test from the command line and see what the timestamps are, then try with varying numbers of values. 从命令行尝试相同的测试，看看时间戳是什么，然后尝试使用不同数量的值。 You can't do a single test and base your decision on that. 您无法进行单一测试并根据该测试做出决定。