MySQL vs MongoDB aggregation performance

I'm currently testing some databases for my application. The main functionality is data aggregation (similar to this question: Data aggregation mongodb vs mysql).

I'm facing the same problem. I've created sample test data. There are no joins on the MySQL side; it's a single InnoDB table. It's a 1.6 million row data set and I'm doing a sum and a count over the full table, without any filter, so I can compare the performance of each aggregation engine. All data fits in memory in both cases, and in both cases there is no write load.

With MySQL (5.5.34-0ubuntu0.12.04.1) I'm getting results consistently around 2.03 to 2.10 seconds. With MongoDB (2.4.8, Linux 64-bit) I'm getting results consistently between 4.1 and 4.3 seconds.

If I do some filtering on indexed fields, MySQL's time drops to around 1.18 to 1.20 seconds (the number of rows processed drops to exactly half the dataset). If I do the same filtering on indexed fields in MongoDB, the time drops only to around 3.7 seconds (again processing half the dataset, which I confirmed with an explain on the match criteria).

My conclusion is that either: 1) my documents are extremely badly designed (which truly could be the case), or 2) the MongoDB aggregation framework really does not fit my needs.

The questions are: what can I do (in terms of specific MongoDB configurations, document modeling, etc.) to make Mongo's results faster? Is this a case that MongoDB is not suited for?

My table and document schemas:

CREATE TABLE `events_normal` (
  `origem` varchar(35) DEFAULT NULL,
  `destino` varchar(35) DEFAULT NULL,
  `qtd` int(11) DEFAULT NULL,
  KEY `idx_orides` (`origem`,`destino`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

{
    "_id" : ObjectId("52adc3b444ae460f2b84c272"),
    "data" : {
        "origem" : "GRU",
        "destino" : "CGH",
        "qtdResultados" : 10
    }
}

The indexed and filtered fields, where mentioned, are "origem" and "destino".

select sql_no_cache origem, destino, sum(qtd), count(1) from events_normal group by origem, destino;
select sql_no_cache origem, destino, sum(qtd), count(1) from events_normal where origem="GRU" group by origem, destino;

db.events.aggregate({
    $group: {
        _id: { origem: "$data.origem", destino: "$data.destino" },
        total: { $sum: "$data.qtdResultados" },
        qtd: { $sum: 1 }
    }
})

db.events.aggregate(
    { $match: { "data.origem": "GRU" } },
    { $group: {
        _id: { origem: "$data.origem", destino: "$data.destino" },
        total: { $sum: "$data.qtdResultados" },
        qtd: { $sum: 1 }
    } }
)
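The question only shows the MySQL key (`idx_orides`); a minimal sketch of the matching compound index on the MongoDB side, using the embedded field names from the document schema above and the MongoDB 2.4-era `ensureIndex` API, would be:

```javascript
// Sketch: compound index mirroring the MySQL key idx_orides (origem, destino).
// ensureIndex is the index-creation call in the MongoDB 2.4 shell
// (later versions rename it to createIndex).
db.events.ensureIndex({ "data.origem": 1, "data.destino": 1 })
```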

Thanks!

Aggregation is not really what MongoDB was originally designed for, so it's not really its fastest feature.

When you really want to use MongoDB, you could use sharding so that each shard can process its share of the aggregation (make sure to choose the shard key in a way that puts each group on only one shard, or you will achieve the opposite). This, however, would no longer be a fair comparison to MySQL, because the MongoDB cluster would use a lot more hardware.
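As a hypothetical sketch of that suggestion (the database name `test` is an assumption; the collection name `events` and the grouped field `data.origem` come from the question), sharding on the leading group field would keep each `{ origem, destino }` group on a single shard:

```javascript
// Sketch, run against a mongos in a MongoDB 2.4 sharded cluster.
// Assumed names: database "test"; collection "events" from the question.
sh.enableSharding("test")

// 2.4 requires an index on the shard key before sharding the collection.
db.events.ensureIndex({ "data.origem": 1 })

// Shard on the leading group-by field so each group lives on one shard.
sh.shardCollection("test.events", { "data.origem": 1 })
```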
