
MongoDB aggregation comparison: group(), $group and MapReduce

Original question posted 2012-09-09 07:36:59. Tags: mongodb, mapreduce, mongodb-query, aggregation-framework

I am somewhat confused about when to use group(), aggregate with $group, or mapreduce. I read the documentation for group() at http://www.mongodb.org/display/DOCS/Aggregation and for $group at http://docs.mongodb.org/manual/reference/aggregation/group/#_S_group. Is sharding the only situation where group() won't work? Also, I get the feeling that $group is more powerful than group() because it can be used in conjunction with other pipeline operators from the aggregation framework. How does $group compare with mapreduce? I read somewhere that it doesn't generate any temporary collection whereas mapreduce does. Is that so?
Can someone present an illustration or guide me to a link where these three concepts are explained together, taking the same sample data, so I can compare them easily?

EDIT:
Also, it would be great if you could point out anything new in these commands since the 2.2 release came out.

1 Answer

It is somewhat confusing since the names are similar, but the group() command is a different feature and implementation from the $group pipeline operator in the Aggregation Framework.

The group() command, the Aggregation Framework, and MapReduce are collectively the aggregation features of MongoDB. There is some overlap in features, but I'll attempt to explain the differences and limitations of each as of MongoDB 2.2.0.

Note: inline result sets mentioned below refer to queries that are processed in memory with results returned at the end of the function call. Alternative output options (currently only available with MapReduce) could include saving results to a new or existing collection.

`group()` Command

Simple syntax and functionality for grouping, analogous to GROUP BY in SQL.
Returns the result set inline (as an array of grouped items).
Implemented using the JavaScript engine; custom reduce() functions can be written in JavaScript.
Current Limitations
- Will not group into a result set with more than 20,000 keys.
- Results must fit within the limitations of a BSON document (currently 16MB).
- Takes a read lock and does not allow any other threads to execute JavaScript while it is running.
- Does not work with sharded collections.
See also: group() command examples.
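As a rough illustration of the points above, here is a minimal sketch of the arguments group() takes: a key, an initial value, and a JavaScript reduce function. The `orders` data, its field names, and the `simulateGroup` helper are all hypothetical; the helper is only a local stand-in so the snippet runs in plain JavaScript without a server, and it is not how MongoDB implements the command.

```javascript
// Hypothetical sample data: one document per order.
const orders = [
  { customer: "alice", amount: 20 },
  { customer: "bob", amount: 5 },
  { customer: "alice", amount: 15 },
  { customer: "bob", amount: 40 },
];

// Arguments in the shape group() expects.
const groupSpec = {
  key: { customer: 1 },            // fields to group by
  initial: { total: 0, count: 0 }, // starting value for every group
  // reduce(currentDocument, aggregatedResult) updates the result in place
  reduce: function (doc, out) {
    out.total += doc.amount;
    out.count += 1;
  },
};

// In a 2.2-era mongo shell the call would look roughly like:
//   db.orders.group(groupSpec)

// Local stand-in for illustration only (an assumption, not MongoDB code).
function simulateGroup(docs, spec) {
  const groups = new Map();
  for (const doc of docs) {
    // Build the key document from the fields named in spec.key.
    const keyDoc = {};
    for (const field of Object.keys(spec.key)) keyDoc[field] = doc[field];
    const id = JSON.stringify(keyDoc);
    // Each group starts as the key fields plus a copy of the initial value.
    if (!groups.has(id)) groups.set(id, Object.assign({}, keyDoc, spec.initial));
    spec.reduce(doc, groups.get(id));
  }
  return Array.from(groups.values());
}

const grouped = simulateGroup(orders, groupSpec);
console.log(grouped);
// [ { customer: 'alice', total: 35, count: 2 },
//   { customer: 'bob', total: 45, count: 2 } ]
```

The whole result comes back as one array, which is why the 16MB document limit and the cap on the number of keys apply.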

MapReduce

Implements the MapReduce model for processing large data sets.
Can choose from one of several output options (inline, new collection, merge, replace, reduce).
MapReduce functions are written in JavaScript.
Supports non-sharded and sharded input collections.
Can be used for incremental aggregation over large collections.
MongoDB 2.2 implements much better support for sharded map reduce output.
Current Limitations
- A single emit can hold at most half of MongoDB's maximum BSON document size (the maximum is 16MB, so 8MB per emit).
- There is a JavaScript lock, so a mongod server can only execute one JavaScript function at a point in time. However, most steps of the MapReduce are very short, so locks can be yielded frequently.
- MapReduce functions can be difficult to debug. You can use print() and printjson() to include diagnostic output in the mongod log.
- MapReduce is generally not intuitive for programmers trying to carry over their aggregation experience from relational queries.
See also: Map/Reduce examples.
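To make the model above concrete, here is a minimal sketch of a map function and a reduce function computing per-customer totals. The `orders` data, the field names, the `order_totals` collection name, and the `simulateMapReduce` helper are hypothetical; the helper and the local `emit` are stand-ins so the snippet runs in plain JavaScript, and they do not reflect MongoDB's actual implementation.

```javascript
// Hypothetical sample data: one document per order.
const orders = [
  { customer: "alice", amount: 20 },
  { customer: "bob", amount: 5 },
  { customer: "alice", amount: 15 },
  { customer: "bob", amount: 40 },
];

// Local stand-in for the emit() function the server provides to map().
const emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// map() runs once per document, with `this` bound to the document.
const mapFn = function () {
  emit(this.customer, { total: this.amount, count: 1 });
};

// reduce() must return a value of the same shape that map() emits,
// because it may be called again on its own earlier output.
const reduceFn = function (key, values) {
  const out = { total: 0, count: 0 };
  values.forEach(function (v) {
    out.total += v.total;
    out.count += v.count;
  });
  return out;
};

// In the mongo shell the call would look roughly like:
//   db.orders.mapReduce(mapFn, reduceFn, { out: { inline: 1 } })
// or, to merge into a collection (name is hypothetical):
//   db.orders.mapReduce(mapFn, reduceFn, { out: { merge: "order_totals" } })

// Local stand-in for illustration only (an assumption, not MongoDB code).
function simulateMapReduce(docs, map, reduce) {
  emitted.length = 0;
  docs.forEach(function (doc) { map.call(doc); });
  // Collect the emitted values per key.
  const byKey = new Map();
  emitted.forEach(function (e) {
    if (!byKey.has(e.key)) byKey.set(e.key, []);
    byKey.get(e.key).push(e.value);
  });
  // Keys with a single emitted value are passed through without reducing.
  return Array.from(byKey, function ([key, values]) {
    return { _id: key, value: values.length > 1 ? reduce(key, values) : values[0] };
  });
}

const results = simulateMapReduce(orders, mapFn, reduceFn);
console.log(JSON.stringify(results));
// [{"_id":"alice","value":{"total":35,"count":2}},
//  {"_id":"bob","value":{"total":45,"count":2}}]
```

Choosing a collection output such as merge or reduce instead of inline is what makes incremental aggregation possible, since later runs can fold new input into the stored results.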

Aggregation Framework

New feature in the MongoDB 2.2.0 production release (August, 2012).
Designed with specific goals of improving performance and usability.
Returns the result set inline.
Supports non-sharded and sharded input collections.
Uses a "pipeline" approach where objects are transformed as they pass through a series of pipeline operators such as matching, projecting, sorting, and grouping.
Pipeline operators need not produce one output document for every input document: operators may also generate new documents or filter out documents.
Using projections you can add computed fields, create new virtual sub-objects, and extract sub-fields into the top level of results.
Pipeline operators can be repeated as needed (for example, multiple $project or $group steps).
Current Limitations
- Results are returned inline, so they are limited to the maximum document size supported by the server (16MB).
- Doesn't support as many output options as MapReduce.
- Limited to operators and expressions supported by the Aggregation Framework (i.e. you can't write custom functions).
- Newest server feature for aggregation, so it has more room to mature in terms of documentation, feature set, and usage.
See also: Aggregation Framework examples.
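The pipeline approach described above can be sketched with a small example that filters, groups, and sorts. The `orders` data and field names are hypothetical, and `simulateAggregate` is a local stand-in that understands only the three stages used here ($match with equality or $gte, $group with $sum, single-field $sort); it exists so the snippet runs in plain JavaScript and is not MongoDB's implementation.

```javascript
// Hypothetical sample data: one document per order.
const orders = [
  { customer: "alice", amount: 20 },
  { customer: "bob", amount: 5 },
  { customer: "alice", amount: 15 },
  { customer: "bob", amount: 40 },
];

// The pipeline is plain data: no custom JavaScript functions are involved.
const pipeline = [
  { $match: { amount: { $gte: 10 } } },
  { $group: { _id: "$customer", total: { $sum: "$amount" }, count: { $sum: 1 } } },
  { $sort: { total: -1 } },
];

// In the mongo shell the call would look roughly like:
//   db.orders.aggregate(pipeline)

// "$name" refers to a field of the document; anything else is a literal.
function fieldValue(doc, expr) {
  return typeof expr === "string" && expr[0] === "$" ? doc[expr.slice(1)] : expr;
}

// Local stand-ins for three stages (an assumption, not MongoDB code).
const stages = {
  $match: function (docs, cond) {
    return docs.filter(function (doc) {
      return Object.keys(cond).every(function (field) {
        const c = cond[field];
        if (c !== null && typeof c === "object" && "$gte" in c) return doc[field] >= c.$gte;
        return doc[field] === c;
      });
    });
  },
  $group: function (docs, spec) {
    const groups = new Map();
    docs.forEach(function (doc) {
      const id = fieldValue(doc, spec._id);
      if (!groups.has(id)) groups.set(id, { _id: id });
      const out = groups.get(id);
      Object.keys(spec).forEach(function (name) {
        if (name === "_id") return;
        out[name] = (out[name] || 0) + fieldValue(doc, spec[name].$sum);
      });
    });
    return Array.from(groups.values());
  },
  $sort: function (docs, spec) {
    const field = Object.keys(spec)[0];
    return docs.slice().sort(function (a, b) {
      return spec[field] * (a[field] < b[field] ? -1 : a[field] > b[field] ? 1 : 0);
    });
  },
};

// Each stage receives the output of the previous one.
function simulateAggregate(docs, stagesList) {
  return stagesList.reduce(function (current, stage) {
    const op = Object.keys(stage)[0];
    return stages[op](current, stage[op]);
  }, docs);
}

const output = simulateAggregate(orders, pipeline);
console.log(output);
// [ { _id: 'bob', total: 40, count: 1 },
//   { _id: 'alice', total: 35, count: 2 } ]
```

Note how the $match stage drops bob's 5-unit order before grouping, so the number of documents changes from stage to stage.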

Can someone present an illustration or guide me to a link where these three concepts are explained together, taking the same sample data, so I can compare them easily?

You generally won't find examples where it would be useful to compare all three approaches, but here are previous StackOverflow questions which show variations: