Summing large amounts of data on MongoDB

I'm looking for the most efficient way of performing summing queries against MongoDB.

Currently we insert documents that contain various information and a date-time stamp of when the document was created.

We need to sum this data to be viewed in the following ways:

- Documents by hour of the day (1-24)
- Documents by day of the month (1-28/31)
- Documents by month of the year (1-12)
- Documents by year

This summed data will be accessed often, and we're afraid that, given the massive amount of data thrown at mongo, it will have problems summing the data that frequently.

We thought that perhaps, when a document is inserted into mongo, we could keep another document that contains these counts and increment it at the time of insertion. This way, we can quickly pull the counts without summing the data on each request. Our concern is that this may not be the most efficient way to perform this type of operation in mongo.
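The counter-document idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name `counterUpserts`, the collection name `documentCounts`, and the `_id` format of the counter documents are all hypothetical, and buckets are computed in UTC. It builds one `$inc` upsert per time bucket (year, month, day, hour) for a newly inserted document.

```javascript
// Hypothetical helper (not from the original post): zero-pads to two digits.
function pad2(n) {
  return (n < 10 ? "0" : "") + n;
}

// Hypothetical helper: returns one upsert description per time bucket
// that a document created at `createdAt` falls into (UTC-based).
function counterUpserts(createdAt) {
  var y = createdAt.getUTCFullYear();
  var m = pad2(createdAt.getUTCMonth() + 1); // getUTCMonth() is zero-based
  var d = pad2(createdAt.getUTCDate());
  var h = pad2(createdAt.getUTCHours());

  // Assumed _id scheme for the counter documents.
  var bucketIds = [
    "year:" + y,
    "month:" + y + "-" + m,
    "day:" + y + "-" + m + "-" + d,
    "hour:" + y + "-" + m + "-" + d + "T" + h
  ];

  return bucketIds.map(function (id) {
    return {
      filter: { _id: id },
      update: { $inc: { count: 1 } }, // atomic increment on the server
      options: { upsert: true }       // create the counter if it is missing
    };
  });
}

// In the mongo shell, each entry could be applied right after the insert:
//   counterUpserts(doc.date).forEach(function (u) {
//     db.documentCounts.update(u.filter, u.update, u.options);
//   });
```

Reading a count then becomes a single lookup by `_id` on the assumed `documentCounts` collection, at the cost of a few extra writes per insert.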

Any thoughts on the best way to accomplish this? My dev team and I are new to MongoDB, and we want to make sure we don't fall into a performance trap when summing large sets of data.

The Aggregation Framework is perfectly suited for this type of query.
I've put together some examples for you below, using the group() shell helper.

To start, let's populate some documents:

db.myDocumentCollection.insert({"date" : new Date('01/01/2012'), "topic" : "My Title 1"});
db.myDocumentCollection.insert({"date" : new Date('01/02/2012'), "topic" : "My Title 2"});
db.myDocumentCollection.insert({"date" : new Date('01/02/2012'), "topic" : "My Title 3"});
db.myDocumentCollection.insert({"date" : new Date('01/02/2012'), "topic" : "My Title 4"});
db.myDocumentCollection.insert({"date" : new Date('01/04/2012'), "topic" : "My Title 5"});
db.myDocumentCollection.insert({"date" : new Date('01/05/2012'), "topic" : "My Title 6"});
db.myDocumentCollection.insert({"date" : new Date('01/07/2013'), "topic" : "My Title 7"});
db.myDocumentCollection.insert({"date" : new Date('01/07/2013'), "topic" : "My Title 8"});
db.myDocumentCollection.insert({"date" : new Date('02/07/2013'), "topic" : "My Title 9"});
db.myDocumentCollection.insert({"date" : new Date('02/08/2013'), "topic" : "My Title 10"});

Return number of documents grouped by full date

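db.myDocumentCollection.group({
    // Build the grouping key as "day/month/year" from each document's date
    $keyf: function(doc) {
        return { "date": doc.date.getDate() + "/" + doc.date.getMonth() + "/" + doc.date.getFullYear() };
    },
    // Each group starts with a count of zero
    initial: { count: 0 },
    // Called once per document; increments the count of the document's group
    reduce: function(obj, prev) { prev.count++; }
})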

Output (getMonth() is zero-based, so "1/0/2012" means 1 January 2012):

[
        {
                "date" : "1/0/2012",
                "count" : 1
        },
        {
                "date" : "2/0/2012",
                "count" : 3
        },
        {
                "date" : "4/0/2012",
                "count" : 1
        },
        {
                "date" : "5/0/2012",
                "count" : 1
        },
        {
                "date" : "7/0/2013",
                "count" : 2
        },
        {
                "date" : "7/1/2013",
                "count" : 1
        },
        {
                "date" : "8/1/2013",
                "count" : 1
        }
]
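For comparison, here is a rough aggregation-pipeline counterpart of the group() call above. It is a sketch rather than a drop-in replacement: the variable name `byFullDate` is made up for illustration, the `$year`, `$month`, and `$dayOfMonth` operators work in UTC by default (whereas the JavaScript getters in group() use the server's local time), and `$month` returns 1-12 rather than 0-11. Note also that db.collection.group() was deprecated in MongoDB 3.4 and removed in 4.2, so on newer servers a pipeline like this is the available route.

```javascript
// Pipeline definition as a plain array so it can be inspected or reused.
// `byFullDate` is a hypothetical name, not from the original answer.
var byFullDate = [
  {
    $group: {
      // Group key: the calendar date split into its parts (UTC).
      _id: {
        year: { $year: "$date" },
        month: { $month: "$date" },      // 1-12, unlike getMonth()
        day: { $dayOfMonth: "$date" }
      },
      // Add 1 for every document that falls into the group.
      count: { $sum: 1 }
    }
  },
  // Return the groups in chronological order.
  { $sort: { "_id.year": 1, "_id.month": 1, "_id.day": 1 } }
];

// In the mongo shell:
//   db.myDocumentCollection.aggregate(byFullDate)
```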

Return number of documents grouped by day of month for the year 2013

This is perhaps a little more relevant for the kinds of queries you want to do.
Here, we use cond to group only the documents dated on or after 1/1/2013.
You could combine $gte and $lte to do date ranges here.

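db.myDocumentCollection.group({
    // Build the grouping key as "day/month" from each document's date
    $keyf: function(doc) {
        return { "date": doc.date.getDate() + "/" + doc.date.getMonth() };
    },
    // Only consider documents dated on or after 1 January 2013
    cond: { "date": { "$gte": new Date('01/01/2013') } },
    // Each group starts with a count of zero
    initial: { count: 0 },
    // Called once per document; increments the count of the document's group
    reduce: function(obj, prev) { prev.count++; }
})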

Output (day, then zero-based month):

[
        {
                "date" : "7/0",
                "count" : 2
        },
        {
                "date" : "7/1",
                "count" : 1
        },
        {
                "date" : "8/1",
                "count" : 1
        }
]
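The four views asked for in the question (by hour, day of month, month, and year) combined with a date range could be expressed with one small pipeline builder. This is a hedged sketch: `countsPipeline` and `UNIT_OPERATORS` are hypothetical names, the range is half-open (`$gte` start, `$lt` end) rather than the `$gte`/`$lte` pair mentioned above, and the date operators work in UTC, with `$hour` returning 0-23 rather than 1-24.

```javascript
// Maps each requested view to the aggregation date operator that extracts it.
var UNIT_OPERATORS = {
  hour: "$hour",         // 0-23
  day: "$dayOfMonth",    // 1-31
  month: "$month",       // 1-12
  year: "$year"
};

// Hypothetical helper: builds a pipeline that counts documents per `unit`
// for dates in the half-open range [start, end).
function countsPipeline(unit, start, end) {
  var op = UNIT_OPERATORS[unit];
  if (!op) {
    throw new Error("unknown unit: " + unit);
  }
  var key = {};
  key[op] = "$date"; // e.g. { $dayOfMonth: "$date" }

  return [
    // Filter first so that only the requested range is grouped.
    { $match: { date: { $gte: start, $lt: end } } },
    { $group: { _id: key, count: { $sum: 1 } } },
    { $sort: { _id: 1 } }
  ];
}

// In the mongo shell, documents per day of month for 2013:
//   db.myDocumentCollection.aggregate(
//     countsPipeline("day",
//                    new Date(Date.UTC(2013, 0, 1)),
//                    new Date(Date.UTC(2014, 0, 1))))
```

With an index on the date field, the leading `$match` stage can use it, which keeps the amount of data reaching `$group` proportional to the requested range.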
