How to speed up aggregation queries?

Here is the aggregation query:

[
  {
    "$match": {
      "UserId": {
        "$in": [
          5
        ]
      },
      "WorkflowStartTime": {
        "$gte": ISODate('2015-04-09T00:00:00.000Z'),
        "$lte": ISODate('2015-04-16T00:00:00.000Z')
      }
    }
  },
  {
    "$group": {
      "_id": {
        "Task": "$TaskId",
        "WorkflowId": "$WorkflowInstanceId"
      },
      "TaskName": {
        "$first": "$Task"
      },
      "StartTime": {
        "$first": "$StartTime"
      },
      "EndTime": {
        "$last": "$EndTime"
      },
      "LastExecutionTime": {
        "$last": "$StartTime"
      },
      "WorkflowName": {
        "$first": "$WorkflowName"
      }
    }
  },
  {
    "$project": {
      "_id": 1,
      "LastExecutionTime": 1,
      "TaskName": 1,
      "AverageExecutionTime": {
        "$subtract": [
          "$EndTime",
          "$StartTime"
        ]
      },
      "WorkflowName": 1
    }
  },
  {
    "$group": {
      "_id": "$_id.Task",
      "LastExecutionTime": {
        "$last": "$LastExecutionTime"
      },
      "AverageExecutionTime": {
        "$avg": "$AverageExecutionTime"
      },
      "TaskName": {
        "$first": "$TaskName"
      },
      "TotalInstanceCount": {
        "$sum": 1
      },
      "WorkflowName": {
        "$first": "$WorkflowName"
      }
    }
  },
  {
    "$project": {
      "Id": "$_id",
      "_id": 0,
      "Name": "$TaskName",
      "LastExecutionDate": {
        "$substr": [
          "$LastExecutionTime",
          0,
          30
        ]
      },
      "AverageExecutionTimeInMilliSeconds": "$AverageExecutionTime",
      "TotalInstanceCount": "$TotalInstanceCount",
      "WorkflowName": 1
    }
  }
]

The documents in my collection look like this:

{
        "_id" : ObjectId("550ff07ce4b09bf056df4ac1"),
        "OutputData" : "xyz",
        "InputData" : null,
        "Location" : null,
        "ChannelName" : "XYZ",
        "UserId" : 5,
        "TaskId" : 95,
        "ChannelId" : 5,
        "Status" : "Success",
        "TaskTypeId" : 7,
        "WorkflowId" : 37,
        "Task" : "XYZ",
        "WorkflowStartTime" : ISODate("2015-03-23T05:09:26Z"),
        "EndTime" : ISODate("2015-03-23T05:22:44Z"),
        "StartTime" : ISODate("2015-03-23T05:22:44Z"),
        "TaskType" : "TRIGGER",
        "WorkflowInstanceId" : "23-3-2015-95d17f17-2580-4fe3-b627-12e862af08ce",
        "StackTrace" : null,
        "WorkflowName" : "XYZ data workflow"
}

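I have an index on {WorkflowStartTime: 1, UserId: 1, StartTime: 1}.

The collection holds almost 900,000 records. Even though the date-range filter selects only a subset of that data, the query still takes about 1.5 to 1.7 seconds. I have used the aggregation framework on other collections with large amounts of data, and performance there is very good. I am not sure what is wrong with this query that makes it so slow; I want it to respond in milliseconds, as a real-time analytics query. Any pointers are appreciated.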

Output when {explain: true} is added to the aggregation query:

{
  "stages": [


       {
          "$cursor": {
            "query": {
              "UserId": {
                "$in": [
                  5
                ]
              },
              "WorkflowStartTime": {
                "$gte": "ISODate(2015-04-09T00:00:00Z)",
                "$lte": "ISODate(2015-04-16T00:00:00Z)"
              }
            },
            "fields": {
              "EndTime": 1,
              "StartTime": 1,
              "Task": 1,
              "TaskId": 1,
              "WorkflowInstanceId": 1,
              "WorkflowName": 1,
              "_id": 0
            },
            "plan": {
              "cursor": "BtreeCursor ",
              "isMultiKey": false,
              "scanAndOrder": false,
              "indexBounds": {
                "WorkflowStartTime": [
                  [
                    "ISODate(2015-04-16T00:00:00Z)",
                    "ISODate(2015-04-09T00:00:00Z)"
                  ]
                ],
                "UserId": [
                  [
                    5,
                    5
                  ]
                ]
              },
              "allPlans": [
                {
                  "cursor": "BtreeCursor ",
                  "isMultiKey": false,
                  "scanAndOrder": false,
                  "indexBounds": {
                    "WorkflowStartTime": [
                      [
                        "ISODate(2015-04-16T00:00:00Z)",
                        "ISODate(2015-04-09T00:00:00Z)"
                      ]
                    ],
                    "UserId": [
                      [
                        5,
                        5
                      ]
                    ]
                  }
                }
              ]
            }
          }
        },
        {
          "$group": {
            "_id": {
              "Task": "$TaskId",
              "WorkflowId": "$WorkflowInstanceId"
            },
            "TaskName": {
              "$first": "$Task"
            },
            "StartTime": {
              "$first": "$StartTime"
            },
            "EndTime": {
              "$last": "$EndTime"
            },
            "LastExecutionTime": {
              "$last": "$StartTime"
            },
            "WorkflowName": {
              "$first": "$WorkflowName"
            }
          }
        },
        {
          "$project": {
            "_id": true,
            "LastExecutionTime": true,
            "TaskName": true,
            "AverageExecutionTime": {
              "$subtract": [
                "$EndTime",
                "$StartTime"
              ]
            },
            "WorkflowName": true
          }
        },
        {
          "$group": {
            "_id": "$_id.Task",
            "LastExecutionTime": {
              "$last": "$LastExecutionTime"
            },
            "AverageExecutionTime": {
              "$avg": "$AverageExecutionTime"
            },
            "TaskName": {
              "$first": "$TaskName"
            },
            "TotalInstanceCount": {
              "$sum": {
                "$const": 1
              }
            },
            "WorkflowName": {
              "$first": "$WorkflowName"
            }
          }
        },
        {
          "$project": {
            "_id": false,
            "Id": "$_id",
            "Name": "$TaskName",
            "LastExecutionDate": {
              "$substr": [
                "$LastExecutionTime",
                {
                  "$const": 0
                },
                {
                  "$const": 30
                }
              ]
            },
            "AverageExecutionTimeInMilliSeconds": "$AverageExecutionTime",
            "TotalInstanceCount": "$TotalInstanceCount",
            "WorkflowName": true
          }
        }
      ],
      "ok": 1
    }

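The aggregation is not using an index that suits this query. The plan shows a BtreeCursor on the existing index, but that index leads with the range field WorkflowStartTime, so the scan likely has to walk the whole date range rather than only this user's entries. You need to create a new index that puts the equality field first: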

{UserId:1,WorkflowStartTime:1}
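
A minimal sketch of creating this index from the mongo shell. The collection name `taskExecutions` is hypothetical, since the post never names the collection, and the `typeof db` guard is only there so the snippet also runs outside the shell. Putting the equality-matched field (`UserId`) before the range-matched field (`WorkflowStartTime`) generally lets the server scan one contiguous slice of the index.

```javascript
// Index suggested above: equality field first, then the range field.
var indexSpec = { UserId: 1, WorkflowStartTime: 1 };

// Hypothetical collection name; the original post does not give one.
var collectionName = "taskExecutions";

// `db` only exists inside the mongo shell, so guard the call.
if (typeof db !== "undefined") {
  // On older servers, ensureIndex is the equivalent helper.
  db.getCollection(collectionName).createIndex(indexSpec);
}
```

Building the index on roughly 900,000 documents takes some time, so it may be worth doing this outside peak hours.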

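If all goes well, the explain output of the aggregation should then include this line (on MongoDB 3.0 or later, where explain reports a winning plan):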

    "winningPlan" :...
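
One way to check the plan programmatically is sketched below. It assumes the explain document has one of two shapes: the older one posted in the question (`stages[0].$cursor.plan.cursor`), or the newer one with `queryPlanner.winningPlan`, found either under `stages[0].$cursor` or at the top level. The helper name `findIndexUse` and the sample documents are hypothetical and trimmed to the fields the helper reads.

```javascript
// Returns a short description of the index scan found in an aggregation
// explain document, or null if no index scan is found.
function findIndexUse(explainOutput) {
  var first = explainOutput.stages && explainOutput.stages[0];
  var cursorStage = first && first.$cursor;

  // Older format (as in the question): plan.cursor is "BtreeCursor <name>"
  // for an index scan and "BasicCursor" for a collection scan.
  if (cursorStage && cursorStage.plan && typeof cursorStage.plan.cursor === "string") {
    var cursor = cursorStage.plan.cursor.trim();
    return cursor.indexOf("BtreeCursor") === 0 ? cursor : null;
  }

  // Newer format: follow winningPlan down its inputStage chain to an IXSCAN.
  var planner = (cursorStage && cursorStage.queryPlanner) || explainOutput.queryPlanner;
  var node = planner && planner.winningPlan;
  while (node) {
    if (node.stage === "IXSCAN") {
      return "IXSCAN " + node.indexName;
    }
    node = node.inputStage;
  }
  return null;
}

// Hypothetical sample documents for illustration only.
var oldStyle = { stages: [{ $cursor: { plan: { cursor: "BtreeCursor " } } }] };
var newStyle = {
  stages: [{
    $cursor: {
      queryPlanner: {
        winningPlan: {
          stage: "FETCH",
          inputStage: { stage: "IXSCAN", indexName: "UserId_1_WorkflowStartTime_1" }
        }
      }
    }
  }]
};

console.log(findIndexUse(oldStyle));
console.log(findIndexUse(newStyle));
```

In the shell, the real explain document can be passed in directly, for example the result of calling aggregate with `{explain: true}` as in the question. Plans with branching stages (an `inputStages` array) are not handled by this sketch.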

