简体   繁体   中英

Cosmos DB Aggregation Pipeline incompatible (Mongo API)?

Perhaps somebody has a good idea how we can fix or workaround something which looks like a bug in the current implementation of Aggregation Pipelines for MongoDB in Azure Cosmos DB (so yes: We have switched on the feature on our instances).

Short version is: It seems to us that a $match aggregation stage after a $group stage does not work. It never returns any results.

This can be reproduced in the following way using any Mongo Console, assuming you are already in a valid DB (using use <some db> ):

Paste the following JavaScript (this line is optional if you created the collection via the Azure Portal; it assumes your collection is called bug )

db.createCollection("bug");

The add some documents to the collection:

db.bug.insert({ _id: 1, owner: "a", _class: "History" });
db.bug.insert({ _id: 2, owner: "a", _class: "History" });
db.bug.insert({ _id: 3, owner: "a", _class: "DocumentBookmark" });
db.bug.insert({ _id: 4, owner: "a", _class: "Recyclebin" });
db.bug.insert({ _id: 5, owner: "b", _class: "History" });

As you can see, the owner: "a" has a duplicate History record, which we want to query for.

Now do the following:

db.bug.aggregate([
    { $match: { _class: "History"} }
]);

This renders the correct result:

globaldb:PRIMARY> db.bug.aggregate([
...     { $match: { _class: "History"} }
... ]);
{
    "_t" : "AggregationPipelineResponse",
    "ok" : 1,
    "waitedMS" : NumberLong(0),
    "result" : [
        {
            "_id" : 1,
            "owner" : "a",
            "_class" : "History"
        },
        {
            "_id" : 2,
            "owner" : "a",
            "_class" : "History"
        },
        {
            "_id" : 5,
            "owner" : "b",
            "_class" : "History"
        }
    ]
}

Now add a $group stage with a count to find the number of records per owner:

db.bug.aggregate([
    { $match: { _class: "History"} },
    { $group: { _id: "$owner", count: { $sum: 1 }}}
]);

This also returns a correct result:

globaldb:PRIMARY> db.bug.aggregate([
...     { $match: { _class: "History"} },
...     { $group: { _id: "$owner", count: { $sum: 1 }}}
... ]);
{
    "_t" : "AggregationPipelineResponse",
    "ok" : 1,
    "waitedMS" : NumberLong(0),
    "result" : [
        {
            "_id" : "a",
            "count" : NumberLong(2)
        },
        {
            "_id" : "b",
            "count" : NumberLong(1)
        }
    ]
}

Now we want to match for records with a count greater than 1:

db.bug.aggregate([
    { $match: { _class: "History"} },
    { $group: { _id: "$owner", count: { $sum: 1 }}},
    { $match: { count: { $gt: 1 }}}
]);

This returns an empty result set:

globaldb:PRIMARY> db.bug.aggregate([
...     { $match: { _class: "History"} },
...     { $group: { _id: "$owner", count: { $sum: 1 }}},
...     { $match: { count: { $gt: 1 }}}
... ]);
{
    "_t" : "AggregationPipelineResponse",
    "ok" : 1,
    "waitedMS" : NumberLong(0),
    "result" : [ ]
}

The same on Mongo DB

Now, to verify these queries are actually correct, I tried it using the mongo:3.4 docker image. The following code will start a new Mongo DB instance on your local machine, so that you can try it out yourself:

$ docker run --name mongobug -d mongo:3.4
ad3010da255b7c15a464fa21ff6519799a5c16cb8af62a0ea564a95780900491
$ docker exec -it mongobug mongo
MongoDB shell version v3.4.10
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.4.10
Welcome to the MongoDB shell.
>

Then we'll do the same as with Cosmos above; inside the Mongo Shell, run the following commands:

db.createCollection("bug")

Then insert the test data:

db.bug.insert({ _id: 1, owner: "a", _class: "History" });
db.bug.insert({ _id: 2, owner: "a", _class: "History" });
db.bug.insert({ _id: 3, owner: "a", _class: "DocumentBookmark" });
db.bug.insert({ _id: 4, owner: "a", _class: "Recyclebin" });
db.bug.insert({ _id: 5, owner: "b", _class: "History" });

And now you can see that the query which returned an empty set in fact returns a non-empty aggregate result when running the following aggregate query:

db.bug.aggregate([
    { $match: { _class: "History"} },
    { $group: { _id: "$owner", count: { $sum: 1 }}},
    { $match: { count: { $gt: 1 }}}
]);

The result is the expected:

> db.bug.aggregate([
...     { $match: { _class: "History"} },
...     { $group: { _id: "$owner", count: { $sum: 1 }}},
...     { $match: { count: { $gt: 1 }}}
... ]);
{ "_id" : "a", "count" : 2 }

Extra bonus reading

I also tried to first group by owner and _class in a joint group and then $match ; this is obviously by far a much more expensive operation, as Mongo has to group over the entire collection, and not only over already filtered items.

But, unfortunately, this also rendered an empty result, while it worked on the local Mongo docker image:

db.bug.aggregate([
    { $group: { _id: { owner: "$owner", _class: "$_class" }, count: { $sum: 1 } } },
    { $match: { "_id._class": "History", count: { $gt: 1 } } }
]);

Result on Cosmos:

globaldb:PRIMARY> db.bug.aggregate([
...     { $group: { _id: { owner: "$owner", _class: "$_class" }, count: { $sum: 1 } } },
...     { $match: { "_id._class": "History", count: { $gt: 1 } } }
... ]);
{
    "_t" : "AggregationPipelineResponse",
    "ok" : 1,
    "waitedMS" : NumberLong(0),
    "result" : [ ]
}

Result on Mongo DB:

> db.bug.aggregate([
...     { $group: { _id: { owner: "$owner", _class: "$_class" }, count: { $sum: 1 } } },
...     { $match: { "_id._class": "History", count: { $gt: 1 } } }
... ]);
{ "_id" : { "owner" : "a", "_class" : "History" }, "count" : 2 }

Weird.

tl;dr

Does Cosmos DB have a bug which does not allow for running $match aggregates after $group stages?

Wanted to provide an update to this thread since version 3.6 has been released of the Azure Cosmos DB for MongoDB API, the issue with aggregations is now returning the correct result.

Initial Query + Result:

db.coll_01.aggregate([
  { $match: { _class: "History"} }
    ]);

Operation consumed 3.18 RUs
{ "_id" : 1, "owner" : "a", "_class" : "History" }
{ "_id" : 2, "owner" : "a", "_class" : "History" }
{ "_id" : 5, "owner" : "b", "_class" : "History" }

Second Query + Result:

db.coll_01.aggregate([
 { $match: { _class: "History"} },
 { $group: { _id: "$owner", count: { $sum: 1 }}}
   ]);

Operation consumed 3.36 RUs
{ "_id" : "a", "count" : 2 }
{ "_id" : "b", "count" : 1 }

And finally the aggregation query + result:

db.coll_01.aggregate([
 { $match: { _class: "History"} },
 { $group: { _id: "$owner", count: { $sum: 1 }}},
 { $match: { count: { $gt: 1 }}}
  ]);

Operation consumed 3.36 RUs
{ "_id" : "a", "count" : 2 }

Please see for more information: Azure Cosmos DB's API for MongoDB (3.6 version): supported features and syntax

Your observations are correct. Multiple $match stages are not yet supported in Cosmos DB. $match has to be the first stage. One of the short-term workarounds until the support is implemented (besides an obvious - dealing with additional filtering on the client side) is to use $out stage and utilize a temporary collection, on which you can run another aggregation pipeline command with another $match.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM