
MongoDB - Distinct, Limit, and Sort for better results

I'm trying to develop a query to help mix up results in a search request in MongoDB. An example (and very simplified version) of my collection looks like this. Each document has a location to query, a ranking on the quality of the listing, and the name of a provider who inserted the listing.

[
  {
    "location": "paris",
    "ranking": "998",
    "provider": "Alpha"
  },
  {
    "location": "paris",
    "ranking": "965",
    "provider": "Alpha"
  },
  {
    "location": "paris",
    "ranking": "945",
    "provider": "Alpha"
  },
  {
    "location": "paris",
    "ranking": "933",
    "provider": "Alpha"
  },
  {
    "location": "paris",
    "ranking": "953",
    "provider": "Alpha"
  },
  {
    "location": "paris",
    "ranking": "983",
    "provider": "Alpha"
  },
  {
    "location": "paris",
    "ranking": "700",
    "provider": "Beta"
  },
  {
    "location": "paris",
    "ranking": "745",
    "provider": "Beta"
  },
  {
    "location": "paris",
    "ranking": "670",
    "provider": "Omega"
  },
  {
    "location": "paris",
    "ranking": "885",
    "provider": "Omega"
  },
  {
    "location": "paris",
    "ranking": "500",
    "provider": "Omega"
  },
  {
    "location": "london",
    "ranking": "600",
    "provider": "Omega"
  },
  {
    "location": "london",
    "ranking": "650",
    "provider": "Beta"
  }
]

As you can see, provider Alpha has the most listings and the best rankings. So when I search paris and sort by ranking, all the listings from Alpha end up on top, and those from Beta and Omega get shoved to the bottom.

What I'd like to do is limit each provider to 3 listings, so that even though Alpha's listings will still be on top, they'll be capped at 3, allowing the Beta and Omega listings to appear higher up. The remaining Alpha listings can then be seen on "page 2" when .skip is used.

If I were to do this in Python, a synchronous example would look like this:

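#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pymongo

def search_location(colc, location='paris', per_provider=3):
    """Return at most `per_provider` top-ranked listings per provider."""
    results = []
    # One distinct query, then one find per provider (N + 1 round trips).
    providers_available = colc.find({'location': location}).distinct('provider')
    for provider in providers_available:
        search = (colc.find({'provider': provider, 'location': location})
                      .sort('ranking', pymongo.DESCENDING)
                      .limit(per_provider))
        results.extend(search)
    # Rankings are stored as strings, so compare them numerically; best first.
    return sorted(results, key=lambda k: int(k['ranking']), reverse=True)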

This is heavy, time consuming, and overall just sucks, especially with a collection of 2.5 million documents. How could I do all of this on MongoDB's side? Thanks!

You could try some server-side JS, e.g.

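// Restrict the distinct lookup to the location being searched.
var providers = db.runCommand({distinct: "colc", key: "provider", query: {"location": "paris"}}).values;
providers.forEach(function (provider) {
   // Top 3 listings per provider; rankings are strings, so this sort is lexicographic.
   var c = db.colc.find({"provider": provider, "location": "paris"}).sort({"ranking": -1}).limit(3);
   c.forEach(printjson);
});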

But since all JS is interpreted, it's not going to be the fastest option.

You could play with the aggregation framework, which does most of the work on the server side, e.g.

db.colc.aggregate([ 
    {$match: {"location":"paris"}}, 
    {$group:{_id: { "provider": "$provider", "location":"$location"}, 
             "rankings" : { $addToSet: "$ranking"} } } 
]);

But you'll need a bit of client-side code to pick out the rankings for each provider from the returned array.

{
    "result" : [
        {
            "_id" : {
                "provider" : "Omega",
                "location" : "paris"
            },
            "rankings" : [
                "500",
                "885",
                "670"
            ]
        },
        {
            "_id" : {
                "provider" : "Beta",
                "location" : "paris"
            },
            "rankings" : [
                "745",
                "700"
            ]
        },
        {
            "_id" : {
                "provider" : "Alpha",
                "location" : "paris"
            },
            "rankings" : [
                "983",
                "953",
                "933",
                "945",
                "965",
                "998"
            ]
        }
    ],
    "ok" : 1
}
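
As a rough idea of what that client-side step could look like, here is a minimal Python sketch. It assumes the aggregation output has already been loaded into a dict shaped like the result above. The function name `top_per_provider` and its `per_provider` and `page` parameters are made up for illustration and are not part of any MongoDB driver. Keep in mind that `$addToSet` only keeps distinct ranking values and returns them in no particular order, so the sketch sorts each list itself.

```python
def top_per_provider(agg_result, per_provider=3, page=0):
    """Pick one page of rankings per provider from the grouped output.

    agg_result is assumed to have the shape shown above: a dict with a
    "result" list of {"_id": {"provider", "location"}, "rankings": [...]}.
    """
    start = page * per_provider
    picked = []
    for group in agg_result["result"]:
        # Rankings are strings in the sample data, so sort them numerically.
        ranked = sorted(group["rankings"], key=int, reverse=True)
        for ranking in ranked[start:start + per_provider]:
            picked.append({
                "provider": group["_id"]["provider"],
                "location": group["_id"]["location"],
                "ranking": ranking,
            })
    # Merge the per-provider slices into one list, best ranking first.
    return sorted(picked, key=lambda doc: int(doc["ranking"]), reverse=True)


# Sample input copied from the aggregation output above.
sample = {"result": [
    {"_id": {"provider": "Omega", "location": "paris"},
     "rankings": ["500", "885", "670"]},
    {"_id": {"provider": "Beta", "location": "paris"},
     "rankings": ["745", "700"]},
    {"_id": {"provider": "Alpha", "location": "paris"},
     "rankings": ["983", "953", "933", "945", "965", "998"]},
], "ok": 1}

page_one = top_per_provider(sample)          # up to 3 listings per provider
page_two = top_per_provider(sample, page=1)  # whatever is left for "page 2"
```

With the sample data, the first page holds the three best Alpha rankings followed by everything from Omega and Beta, and the second page holds only the three remaining Alpha rankings. The grouped output carries only the ranking values, so if the full documents are needed, the pipeline would have to collect them rather than just `$ranking`.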
