简体   繁体   中英

How can I use Mongodb Aggregation in this example?

I am currently using Python to build many of my results instead of MongoDB itself. I am trying to get my head around Aggregation, but I'm struggling a bit. Here is an example of what I am doing currently which perhaps could be better handled by MongoDB.

I have a collection of programs and a collection of episodes. Each program has a list of episodes (DBRefs) associated with it. (The episodes are stored in their own collection because both programs and episodes are quite complex and deep, so embedding is impractical). Each episode has a duration (float). If I want to find a program's average episode duration, I do this:

episodes = list(db.Episodes.find({'Program':DBRef('Programs',ObjectId(...))}))
durations = set(e['Duration'] for e in episodes if e['Duration'] > 0)
avg_mins = int(sum(durations) / len(durations) / 60

This is pretty slow when a program has over 1000 episodes. Is there a way I can do it in MongoDB?

Here is some sample data in Mongo shell format. There are three episodes belonging to the same program. How can I calculate the average episode duration for the program?

> db.Episodes.find({
    '_Program':DBRef('Programs',ObjectId('4ec634fbf4c4005664000313'))},
   {'_Program':1,'Duration':1}).limit(3)

{
    "_id" : ObjectId("506c15cbf4c4005f9c40f830"),
    "Duration" : 1643.856,
    "_Program" : DBRef("Programs", ObjectId("4ec634fbf4c4005664000313"))
}
{
    "_id" : ObjectId("506c15d3f4c4005f9c40f8cf"),
    "Duration" : 1598.088,
    "_Program" : DBRef("Programs", ObjectId("4ec634fbf4c4005664000313"))
}
{
    "_id" : ObjectId("506c15caf4c4005f9c40f80e"),
    "_Program" : DBRef("Programs", ObjectId("4ec634fbf4c4005664000313")),
    "Duration" : 1667.04
}

I figured it out, and it is ridiculously fast compared to pulling it all into Python.

p = db.Programs.find_one({'Title':'...'})

pipe = [
        {'$match':{'_Program':DBRef('Programs',p['_id']),'Duration':{'$gt':0}}},
        {'$group':{'_id':'$_Program', 'AverageDuration':{'$avg':'$Duration'}}}
        ]

eps = db.Episodes.aggregate(pipeline=pipe)

print eps['result']

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM