MongoDB Aggregation - $lookup performance

I'm using MongoDB 3.6 aggregation with $lookup to join two collections (users and suscriptionusers).

var UserSchema = mongoose.Schema({
  email:{
    type: String,
    trim: true,
    unique: true,
  },
  name: {
    type: String,
    required: true,
    trim: true,
  },
  password: String,
  gender: { type: String, enum: ['male', 'female', 'unknown'], default: 'unknown'},
  age_range: { type: String, enum: [12, 16, 18], default: 18},
  country: {type:String, default:'co'}
});

var SuscriptionUsersSchema = mongoose.Schema({
  user_id: {
    ref: 'Users',
    type: mongoose.Schema.ObjectId
  },
  channel_id: {
    ref: 'Channels',
    type: mongoose.Schema.ObjectId
  },
  subscribed: {type: Boolean, default:false},
  unsubscribed_at: Date,
  subscribed_at: Date
});

My goal is to query suscriptionusers and join the users collection, filtering by a start and end date, to get subscription analytics such as the country, age range and gender of subscribed users, and show the data in a line chart. I'm doing it this way:

db.getCollection('suscriptionusers').aggregate([
{$match: {
    'channel_id': ObjectId('......'),
    'subscribed_at': {
            $gte: new Date('2018-01-01'),
            $lte: new Date('2019-01-01'),
    },
    'subscribed': true
}},     
{
    $lookup:{
        from: "users",      
        localField: "user_id", 
        foreignField: "_id",
        as: "users"        
    }
},
/*  Using this form instead of the one above makes the process even slower :(
 {$lookup:
 {
   from: "users",
   let: { user_id: "$user_id" },
   pipeline: [
      { $match:
          { $expr:
             {$eq: [ "$_id",  "$$user_id" ]}
          }
      },
      { $project: { age_range:1, country: 1, gender:1 } }
   ],
   as: "users"
 }
},*/
{$unwind: {
    path: "$users",
    preserveNullAndEmptyArrays: false
}},
{$project: {
    'users.age_range': 1, 
    'users.country': 1, 
    'users.gender': 1, 
    '_id': 1, 
    'subscribed_at': { $dateToString: { format: "%Y-%m", date: "$subscribed_at" } },
    'unsubscribed_at': { $dateToString: { format: "%Y-%m", date: "$unsubscribed_at" } }
}},
])

My main concern is performance. For example, with about 150,000 subscribers the query takes around 7~8 seconds to return, and I'm worried about what will happen with a million subscribers: even if I limit the records (for example, by retrieving only data between two months), there could still be hundreds of subscribers in that period.

I have already tried creating an index on the user_id field of the suscriptionusers collection, but it made no difference.

db.getCollection('suscriptionusers').createIndex({user_id: 1});
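
A note on why this index may not help: the $lookup matches on users._id, which MongoDB indexes automatically, so an index on suscriptionusers.user_id does not take part in this pipeline. The $match stage, on the other hand, filters on channel_id, subscribed and subscribed_at. A minimal sketch of a compound index for that stage follows; the name matchIndexSpec and the field order (equality fields first, then the date range) are assumptions, not something from the original post.

```javascript
// Hypothetical compound index for the $match stage: the two equality
// fields come first, followed by the date range field.
const matchIndexSpec = { channel_id: 1, subscribed: 1, subscribed_at: 1 };

// In the mongo shell this would be applied with:
//   db.getCollection('suscriptionusers').createIndex(matchIndexSpec);
```

Whether it helps in practice depends on the data, so it is worth checking the plan with explain() before and after.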

My question is: should I also store the fields (country, age_range and gender) in the suscriptionusers collection? If I query without the lookup on the users collection, the process is fast enough.

Or is there a better way to improve performance with my current schema?

Thanks a lot :)

Edit: Note that a user can be subscribed to multiple channels, which is why the subscription is not saved inside the users collection.

Well, maybe it's not the best method, but I just included the fields needed from UserSchema in SuscriptionUsersSchema. This is notably faster for analytics purposes. I also realized that an analytics record should not change over time, so that the data stays as it was when it was generated. Stored this way, the data remains unchanged even if the user changes their information or deletes the account. If you have any advice, please feel free to share it :)
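
A minimal sketch of taking that snapshot when a user subscribes. The helper name buildSubscriptionDoc and the plain objects standing in for Mongoose documents are assumptions for illustration only.

```javascript
// Hypothetical helper: builds the document to store in suscriptionusers,
// copying the analytics fields from the user at subscribe time.
function buildSubscriptionDoc(user, channelId, now = new Date()) {
  return {
    user_id: user._id,
    channel_id: channelId,
    subscribed: true,
    subscribed_at: now,
    // Snapshot: later edits to the user do not change these values.
    gender: user.gender,
    age_range: user.age_range,
    country: user.country,
  };
}

// Example with plain objects standing in for Mongoose documents.
const user = { _id: 'u1', gender: 'female', age_range: '18', country: 'co' };
const doc = buildSubscriptionDoc(user, 'c1', new Date('2018-06-15'));
user.country = 'mx'; // a later profile change
console.log(doc.country); // still 'co'
```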

Just for reference, my SuscriptionUsersSchema now looks like:

var SuscriptionUsersSchema = mongoose.Schema({
  user_id: {
    ref: 'Users',
    type: mongoose.Schema.ObjectId
  },
  channel_id: {
    ref: 'Channels',
    type: mongoose.Schema.ObjectId
  },
  subscribed: {type: Boolean, default:false},
  // Analytics fields copied from UserSchema when the subscription is created
  gender: { type: String, enum: ['male', 'female', 'unknown'], default: 'unknown'},
  age_range: { type: String, enum: [12, 16, 18], default: 18},
  country: {type:String, default:'co'},
  unsubscribed_at: Date,
  subscribed_at: Date
});
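
With the fields stored on the subscription itself, the analytics query no longer needs $lookup or $unwind. The sketch below builds such a pipeline. The function name buildAnalyticsPipeline is hypothetical, and the $group stage is an assumption that the line chart only needs counts per month and demographic bucket rather than one row per subscriber.

```javascript
// Hypothetical builder for a $lookup-free pipeline over the denormalized
// suscriptionusers collection. channelId is expected to be an ObjectId.
function buildAnalyticsPipeline(channelId, start, end) {
  return [
    { $match: {
        channel_id: channelId,
        subscribed: true,
        subscribed_at: { $gte: start, $lte: end },
    } },
    // Count subscriptions per month and demographic bucket on the server,
    // so only the aggregated rows are returned to the client.
    { $group: {
        _id: {
          month: { $dateToString: { format: '%Y-%m', date: '$subscribed_at' } },
          country: '$country',
          gender: '$gender',
          age_range: '$age_range',
        },
        count: { $sum: 1 },
    } },
    { $sort: { '_id.month': 1 } },
  ];
}

// In the mongo shell the result would be passed to aggregate(), e.g.:
//   db.getCollection('suscriptionusers').aggregate(
//     buildAnalyticsPipeline(ObjectId('......'),
//                            new Date('2018-01-01'), new Date('2019-01-01')));
```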
