Query and Sort in MongoDB for a many-to-many relationship

Suppose I have a relationship between users, posts, and likes. A user can like many posts, and a post can be liked by many users.

My goal is to design a database structure in MongoDB such that I can quickly query for all the posts a user has liked AND sort/filter them in the multiple ways listed below (not at the same time; think of a dropdown that lets you change the sort order of your search results):

  1. Order in which the posts were liked
  2. Filter and order by various post attributes, such as title, number of post responses, when the post was created, etc.

Suppose the number of posts is on the order of 100,000 and each post will have on the order of 100-1000 likes.

Possible solutions I've thought of:

1) likes are embedded in posts.

This makes #2 easy to handle, because you just need an index over likes.user_id and over whatever other post attributes you need. It is also fast, because you only need to run one query.

However, this makes it impossible to sort by when a user liked something (AFAIK).
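A minimal sketch of option 1, written as plain JavaScript so it runs without a server. The field names title, created_at, and response_count are hypothetical stand-ins for the post attributes listed above, and likedPostsQuery / runLocally are illustrative helpers, not part of any driver. likedPostsQuery builds the filter and sort documents you would pass to db.posts.find(filter).sort(sort); runLocally only imitates that in memory. Each result is a whole post, so the sort can only use post-level fields; as far as I know, an embedded likes.ts array would be sorted by its smallest or largest element across all users, not by this user's like, which is why like order is the hard case here.

```javascript
// Sketch of option 1: likes embedded in each post document.
// Hypothetical document shape:
//   { _id: 4321, title: "...", created_at: Date, response_count: 12,
//     likes: [{ user_id: 1234 }, { user_id: 5678 }] }

// Post attributes the dropdown may sort by (names are assumptions).
const SORTABLE_FIELDS = ["title", "created_at", "response_count"];

// Builds the arguments for db.posts.find(filter).sort(sort).
// Pairs with the question's index over likes.user_id plus post attributes,
// e.g. { "likes.user_id": 1, created_at: -1 }.
function likedPostsQuery(userId, sortField, direction = -1) {
  if (!SORTABLE_FIELDS.includes(sortField)) {
    throw new Error(`unsupported sort field: ${sortField}`);
  }
  return {
    filter: { "likes.user_id": userId },
    sort: { [sortField]: direction },
  };
}

// In-memory stand-in for the server, for illustration only:
// keep posts whose likes array contains the user, then sort on one field.
function runLocally(posts, { filter, sort }) {
  const userId = filter["likes.user_id"];
  const [[field, dir]] = Object.entries(sort);
  return posts
    .filter((p) => p.likes.some((like) => like.user_id === userId))
    .sort((a, b) => (a[field] < b[field] ? -dir : a[field] > b[field] ? dir : 0));
}
```

Switching the dropdown then only changes sortField and direction; the filter on likes.user_id stays the same.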

2) likes are a separate collection with attributes post_id and account_id.

This makes #1 easy to handle, since you can just sort by _id (auto-generated ObjectId values begin with a creation timestamp). However, unless you duplicate and cache post attributes in the like document, it becomes impossible to handle #2. That is possible but really not ideal. Additionally, this is slower to query: you'd need to run two queries, one against the likes collection and then a posts query using $in: [post_ids].
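A rough sketch of the two-query flow in option 2. The shell calls appear only as comments; orderPostsByLikes is a hypothetical helper for the part that has to happen in application code. As far as I know, $in does not return documents in the order of the id list, so the like order from the first query has to be restored after the second one.

```javascript
// Option 2: likes in their own collection, e.g. { _id, account_id, post_id }.
//
// Query 1 (mongo shell): newest likes first.
//   const likes = db.likes.find({ account_id: 1234 }).sort({ _id: -1 }).toArray();
// Query 2 (mongo shell): fetch the liked posts in one round trip.
//   const posts = db.posts.find({ _id: { $in: likes.map((l) => l.post_id) } }).toArray();

// Reorders the posts from query 2 to match the like order from query 1.
function orderPostsByLikes(likes, posts) {
  // Key by String(id): ObjectId instances are separate objects even when equal,
  // so looking them up by object identity in a Map would miss.
  const byId = new Map(posts.map((p) => [String(p._id), p]));
  return likes
    .map((like) => byId.get(String(like.post_id)))
    .filter((post) => post !== undefined); // skip likes whose post no longer exists
}
```

Filtering the second query on post attributes (requirement #2) is where this gets awkward: the first query has already decided which likes, and how many, you are looking at.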

Are there any other solutions/designs I should consider? Am I missing anything in these proposed solutions?

I would use a denormalized version of #2. Have a like document:

{
    "_id" : ObjectId(...),
    "account_id" : 1234,
    "post_id" : 4321,
    "ts" : ISODate(...),
    // additional info about post needed for basic display
    "post_title" : "The 10 Worst-Kept Secrets of Cheesemongers",
    // etc.
}

With an index on { "account_id" : 1, "ts" : 1 }, you can efficiently find the like documents for a specific user, ordered by like time:

db.likes.find({ "account_id" : 1234 }).sort({ "ts" : -1 })

If you put the basic info about the post into the like document, you don't need to retrieve the post document until, say, the user clicks a link to view the entire post.
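A minimal sketch of building such a like document at write time, assuming only the title is needed for basic display. makeLikeDoc and EMBEDDED_POST_FIELDS are hypothetical names; keeping the copied fields in one mapping makes it easier to see exactly what will have to be kept in sync later.

```javascript
// Post fields copied into each like for basic display (an assumption; extend as needed).
const EMBEDDED_POST_FIELDS = { title: "post_title" };

// Builds a denormalized like document in the shape shown above.
// Intended use (mongo shell): db.likes.insertOne(makeLikeDoc(1234, post, new Date()))
// with db.likes.createIndex({ account_id: 1, ts: 1 }) created once beforehand.
function makeLikeDoc(accountId, post, likedAt) {
  const doc = { account_id: accountId, post_id: post._id, ts: likedAt };
  for (const [postField, likeField] of Object.entries(EMBEDDED_POST_FIELDS)) {
    doc[likeField] = post[postField];
  }
  return doc; // _id is left for the driver/server to generate
}
```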

The tradeoff is that if some like-embedded information about a post changes, it needs to be changed in every like. This could be trivial or it could be cumbersome, depending on what you choose to embed and how often posts are modified after they have a lot of likes.
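When an embedded field does change, one multi-document update can rewrite every copy. propagatePostChange is a hypothetical helper that builds the arguments for db.likes.updateMany(filter, update), assuming only the title is embedded; applyLocally merely imitates the $set in memory. With 100-1000 likes per post, each title edit touches that many like documents, which is the cost described above.

```javascript
// Maps a post field to its copy inside like documents (assumption: only the title is embedded).
const LIKE_COPY_OF = { title: "post_title" };

// Builds arguments for db.likes.updateMany(filter, update) after a post edit.
// Only fields that are actually embedded in likes are propagated.
function propagatePostChange(postId, changes) {
  const $set = {};
  for (const [field, value] of Object.entries(changes)) {
    if (Object.prototype.hasOwnProperty.call(LIKE_COPY_OF, field)) {
      $set[LIKE_COPY_OF[field]] = value;
    }
  }
  if (Object.keys($set).length === 0) return null; // nothing embedded changed
  return { filter: { post_id: postId }, update: { $set } };
}

// In-memory imitation of updateMany with $set, for illustration only.
// Returns how many like documents were modified.
function applyLocally(likes, { filter, update }) {
  let modified = 0;
  for (const like of likes) {
    if (like.post_id === filter.post_id) {
      Object.assign(like, update.$set);
      modified += 1;
    }
  }
  return modified;
}
```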

Your first option seems quite good to me. It deals with both of your requirements nicely, because:

  1. You need to sort posts (and comments) based on their attributes, which is possible through aggregations.
  2. You need to filter the documents (posts) based on some attributes, which is also possible.
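One way the aggregation route could cover the "order liked" case with embedded likes, assuming each embedded like also stores a timestamp (a ts field, which is not in the question's design): unwind the likes, keep only this user's, and sort on that like's time. likedByUserPipeline is a hypothetical helper whose result would be passed to db.posts.aggregate(...); runPipelineLocally only imitates the stages in memory. The leading $match can use an index on likes.user_id, but the $sort after $unwind cannot use an index, so it sorts everything the user liked on each request.

```javascript
// Pipeline for db.posts.aggregate(...) over posts with embedded likes.
// Assumes each embedded like is { user_id, ts } (ts is an assumption).
function likedByUserPipeline(userId) {
  return [
    // Narrow to posts this user liked; can use an index on likes.user_id.
    { $match: { "likes.user_id": userId } },
    // One output document per embedded like.
    { $unwind: "$likes" },
    // Drop the other users' likes that $unwind produced.
    { $match: { "likes.user_id": userId } },
    // Newest like first; this runs after $unwind, so no index is used.
    { $sort: { "likes.ts": -1 } },
    // Return post fields plus this user's like time, not the whole likes array.
    { $project: { title: 1, created_at: 1, liked_at: "$likes.ts" } },
  ];
}

// In-memory imitation of the same stages, for illustration only
// (the first $match only narrows the input, so it is skipped here).
function runPipelineLocally(posts, userId) {
  return posts
    .flatMap((p) => p.likes.map((like) => ({ ...p, likes: like }))) // $unwind
    .filter((d) => d.likes.user_id === userId) // second $match
    .sort((a, b) => b.likes.ts - a.likes.ts) // $sort by like time, descending
    .map((d) => ({ _id: d._id, title: d.title, created_at: d.created_at, liked_at: d.likes.ts })); // $project
}
```

In the sample below, post 1 has the most recent like overall (from another user), yet post 2 comes first because only user 1234's likes are compared.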

The disadvantage of two collections is that you need to run two queries to get one piece of data. NoSQL databases give you the flexibility to store related data in one place and read it back together with the best performance. If you don't use these benefits of NoSQL, you won't achieve optimized performance.

Do not think from an RDBMS perspective (forget normalization). If you need more performance optimization with the first option, use indexing and sharding (with a shard key such as an alphabetical range, geography, etc.).

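Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM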