
MongoDB large array or query

Posted 2016-03-05 17:18:46. Tags: arrays, mongodb

My question is about MongoDB's ability to handle huge arrays.

I would like to send a push notification to all subscribers of a topic whenever the topic is updated. Assume a topic can have a million subscribers.

Would it be efficient to hold a huge array in the topic document containing the IDs of all users subscribed to it? Or is the conservative way better: hold an array of subscribed topics for each user, and then query the users collection to find the subscribers of a specific topic?

Edit:

I would hold an array of subscribed topics in the user collection anyway (for views and edits).

2 answers

If your array is very big and the cumulative size of the document exceeds 16 MB, split it out into another collection. You can keep the topic in one collection and all of its subscribers in a separate collection that references the topic collection.
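To get a feel for when the 16 MB limit comes into play, here is a rough back-of-the-envelope sketch. It assumes subscriber IDs are stored as 12-byte ObjectId values, which the question does not actually state, and the function name is made up for illustration. BSON encodes an array as a document keyed by "0", "1", and so on, so each element costs a type byte, the index as a null-terminated string, and the value itself.

```python
# 16 MB BSON document size limit, in bytes.
MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024


def objectid_array_bytes(count: int) -> int:
    """Estimate the BSON size of an array holding `count` ObjectIds.

    Assumption for illustration: every subscriber ID is a 12-byte ObjectId.
    """
    total = 4 + 1  # int32 length prefix + trailing null byte of the array document
    for index in range(count):
        key_bytes = len(str(index)) + 1  # array index as a null-terminated string
        total += 1 + key_bytes + 12      # type byte + key + ObjectId value
    return total


if __name__ == "__main__":
    size = objectid_array_bytes(1_000_000)
    print(size, size > MAX_BSON_DOCUMENT_BYTES)
```

Under these assumptions, a million ObjectIds come to just under 20 million bytes for the array alone, which is already above the limit. A more compact ID type (for example 32-bit integers) would change the result, so treat this as an estimate only.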

Primary assumption: topic-related and person-related metadata are stored in different collections, and the collection being discussed here is used only to keep track of topic subscribers.

Storing subscribers as a list/array associated with a topic identifier as the document key (meaning an indexed field) makes for an efficient structure. Once you have a topic of interest, you can look up the subscriber list by topic identifier. Here, as @Saleem rightly pointed out, you need to be wary of large subscriber lists causing documents to exceed the 16 MB document size limit. But instead of complicating the design with a separate collection to handle this (as suggested by @Saleem), you can simply split the subscriber list into as many parts as required, each small enough to keep its document under 16 MB, and create multiple documents for a topic in the same collection. Given that the topic identifier is an indexed field, lookup time will not suffer: 16 MB can accommodate a very large number of subscriber identifiers, so the number of splits required, if any are needed at all, should be fairly low.
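The splitting described above could be sketched roughly as follows. This is plain Python with no database involved; the field names (`topic_id`, `bucket`, `subscribers`), the function names, and the per-document cap are all assumptions chosen for illustration, not something the answer specifies.

```python
from typing import Any, Dict, Iterable, Iterator, List

# Hypothetical cap on IDs per document, picked to stay well below 16 MB.
MAX_IDS_PER_BUCKET = 100_000


def build_buckets(topic_id: Any,
                  subscriber_ids: Iterable[Any],
                  max_ids: int = MAX_IDS_PER_BUCKET) -> Iterator[Dict[str, Any]]:
    """Yield one document per chunk of at most `max_ids` subscriber IDs."""
    chunk: List[Any] = []
    bucket_no = 0
    for subscriber_id in subscriber_ids:
        chunk.append(subscriber_id)
        if len(chunk) == max_ids:
            yield {"topic_id": topic_id, "bucket": bucket_no, "subscribers": chunk}
            chunk = []
            bucket_no += 1
    if chunk:  # final, partially filled bucket
        yield {"topic_id": topic_id, "bucket": bucket_no, "subscribers": chunk}


def subscribers_of(documents: Iterable[Dict[str, Any]], topic_id: Any) -> Iterator[Any]:
    """Walk every bucket of a topic and yield its subscriber IDs in order."""
    for doc in documents:
        if doc["topic_id"] == topic_id:
            yield from doc["subscribers"]


if __name__ == "__main__":
    docs = list(build_buckets("topic-1", range(250), max_ids=100))
    print([len(d["subscribers"]) for d in docs])
```

With a driver such as PyMongo, documents of this shape could be written with `insert_many` and read back with a `find` on the indexed `topic_id` field; the in-memory `subscribers_of` helper here only stands in for that query.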

The other structure you suggested, where a subscriber identifier is the document key and the document holds all of that subscriber's topics, is intuitively not as efficient for a large dataset. Finding the subscribers of a given topic would mean searching all subscriber documents for that topic. If subscribed topics are stored as a list/array (which seems the likely choice), this query would involve an $in clause, which is slower than an indexed field lookup, even for small topic lists over a significantly large user base.