
MongoDB: large array or query?

My question is about MongoDB's ability to handle huge arrays.

I would like to send a push notification to all subscribers of a topic whenever the topic is updated. Assume a topic can have a million subscribers.

Would it be efficient to hold a huge array in the topic document containing the IDs of all users subscribed to it? Or is the conservative way better: hold an array of subscribed topics for each user, then query the users collection to find the subscribers of a specific topic?

Edit:

I would hold an array of subscribed topics in the user collection anyway (for views and edits).

If your array is very big and the cumulative size of the document exceeds 16 MB, split it out into another collection. You can keep the topic in one collection and all of its subscribers in a separate collection that references the topic collection.
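To get a feel for when the 16 MB limit is actually reached, here is a rough estimate of how large a BSON array of subscriber IDs becomes. It is only a sketch and assumes the IDs are stored as 12-byte ObjectIds (the question does not say which ID type is used). The function name is made up for illustration. BSON encodes an array as a document whose keys are the decimal indexes "0", "1", and so on, so each element costs a type byte, the key string with its terminator, and the 12-byte value.

```python
# MongoDB's maximum BSON document size (16 MiB).
BSON_MAX_DOCUMENT_BYTES = 16 * 1024 * 1024
# Assumption: subscriber IDs are ObjectIds, which are 12 bytes each.
OBJECT_ID_BYTES = 12


def objectid_array_bytes(count: int) -> int:
    """Estimate the encoded size of a BSON array holding `count` ObjectIds."""
    total = 5  # int32 length prefix + trailing null byte of the array document
    digits, first = 1, 0
    while first < count:
        # Indexes in [first, last) all have `digits` decimal digits.
        last = min(count, 10 ** digits)
        # Per element: type byte + key digits + key terminator + value.
        total += (last - first) * (1 + digits + 1 + OBJECT_ID_BYTES)
        first = last
        digits += 1
    return total


if __name__ == "__main__":
    size = objectid_array_bytes(1_000_000)
    print(size, size > BSON_MAX_DOCUMENT_BYTES)
```

Under these assumptions a million ObjectIds take about 19.9 million bytes for the array alone, so a single topic document would not fit and some form of splitting is needed at that scale.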

Primary Assumption: Topic-related and person-related metadata is stored in different collections and the collection being discussed here is utilized only to keep track of topic subscribers.

Storing subscribers as a list/array associated with a topic identifier as the document key (meaning an indexed field) makes for an efficient structure. Once you have a topic of interest, you can look up the subscriber list by topic identifier. Here, as @Saleem rightly pointed out, you need to be wary of large subscriber lists causing documents to exceed the 16 MB document size limit. But instead of complicating the design with a different collection to handle this (as suggested by @Saleem), you can simply split the subscriber list into as many parts as required, each small enough to stay under 16 MB, and create multiple documents for a topic in the same collection. Given that the topic identifier is an indexed field, lookup time will not be hurt: 16 MB can accommodate a very large number of subscriber identifiers, so the number of splits required should be fairly low, if any are needed at all.
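The splitting idea could look roughly like the sketch below. It builds plain dictionaries only, and the field names (`topic_id`, `bucket`, `subscribers`), the function names, and the per-bucket cap are all assumptions made for illustration. It caps each part by element count rather than measuring bytes, which is a simplification: the cap has to be chosen so that a full bucket stays well under 16 MB.

```python
from typing import Any, Dict, List


def split_into_buckets(topic_id: str,
                       subscriber_ids: List[str],
                       max_per_bucket: int) -> List[Dict[str, Any]]:
    """Split one topic's subscriber list into several same-collection documents."""
    if max_per_bucket <= 0:
        raise ValueError("max_per_bucket must be positive")
    buckets = []
    for start in range(0, len(subscriber_ids), max_per_bucket):
        buckets.append({
            "topic_id": topic_id,               # the indexed lookup field
            "bucket": start // max_per_bucket,  # sequence number of this part
            "subscribers": subscriber_ids[start:start + max_per_bucket],
        })
    return buckets


def all_subscribers(docs: List[Dict[str, Any]], topic_id: str) -> List[str]:
    """Reassemble the full list from every bucket document of a topic."""
    parts = sorted((d for d in docs if d["topic_id"] == topic_id),
                   key=lambda d: d["bucket"])
    return [user for d in parts for user in d["subscribers"]]


if __name__ == "__main__":
    docs = split_into_buckets("topic-1", [f"user-{i}" for i in range(7)], 3)
    print([len(d["subscribers"]) for d in docs])
```

With a driver such as pymongo, documents like these could be written with `insert_many` and read back with `find({"topic_id": ...})` on an indexed `topic_id` field, so that sending a notification means iterating over a handful of bucket documents.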

The other structure you suggested, where the subscriber identifier is the document key and the document holds all of that user's subscribed topics, is intuitively less efficient for a large dataset. Finding the subscribers of a topic means searching every user's topic list for the topic at hand. If the subscribed topics are stored as a list/array (the likely choice), this query would involve an $in clause, which, without an index on that array field, is slower than an indexed field lookup, even for small topic lists over a significantly large user base.
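For comparison, the per-user structure and the lookup it implies can be sketched as follows. This is an in-memory emulation rather than a real MongoDB query, and the field name `topics`, the function name, and the sample data are hypothetical. The two filter dictionaries show the shape such a query would take: MongoDB matches a scalar against the elements of an array field in an ordinary equality filter, and `$in` is the form for matching any of several topics.

```python
from typing import Any, Dict, List

# Filter shapes for the per-user layout (field name "topics" is an assumption).
single_topic_filter = {"topics": "topic-1"}
several_topics_filter = {"topics": {"$in": ["topic-1", "topic-2"]}}


def subscribers_of(users: List[Dict[str, Any]], topic_id: str) -> List[str]:
    """Emulate the lookup without an index: inspect every user document."""
    return [u["_id"] for u in users if topic_id in u.get("topics", [])]


if __name__ == "__main__":
    users = [
        {"_id": "alice", "topics": ["topic-1", "topic-2"]},
        {"_id": "bob", "topics": ["topic-2"]},
        {"_id": "carol", "topics": []},
    ]
    print(subscribers_of(users, "topic-2"))
```

The emulation visits every user document, which is the cost the answer is worried about when the user base is large and the array field is not indexed.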
