
MongoDB Streaming Out Inserted Data in Real-time (or Near Real-time)

I have a number of MongoDB collections which receive JSON documents from various streaming sources. In other words, there are a number of processes continually inserting data into a set of MongoDB collections.

I need a way to stream the data out of MongoDB into downstream applications, so I want a system that conceptually looks like this:

App Stream1 --> 
App Stream2 -->     MONGODB     --->  Aggregated Stream
App Stream3 -->

OR this:

App Stream1 -->                 --->  MongoD Stream1
App Stream2 -->     MONGODB     --->  MongoD Stream2
App Stream3 -->                 --->  MongoD Stream3

The question is: how do I stream data out of Mongo without having to continually poll/query the database?

The obvious answer would be "why don't you change those app streaming processes to send messages to a queue like RabbitMQ, ZeroMQ or ActiveMQ, which then sends them to both your Mongo streaming processes and Mongo at once, like this":

                 MONGODB
                   /|\  
                    |
App Stream1 -->     |          --->  MongoD Stream1
App Stream2 -->  SomeMQqueue   --->  MongoD Stream2
App Stream3 -->                --->  MongoD Stream3

In an ideal world, yes, that would be good, but we need Mongo to ensure that messages are saved first, to avoid duplicates and to ensure that IDs are all generated, etc. Mongo has to sit in the middle as the persistence layer.

So how do I stream messages out of a Mongo collection (not using GridFS, etc.) into these downstream apps? The basic school of thought has been to poll for new documents and, for each document collected, update it by adding another field to the JSON document stored in the database, much like a processed flag in an SQL table that stores a processed timestamp. I.e. every second, poll for documents where processed == null, add processed = now(), and update the document.
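The polling approach described above could be sketched roughly as follows with the MongoDB sync Java driver. The connection string, database name ("mydb"), collection name ("events") and the `handle` method are placeholders, not from the original; `findOneAndUpdate` is used so that claiming a document and flagging it is a single atomic step, which avoids handing the same document to two consumers:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Updates;
import org.bson.Document;

import java.util.Date;

public class PollingConsumer {
    public static void main(String[] args) throws InterruptedException {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> events =
                    client.getDatabase("mydb").getCollection("events");
            while (true) {
                // Atomically claim one unprocessed document: match documents
                // without a "processed" field and stamp them in the same step,
                // so concurrent consumers never receive the same document.
                Document doc = events.findOneAndUpdate(
                        Filters.exists("processed", false),
                        Updates.set("processed", new Date()));
                if (doc != null) {
                    handle(doc);          // hand off to the downstream app
                } else {
                    Thread.sleep(1000);   // nothing new; back off for a second
                }
            }
        }
    }

    static void handle(Document doc) {
        System.out.println("streaming out: " + doc.toJson());
    }
}
```

Note that this still burns a query per second per consumer even when the collection is idle, which is exactly the inefficiency the question is asking to avoid.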

Is there a neater/more computationally efficient method?

FYI - These are all Java processes.

Cheers!

If you are writing to a capped collection (or collections), you can use a tailable cursor to push new data onto the stream, or onto a message queue from where it can be streamed out. However, this will not work for a non-capped collection.
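A minimal sketch of the tailable-cursor approach with the Java driver, under the assumption that "events" was created as a capped collection (the connection string and names are illustrative, not from the original). `CursorType.TailableAwait` makes the server block briefly at the end of the collection and deliver new inserts as they arrive, instead of the client re-polling:

```java
import com.mongodb.CursorType;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import org.bson.Document;

public class TailingConsumer {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            // Tailable cursors require a capped collection, created e.g. in
            // the mongo shell with:
            //   db.createCollection("events", { capped: true, size: 104857600 })
            MongoCollection<Document> events =
                    client.getDatabase("mydb").getCollection("events");

            try (MongoCursor<Document> cursor = events.find()
                    .cursorType(CursorType.TailableAwait)
                    .iterator()) {
                // Blocks on hasNext()/next() until new documents are inserted.
                while (cursor.hasNext()) {
                    Document doc = cursor.next();
                    System.out.println("new document: " + doc.toJson());
                }
            }
        }
    }
}
```

The trade-off is that capped collections are fixed-size ring buffers: old documents are overwritten once the size limit is reached, and documents cannot grow via updates, so this fits a feed/outbox collection rather than the primary store.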
