How to read data in real time from MongoDB using Python

I want to read the data (documents from one collection) from MongoDB in real time, or near real time, and convert it into a pandas DataFrame for further analysis.

I know how to fetch data from MongoDB into Python, but I want to keep the connection open so that whenever new data comes in, I have it in Python for real-time analysis.

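from pymongo import MongoClient
import pandas as pd

client = MongoClient('localhost', 27017)
db = client.test_insert
collection = db.dataset

# One-off snapshot: fetch a single document and load it into a DataFrame
df = pd.DataFrame(list(collection.find().limit(1)))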

Please help :)

You can convert your collection into a capped collection so that tailable cursors become available for it. But be aware of the other implications: the whole collection has a fixed size in bytes, the oldest documents are deleted once that size is exceeded, and updates that increase a document's size are not possible.

If you don't want to make your collection capped, you can instead create a tailable cursor on the oplog collection (local.oplog.rs, which exists only when MongoDB runs as a replica set). That way your application receives a continuous stream of all changes on the replica set; you just need to filter out changes to the collections you aren't interested in.

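Tailable cursors do not use indexes, so if your query is on an indexed field, use a regular cursor instead. Keep track of the last value of that field, either in Python or, more resiliently, in MongoDB, and use an infinite while loop that repeatedly queries for records with a greater value. This works when the field's values increase as new documents are inserted, for example an insertion timestamp or sequence number: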

db.<collection>.find( { indexedField: { $gt: <lastvalue> } } )
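A minimal sketch of that polling loop with PyMongo and pandas. It keeps the last value in Python memory (the less resilient of the two options above), sorts each batch by the tracked field so the last document holds the highest value, and hands every non-empty batch to a callback as a DataFrame. poll_new_documents, poll_forever, on_batch, interval, and max_polls are hypothetical names; passing last_value lets you resume from a value you stored yourself, for example in MongoDB.

```python
import time

import pandas as pd


def poll_new_documents(collection, field, last_value=None):
    """Return documents whose `field` is greater than last_value, ascending."""
    query = {} if last_value is None else {field: {"$gt": last_value}}
    # A regular (non-tailable) cursor can use the index on `field`; 1 = ascending.
    return list(collection.find(query).sort(field, 1))


def poll_forever(collection, field, on_batch, interval=1.0, max_polls=None,
                 last_value=None):
    """Repeatedly fetch new documents and pass each non-empty batch as a DataFrame.

    max_polls limits the number of queries (None = run forever).
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        docs = poll_new_documents(collection, field, last_value)
        polls += 1
        if docs:
            # Remember the highest value seen so the next query only gets newer records.
            last_value = docs[-1][field]
            on_batch(pd.DataFrame(docs))
        time.sleep(interval)
```

For example, poll_forever(MongoClient().test_insert.dataset, "seq", print) would print each new batch, assuming a hypothetical indexed, increasing field named seq.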

Depending on the complexity of your DataFrame analysis, you may want to consider adding a work queue such as RabbitMQ. This design lets one process push new records from MongoDB onto the message queue while multiple processes consume and process the incoming messages from that queue.
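To illustrate the shape of that design without running a broker, the sketch below uses Python's standard-library queue.Queue and threads as an in-process stand-in for RabbitMQ; this is a swapped-in technique, not RabbitMQ's API. In a real deployment the producer would be the MongoDB polling or tailing loop publishing to a RabbitMQ queue, and each worker would be a separate process consuming from it. analyze, producer, worker, and run_pipeline are hypothetical names.

```python
import queue
import threading


def producer(new_records, work_queue, n_workers):
    """Push each new record onto the queue, then one stop marker per worker."""
    for record in new_records:
        work_queue.put(record)
    for _ in range(n_workers):
        work_queue.put(None)  # sentinel: tells one worker to stop


def worker(work_queue, analyze, results, lock):
    """Take records off the queue and analyze them until a stop marker arrives."""
    while True:
        record = work_queue.get()
        if record is None:
            break
        outcome = analyze(record)
        with lock:
            results.append(outcome)


def run_pipeline(new_records, analyze, n_workers=3):
    """One producer feeding several workers through a shared queue."""
    work_queue = queue.Queue()
    results, lock = [], threading.Lock()
    workers = [
        threading.Thread(target=worker, args=(work_queue, analyze, results, lock))
        for _ in range(n_workers)
    ]
    for t in workers:
        t.start()
    producer(new_records, work_queue, n_workers)
    for t in workers:
        t.join()
    return results
```

Because workers finish in no particular order, results come back unordered; sort or key them by a record field if order matters.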
