I'm having a much more difficult time than I expected importing multiple documents from Mongo into RAM in batches. I am writing an application that communicates with MongoDB via pymongo;
the database currently holds 2 GB, but in the near future it could grow to over 1 TB. Because of this, batch-reading a limited number of records into RAM at a time is important for scalability.
Based on this post and this documentation, I thought this would be about as easy as:
from pymongo import MongoClient

HOST = MongoClient(MONGO_CONN)  # MONGO_CONN holds the connection string
DB_CONN = HOST.database_name
collection = DB_CONN.collection_name
cursor = collection.find()
cursor.batch_size(1000)
next_1K_records_in_RAM = cursor.next()
This isn't working for me, however. Even though I have a Mongo collection populated with >200K BSON objects, this reads them in one at a time as single dictionaries, e.g. {_id: ID1, ...},
instead of what I'm looking for, which is an array of dictionaries representing multiple documents from my collection, e.g. [{_id: ID1, ...}, {_id: ID2, ...}, ..., {_id: ID1000, ...}].
I wouldn't expect this to matter, but I'm on Python 3.5 instead of 2.7.
As this example references a secure, remote data source, this isn't a reproducible example. Apologies for that. If you have a suggestion for how the question can be improved, please let me know.
To examine the returned documents you have to iterate through the cursor, i.e.

for document in cursor:
    print(document)

or, if you want a list of all the documents: list(cursor)

Call cursor.rewind() if you need to revisit the documents.
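To get the list-of-1000-dicts shape the question asks for, note that batch_size only tunes how many documents travel per network round trip; cursor.next() still yields one document at a time. One way to slice the cursor into fixed-size chunks yourself is itertools.islice. A minimal sketch (the connection names and the per-chunk processing are placeholders, not part of any pymongo API):

from itertools import islice

def in_chunks(cursor, size):
    """Yield successive lists of up to `size` documents from a cursor
    (works for any iterator, including a pymongo Cursor)."""
    it = iter(cursor)
    while True:
        chunk = list(islice(it, size))  # pull at most `size` items into RAM
        if not chunk:
            return
        yield chunk

# With pymongo, assuming `collection` is set up as in the question:
#   cursor = collection.find().batch_size(1000)  # network fetch size only
#   for docs in in_chunks(cursor, 1000):
#       ...  # docs is a list of up to 1000 dicts

This keeps only one chunk in RAM at a time, which is what matters once the collection grows toward 1 TB.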