Difficulty with document batch import, pymongo

Question

I'm having a much more difficult time than I expected importing multiple documents from Mongo into RAM in batches. I am writing an application that communicates via pymongo with a MongoDB database that currently holds 2 GB but could grow to over 1 TB in the near future. Because of this, reading a limited number of records into RAM at a time is important for scalability.

Based on this post and this documentation I thought this would be about as easy as:

from pymongo import MongoClient

HOST = MongoClient(MONGO_CONN)  # MONGO_CONN: connection string, defined elsewhere
DB_CONN = HOST.database_name
collection = DB_CONN.collection_name
cursor = collection.find()
cursor.batch_size(1000)
next_1K_records_in_RAM = cursor.next()

This isn't working for me, however. Even though I have a Mongo collection populated with >200K BSON objects, this reads them in one at a time as single dictionaries, e.g. {_id : ID1, ...}, instead of what I'm looking for, which is an array of dictionaries representing multiple documents in my collection, e.g. [{_id : ID1, ...}, {_id : ID2, ...}, ..., {_id: ID1000, ...}].

I wouldn't expect this to matter, but I'm on Python 3.5 instead of 2.7.

Because this example references a secure, remote data source, it isn't reproducible. Apologies for that. If you have a suggestion for how the question can be improved, please let me know.

Answer 1

The Python version is irrelevant here; it has nothing to do with your output.
batch_size only defines how many documents MongoDB returns in a single round trip to the server (subject to some limitations: see here).
collection.find always returns a cursor, which is an iterator, even when no documents match. Batching does its job transparently behind that cursor, so cursor.next() still hands you one document at a time.
To examine the returned documents you have to iterate through the cursor, i.e.
for document in cursor: print(document)

or, if you want a list of the documents: list(cursor) (note that this loads every matching document into memory at once)
- Remember to call cursor.rewind() if you need to revisit the documents
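
If what you actually want is a list of up to 1000 documents at a time, one possible approach is to group the cursor's output yourself. The sketch below is only an illustration: iter_batches is a hypothetical helper name (not part of pymongo), and a plain generator of fake documents stands in for collection.find() so the snippet runs without a database.

```python
from itertools import islice


def iter_batches(cursor, size=1000):
    """Yield lists of up to `size` items from any iterable, e.g. a pymongo cursor.

    Hypothetical helper for illustration; it is not a pymongo API.
    """
    iterator = iter(cursor)
    while True:
        # islice pulls at most `size` items without consuming the rest.
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Stand-in for collection.find(): a generator of made-up documents.
fake_cursor = ({"_id": i} for i in range(2500))
batch_lengths = [len(batch) for batch in iter_batches(fake_cursor, size=1000)]
print(batch_lengths)  # [1000, 1000, 500]
```

With a real collection, something like for batch in iter_batches(collection.find().batch_size(1000)): ... should keep only one list of documents in RAM at a time; matching the network batch size to the list size is an assumption about what is convenient, not a requirement.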

Answer 1: 1 ACCEPTED 2016-09-26 23:41:27