
Difficulty with document batch import, pymongo

I'm having a much harder time than I expected importing multiple documents from Mongo into RAM in batches. I am writing an application that communicates via pymongo with a MongoDB database that currently holds about 2 GB but could grow to over 1 TB in the near future. Because of this, reading only a limited number of records into RAM at a time is important for scalability.

Based on this post and this documentation, I thought this would be about as easy as:

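from pymongo import MongoClient

HOST = MongoClient(MONGO_CONN)  # MONGO_CONN: connection string (not shown)
DB_CONN = HOST.database_name
collection = DB_CONN.collection_name
cursor = collection.find()
cursor.batch_size(1000)
next_1K_records_in_RAM = cursor.next()  # hoped for 1000 documents here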

This isn't working for me, however. Even though my Mongo collection contains more than 200K BSON documents, this reads them one at a time as single dictionaries, e.g. {_id : ID1, ...}, instead of what I'm looking for, which is a list of dictionaries representing multiple documents in my collection, e.g. [{_id : ID1, ...}, {_id : ID2, ...}, ..., {_id: ID1000, ...}].

I wouldn't expect this to matter, but I'm on Python 3.5 instead of 2.7.

As this example references a secure, remote data source, it isn't reproducible. Apologies for that. If you have a suggestion for how the question can be improved, please let me know.

  • The Python version is irrelevant here; it has nothing to do with your output.
  • batch_size only defines how many documents MongoDB returns in a single round trip to the database (subject to some limitations: see here and here).
  • collection.find always returns a cursor (an iterator), never None; if no documents match, the cursor is simply empty. Batching happens transparently as you iterate over it.
  • To examine the returned documents you have to iterate through the cursor, i.e.

    for document in cursor: print(document)

    or, if you want a list of the documents: list(cursor) (note that this loads every matching document into RAM at once)

    • Remember to call cursor.rewind() if you need to revisit the documents.
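
If what you actually want is a Python list of up to 1000 documents at a time, one option is to group the cursor's output yourself with itertools.islice. This is a minimal sketch, not a pymongo API: the helper name iter_chunks is hypothetical, and the demo feeds it a generator of fake documents as a stand-in for a real cursor, since no database is assumed here. batch_size(1000) still only tunes the network round trips; the chunking decides how many documents your code holds in one list.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_chunks(iterable: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from `iterable`.

    Hypothetical helper: works on any iterable, including a pymongo
    Cursor, and only materializes one list of `size` items at a time.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        # Pull up to `size` items from the underlying iterator.
        chunk = list(islice(iterator, size))
        if not chunk:
            return  # iterator exhausted
        yield chunk


if __name__ == "__main__":
    # Stand-in for a cursor: 2500 fake documents (made-up data).
    fake_cursor = ({"_id": i} for i in range(2500))
    for batch in iter_chunks(fake_cursor, 1000):
        print(len(batch), batch[0]["_id"], batch[-1]["_id"])

    # With pymongo (not run here; names are the question's placeholders):
    # cursor = MongoClient(MONGO_CONN).database_name.collection_name.find().batch_size(1000)
    # for batch in iter_chunks(cursor, 1000):
    #     handle(batch)  # hypothetical per-batch handler
```

Because each chunk is pulled lazily from the cursor, roughly one chunk plus the driver's current network batch is held in memory at a time.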
