I have a large collection of documents that I'm trying to update with pymongo's update function. I find all documents that fall within a given polygon and set "Update_key" to update_value on every point found.
for element in geomShapeCollection:
    db.collectionName.update({"coordinates":{"$geoWithin":{"$geometry":element["geometry_part"]}}}, {"$set":{"Update_key": update_value}}, multi = True, timeout=False)
For smaller collections this command works as expected. In the largest dataset the command works for 70-80% of the data and then throws the error:
pymongo.errors.OperationFailure: cursor id '428737620678732339' not valid at server
The pymongo documentation tells me that this is possibly due to a timeout issue.
Cursors in MongoDB can timeout on the server if they've been open for a long time without any operations being performed on them.
Reading through the pymongo documentation, the find() function has a boolean flag for timeout.
find(spec=None, fields=None, skip=0, limit=0, timeout=True, snapshot=False, tailable=False, _sock=None, _must_use_master=False,_is_command=False)
However the update function appears not to have this:
update(spec, document, upsert=False, manipulate=False, safe=False, multi=False)
Is there any way to set this timeout flag for the update function? Is there any way I can change this so that I do not get this OperationFailure error? Am I correct in assuming this is a timeout error? The pymongo documentation describes OperationFailure only as:
Raised when a database operation fails.
After some research and lots of experimentation I found that it was the outer loop cursor that was causing the error.
for element in geomShapeCollection:
geomShapeCollection is a cursor over a MongoDB collection. Some elements of geomShapeCollection contain a very large number of points, and because the updates for those elements take so long, the geomShapeCollection cursor sits idle and is closed by the server.
The problem was not with the update function at all. Passing timeout=False to find() on the outer cursor solves the problem.
for element in db.geomShapeCollectionName.find(timeout=False):
    db.collectionName.update({"coordinates":{"$geoWithin":{"$geometry":element["geometry_part"]}}}, {"$set":{"Update_key": update_value}}, multi = True, timeout=False)
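For readers on PyMongo 3.0 or later, the same idea can be sketched with the newer API: the timeout flag on find() was replaced by no_cursor_timeout, and update(..., multi=True) was superseded by update_many(). The function name and its parameters below are hypothetical, chosen only for illustration; the field names are taken from the code above. This is a minimal sketch, not a tested drop-in replacement.

```python
def tag_points_within_shapes(shapes, points, update_value):
    """Set Update_key on every point that lies inside each stored shape.

    `shapes` and `points` are assumed to be pymongo Collection objects
    (PyMongo 3.0 or later). Returns the total number of modified documents.
    """
    # no_cursor_timeout=True is the PyMongo 3+ spelling of timeout=False: it
    # asks the server not to close the outer cursor for being idle while a
    # slow update is running.
    cursor = shapes.find({}, no_cursor_timeout=True)
    modified = 0
    try:
        for element in cursor:
            result = points.update_many(
                {"coordinates": {"$geoWithin": {"$geometry": element["geometry_part"]}}},
                {"$set": {"Update_key": update_value}},
            )
            modified += result.modified_count
    finally:
        # The idle timeout no longer cleans this cursor up, so close it
        # explicitly, even if an update raises.
        cursor.close()
    return modified

# Hypothetical usage, assuming a reachable MongoDB server:
#   from pymongo import MongoClient
#   db = MongoClient()["mydb"]
#   tag_points_within_shapes(db.geomShapeCollectionName, db.collectionName, update_value)
```

If the shape collection is small enough to fit in memory, another option is to read it into a list first (list(shapes.find())) so that no server-side cursor has to stay open during the updates.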