
How do you performantly bulk delete documents from a large collection in cloud firestore?

We are using a cloud function to remove all data older than 6 months from our Firestore database. Unfortunately, the function ends up hitting its timeout. Our code is based on: https://firebase.google.com/docs/firestore/manage-data/delete-data
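The question does not say how a document's age is determined. As a minimal sketch, assuming each document stores a creation timestamp (a hypothetical field), the "older than 6 months" check could be computed like this. It works in UTC and clamps the day of the month, so that 31 August maps to the last day of February instead of rolling over into March.

```typescript
// Returns the instant six calendar months before `now`, in UTC.
// The day of month is clamped to the target month's length (Aug 31 -> Feb 28/29).
function sixMonthsBefore(now: Date): Date {
  const year = now.getUTCFullYear();
  const month = now.getUTCMonth() - 6; // may be negative; Date.UTC rolls it into the previous year
  // Day 0 of the following month is the last day of the target month.
  const lastDayOfTarget = new Date(Date.UTC(year, month + 1, 0)).getUTCDate();
  const day = Math.min(now.getUTCDate(), lastDayOfTarget);
  return new Date(
    Date.UTC(
      year,
      month,
      day,
      now.getUTCHours(),
      now.getUTCMinutes(),
      now.getUTCSeconds(),
      now.getUTCMilliseconds(),
    ),
  );
}

// True if a document created at `createdAt` (hypothetical per-document timestamp) is old enough to delete.
function isOlderThanSixMonths(createdAt: Date, now: Date = new Date()): boolean {
  return createdAt.getTime() < sixMonthsBefore(now).getTime();
}
```

Computing the cutoff once per run and comparing every document against it keeps the rule consistent across all batches of a single invocation.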

We retrieve the collection we need to loop over using listDocuments(). We can't use get() in our case, because it does not return all the documents: some of our documents were created at paths whose parent documents were never explicitly created, and get() does not return those parent documents.

This was actually our first hurdle, as the cloud function timed out on the listDocuments() call itself. Updating our cloud function to the latest library versions (with the required code changes, see https://github.com/googleapis/nodejs-firestore/issues/825) and increasing the timeout to 300 seconds resolved that problem.
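Why get() misses some documents can be illustrated with a toy model that does not use the real client library at all: paths are plain strings, and a "missing" document is one that does not exist itself but has a subcollection beneath it. The Node.js client documentation states that listDocuments() may return references to such missing documents, whereas a query only returns documents that exist. The function names below are hypothetical.

```typescript
// Toy model only: `existing` holds the full paths of documents that actually exist,
// e.g. "users/alice/orders/1". Collection paths have an odd number of segments.

// IDs of the direct children of `collection` implied by the paths in `existing`.
function childIds(collection: string, existing: string[], onlyExisting: boolean): string[] {
  const depth = collection.split("/").length + 1; // segments in a direct child document path
  const ids = new Set<string>();
  for (const path of existing) {
    if (!path.startsWith(collection + "/")) continue;
    const parts = path.split("/");
    // A query sees only documents stored directly in the collection;
    // listing also sees ancestors of deeper documents ("missing" documents).
    if (onlyExisting ? parts.length === depth : parts.length >= depth) {
      ids.add(parts[depth - 1]);
    }
  }
  return Array.from(ids).sort();
}

// Roughly what collection.get() would return: existing documents only.
const queryIds = (collection: string, existing: string[]) => childIds(collection, existing, true);

// Roughly what collection.listDocuments() would return: existing plus "missing" documents.
const listedIds = (collection: string, existing: string[]) => childIds(collection, existing, false);
```

With only "users/alice/orders/1" and "users/bob" stored, a query on "users" sees just "bob", while listing also yields "alice", which is why the question relies on listDocuments().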

However, we are now hitting timeouts on the delete operations. We have noticed that deletions are very slow on large collections: for instance, deleting 10 documents from a collection of 2,000 documents is slower than deleting 200 documents from a collection of 210 documents. Each of these deletions can take anywhere from a few milliseconds (for small collections) to almost 3 seconds (for large collections). Because batched writes are limited to a maximum of 500 operations (https://firebase.google.com/docs/firestore/manage-data/transactions), we end up with many 3-second deletions that eventually exceed the timeout.
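The arithmetic behind the timeout can be made explicit. This sketch splits document references into batches of at most 500 (the limit cited above) and estimates the run time when batches are committed one after another, assuming, as one reading of the measurements above, that each 500-document batch commit takes about 3 seconds. `secondsPerBatch` is a value you would measure, not a Firestore constant.

```typescript
const MAX_BATCH_SIZE = 500; // batched-write limit cited in the question

// Split `items` into consecutive batches of at most `size` elements.
function chunk<T>(items: T[], size: number = MAX_BATCH_SIZE): T[][] {
  if (size <= 0) throw new Error("size must be positive");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Estimated wall-clock seconds if batches are committed strictly one after another.
function sequentialSeconds(docCount: number, secondsPerBatch: number): number {
  return Math.ceil(docCount / MAX_BATCH_SIZE) * secondsPerBatch;
}

// Largest number of documents that fits in `timeoutSeconds` under the same assumption.
function maxDocsWithin(timeoutSeconds: number, secondsPerBatch: number): number {
  return Math.floor(timeoutSeconds / secondsPerBatch) * MAX_BATCH_SIZE;
}
```

With `secondsPerBatch = 3` and the 540-second maximum, `maxDocsWithin(540, 3)` is 90,000 documents, and `sequentialSeconds(100000, 3)` is 600 seconds, already over the limit. Sequential batches alone cannot keep up once each commit slows down to seconds.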

Steps we have taken to solve the problem:

  1. Went over the Firebase documentation again and searched Stack Overflow for possible solutions, but there doesn't seem to be a post with a solution to this.
  2. Contacted Firebase, but they weren't any help: they said it was outside their support scope and that we should check Stack Overflow.
  3. Updated the cloud function to use the latest versions (with the required code changes), which fixed the timeout on listDocuments().
  4. Increased the timeout to 540 seconds, which is the maximum we could set (https://cloud.google.com/functions/docs/concepts/exec#timeout).

I would suggest reading over the Firestore best practices documentation. In particular, pay attention to the part that mentions "hotspotting":

Avoid high read or write rates to lexicographically close documents, or your application will experience contention errors. This issue is known as hotspotting, and your application can experience hotspotting if it does any of the following:

  • Creates new documents at a very high rate and allocates its own monotonically increasing IDs.

      Cloud Firestore allocates document IDs using a scatter algorithm. You should not encounter hotspotting on writes if you create new documents using automatic document IDs.

  • Creates new documents at a high rate in a collection with few documents.

  • Creates new documents with a monotonically increasing field, like a timestamp, at a very high rate.

  • Deletes documents in a collection at a high rate.

  • Writes to the database at a very high rate without gradually increasing traffic.
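The last bullet is about ramping traffic up gradually. The same best-practices page describes a "500/50/5" rule for this: start with at most 500 operations per second and increase traffic by 50% every 5 minutes. Deletes count as writes, so a bulk delete is subject to it too. A minimal sketch of that schedule, and of the pause it implies between delete batches, could look like this (the constant and function names are illustrative):

```typescript
const INITIAL_OPS_PER_SECOND = 500; // the "500" in 500/50/5
const GROWTH_FACTOR = 1.5; // the "50" in 500/50/5: +50% per step
const STEP_MS = 5 * 60 * 1000; // the "5" in 500/50/5: one step every 5 minutes

// Maximum operations per second allowed `elapsedMs` after the bulk job started.
function maxOpsPerSecond(elapsedMs: number): number {
  const steps = Math.floor(Math.max(0, elapsedMs) / STEP_MS);
  return Math.floor(INITIAL_OPS_PER_SECOND * Math.pow(GROWTH_FACTOR, steps));
}

// Minimum pause between batches of `batchSize` deletes so the average rate stays under the ramp.
function minDelayBetweenBatchesMs(batchSize: number, elapsedMs: number): number {
  return (1000 * batchSize) / maxOpsPerSecond(elapsedMs);
}
```

Under this schedule, a single 540-second run could issue at most about 330,000 deletes (300 s at 500/s, then 240 s at 750/s), so the ramp-up, and not only the function timeout, limits how much one invocation should delete.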

You might be able to improve performance if you randomize the document deletes so that they aren't "sequential" from the point of view of Firestore's internal sharding. If you can effectively parallelize the deletes across more shards, you could see a performance boost.
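A sketch of that suggestion: shuffle the references so that consecutive batches are not lexicographically adjacent, then commit several batches concurrently with a bound on how many are in flight. `commitBatch` is a hypothetical callback; with the Node.js client it would typically create a batch with `db.batch()`, call `batch.delete(ref)` for each reference and then `await batch.commit()`. Whether shuffling actually helps depends on Firestore's internal sharding, which the answer only speculates about, so it is worth measuring before relying on it.

```typescript
// Hypothetical callback that deletes one batch of documents, e.g. by wrapping
// db.batch(), batch.delete(ref) and batch.commit() from the Firestore client.
type CommitBatch<Ref> = (refs: Ref[]) => Promise<void>;

// Fisher-Yates shuffle on a copy, so the caller's array is left untouched.
function shuffle<T>(items: T[]): T[] {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    const tmp = out[i];
    out[i] = out[j];
    out[j] = tmp;
  }
  return out;
}

// Deletes all refs in random order, in batches of `batchSize`,
// with at most `concurrency` batch commits in flight at once.
async function deleteShuffledInParallel<Ref>(
  refs: Ref[],
  commitBatch: CommitBatch<Ref>,
  batchSize = 500,
  concurrency = 10,
): Promise<number> {
  const shuffled = shuffle(refs);
  const batches: Ref[][] = [];
  for (let i = 0; i < shuffled.length; i += batchSize) {
    batches.push(shuffled.slice(i, i + batchSize));
  }

  let next = 0; // index of the next batch no worker has claimed yet
  let deleted = 0;

  // Each worker repeatedly claims the next batch; claiming is safe because
  // JavaScript runs this code on a single thread between awaits.
  const runWorker = async (): Promise<void> => {
    while (next < batches.length) {
      const batch = batches[next];
      next += 1;
      await commitBatch(batch);
      deleted += batch.length;
    }
  };

  const workers: Promise<void>[] = [];
  for (let w = 0; w < Math.min(concurrency, batches.length); w++) {
    workers.push(runWorker());
  }
  await Promise.all(workers);
  return deleted;
}
```

Raising `concurrency` raises the delete rate, which is itself a hotspotting risk according to the list above, so it is best combined with a gradual ramp-up rather than set high from the start.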

Firestore is not meant for this kind of process. I recommend taking a look at Bigtable. However, if you need to use Firestore, I propose the following scheme:

  1. Create a Compute Engine instance for this task.
  2. Put your bulk-delete code on the instance.
  3. Create a startup script that runs this task and then shuts the instance down.
  4. Use instance scheduling to start the instance once a month, or at whatever interval you need.

This ensures that the timeout error does not appear, and you are only billed for the time the instance is running.
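On a VM there is no function timeout, so the job can simply keep deleting until nothing old is left. A sketch of that loop, where `fetchPage` and `deletePage` are hypothetical callbacks (for example a query for stale documents with `.limit(pageSize)`, or a slice of the references returned by listDocuments(), plus a batched delete). The `maxPages` guard stops an endless loop if `deletePage` ever leaves documents behind. The startup script would run this job and then power the instance off, for example with `shutdown -h now`.

```typescript
// Hypothetical callbacks: fetchPage returns up to `limit` references to stale documents
// (an empty array when none are left); deletePage deletes exactly those documents.
type FetchPage<Ref> = (limit: number) => Promise<Ref[]>;
type DeletePage<Ref> = (refs: Ref[]) => Promise<void>;

// Keeps deleting page by page until no stale documents remain; returns how many were deleted.
// Throws if `maxPages` pages were processed and stale documents are still being returned.
async function drainStaleDocuments<Ref>(
  fetchPage: FetchPage<Ref>,
  deletePage: DeletePage<Ref>,
  pageSize = 500,
  maxPages = Number.MAX_SAFE_INTEGER,
): Promise<number> {
  let total = 0;
  for (let page = 0; page < maxPages; page++) {
    const refs = await fetchPage(pageSize);
    if (refs.length === 0) {
      return total; // nothing left: the startup script can now shut the instance down
    }
    await deletePage(refs);
    total += refs.length;
  }
  throw new Error(`gave up after ${maxPages} pages; ${total} documents deleted so far`);
}
```

Because the loop re-fetches after every page instead of listing everything up front, it also copes with documents that become stale while the job is running.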

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM