How to batch.commit() for firestore massive documents?

I have approximately 31,000 documents to write with batch.commit().

I'm using the Blaze plan.

A batch can carry at most 500 documents, so I split the writes into batches of 490 documents each. I have 65 batches.

Here is my Firebase function code:

'use strict';

const express = require('express');
const cors = require('cors');
const axios = require('axios');

// Firebase init
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();
const firestore = admin.firestore();


const echo = express().use(cors()).post("/", (request, response) => {
    axios.get('https://example.com/api').then(result => {

        const data = result.data.items;
        let batchArray = [];
        let batchIndex = 0;
        let operationCounter = 0;

        //initiate batch;
        batchArray.push(firestore.batch());

        data.forEach(item => {

            const collectionRef = firestore.collection('items').doc();

            const row = {
                itemName: item.name,
                // ... and so on...
            };


            batchArray[batchIndex].set(collectionRef, row);
            operationCounter++;

            if (operationCounter === 490) {
                batchArray.push(firestore.batch());
                functions.logger.info(`Batch index added.`, batchIndex);
                batchIndex++;
                operationCounter = 0;
            }

        });

        /*  
        This code wrote only 140 documents.
        Throws Error: 4 DEADLINE_EXCEEDED: Deadline exceeded

        
        batchArray.forEach(batch => {
            batch.commit()
                .then(result=> functions.logger.info("batch.commit() succeeded:", result) )
                .catch(error=>functions.logger.info("batch.commit() failed:", error));
        })

        */

        /* 
        This code wrote only 630 documents 
        Throws Error: 4 DEADLINE_EXCEEDED: Deadline exceeded

        Promise.all([
            batchArray.forEach(batch => {
                setTimeout(
                    ()=>batch.commit().then(result=> functions.logger.info("batch.commit() succeeded:", result) ).catch(error=>functions.logger.info("batch.commit() failed:", error)),
                    1000);
            })
        ]).catch(error => functions.logger.error("batch.commit() error:", error));

        */
        // This code wrote 2100 documents.
        return Promise.all([
            batchArray.forEach(batch => {
                batch.commit()
                    .then(result => functions.logger.info("batch.commit() succeeded:", result))
                    .catch(error => functions.logger.warn("batch.commit() failed:", error))
            })
        ]).then(result => {
            functions.logger.info("all batches succeeded:", result);
            return response.status(200).json({ "status": "success", "data": `success` });
        })
        .catch(error => {
            functions.logger.warn("all batches failed:", error);
            return response.status(200).json({ "status": "error", "data": `${error}` });
        });


    }).catch(error => {
        functions.logger.error("HTTPS Response Error", error);
        return response.status(204).json({ "status": "error", "data": `${error}` });

    });
});


exports.echo = functions.runWith({
    timeoutSeconds: 60 * 9,
}).https.onRequest(echo);

I got a "success" response after a few seconds. But the inserted Firestore data appeared only after 7 minutes, and the Cloud Functions log shows errors, with only 5 out of 65 batches successful.

The thrown error is:

batch.commit() failed: { Error: 4 DEADLINE_EXCEEDED: Deadline exceeded 
at Object.callErrorFromStatus (/workspace/node_modules/@grpc/grpc-js/build/src/call.js:31:26) 
at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client.js:176:52) 
at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:342:141) 
at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:305:181) 
at process.nextTick (/workspace/node_modules/@grpc/grpc-js/build/src/call-stream.js:124:78) 
at process._tickCallback (internal/process/next_tick.js:61:11) 
Caused by: Error at WriteBatch.commit (/workspace/node_modules/@google-cloud/firestore/build/src/write-batch.js:419:23) 
at Promise.all.batchArray.forEach.batch (/workspace/index.js:100:23) 
at Array.forEach (<anonymous>) at axios.get.then.result (/workspace/index.js:99:24) 
at process._tickCallback (internal/process/next_tick.js:68:7) code: 4, details: 'Deadline exceeded', metadata: Metadata { internalRepr: Map {}, options: {} }, note: 'Exception occurred in retry method that was not classified as transient' }

The error Error: 4 DEADLINE_EXCEEDED may be related to Firestore quotas, but I don't know which limit is related to this issue.

It's most likely because the forEach where you do the commit is not working as you expect. Time and time again, using await inside a forEach callback causes problems like this. Long ago, I thought that because the await is inside the forEach, it would wait until each call finishes and then go to the next item in the array, but that isn't true. It runs them all at once. I would suggest going with a traditional for loop.

Also, I would suggest not using the .then syntax. In this case, it would still run them all at once. Try using await with a traditional for loop. This should solve your issue.

Another thing: your Promise.all is not helping here. forEach returns undefined, so Promise.all receives [undefined] and resolves immediately, which is why you get the "success" response before the commits have finished. Promise.all is for running multiple operations at the same time, but because of the deadline exceeded error, you need to run them one at a time (I know, it sucks since you have so many).
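To illustrate the Promise.all point, here is a minimal sketch that needs no Firestore at all. `fakeCommit` and `demo` are hypothetical names, and the short timer only stands in for a real batch.commit(). It shows that wrapping forEach in Promise.all does not wait for anything, whereas map hands Promise.all the actual promises. Note that the map version still runs every commit concurrently, so it explains the early "success" response rather than fixing the deadline error.

```javascript
// Hypothetical stand-in for batch.commit(): resolves after a short delay.
const fakeCommit = (id, log) =>
  new Promise(resolve => setTimeout(() => { log.push(id); resolve(id); }, 10));

async function demo() {
  const ids = [1, 2, 3];

  // forEach returns undefined, so Promise.all sees [undefined]
  // and resolves before any commit has finished.
  const logA = [];
  const resultA = await Promise.all([ids.forEach(id => { fakeCommit(id, logA); })]);
  const finishedA = logA.length;

  // map returns the promises themselves, so Promise.all waits for all of them.
  const logB = [];
  const resultB = await Promise.all(ids.map(id => fakeCommit(id, logB)));
  const finishedB = logB.length;

  return { resultA, finishedA, resultB, finishedB };
}
```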

// Inside an async function: commit the batches one at a time.
for (const batch of batchArray) {
  await batch.commit();
}
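Putting the loop above together with the chunking from the question, a sequential writer might look like the sketch below. The function name `writeInBatches` and its parameters are illustrative, not part of the original code. `db` is assumed to be a Firestore instance such as the one returned by admin.firestore(), and 490 is the chunk size used in the question.

```javascript
// Sketch: write `rows` in chunks, committing each batch before starting the next.
// `db` is assumed to be a Firestore instance; `collectionName` and `chunkSize`
// are illustrative parameters.
async function writeInBatches(db, collectionName, rows, chunkSize = 490) {
  let committed = 0;
  for (let start = 0; start < rows.length; start += chunkSize) {
    const batch = db.batch();
    for (const row of rows.slice(start, start + chunkSize)) {
      // doc() with no argument creates a reference with an auto-generated ID.
      batch.set(db.collection(collectionName).doc(), row);
    }
    await batch.commit(); // wait for this batch before building the next one
    committed++;
  }
  return committed; // number of batches committed
}
```

In the question's handler this would be called as something like `await writeInBatches(firestore, 'items', rows)` from inside an async callback, and the response would be sent only after it returns.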

I'm not sure how many commits it would take before you hit the deadline exceeded error again (with the above approach), but I'm curious what happens if you do 2-3 commits at a time. However, it's generally best to do one at a time.
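If you want to try the 2-3 commits at a time idea, one way to sketch it is to commit the batches in small groups. `commitInGroups` and `groupSize` are hypothetical names, and whether a group size of 2 or 3 stays under the deadline is something you would have to measure; this is not a tested recommendation.

```javascript
// Sketch: commit batches in small groups instead of all at once.
// `batches` is an array of objects with a commit() method that returns a promise;
// `groupSize` is an illustrative parameter.
async function commitInGroups(batches, groupSize = 3) {
  for (let i = 0; i < batches.length; i += groupSize) {
    const group = batches.slice(i, i + groupSize);
    // Commits within a group run concurrently; the next group
    // starts only after every commit in this group has finished.
    await Promise.all(group.map(batch => batch.commit()));
  }
}
```

Setting groupSize to 1 reduces this to the one-at-a-time loop.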
