
Finding the collection length in Firebase

I have over 20k objects in my Firebase Realtime Database. I now need to take out all these objects and do stuff to them. The problem is the server runs out of memory every time I do it. This is my current code:

sendEmail.get('/:types/:message', cors(), async (req, res, next) => {
    const types = JSON.parse(req.params.types);
    let recipients = [];
    let mails = [];
    if (types.includes('students')) {
        // arrayFromObject (defined elsewhere) converts the object returned
        // by snapshot.val() into an array of users
        const tmpUsers = arrayFromObject(await admin.database().ref('Users')
            .orderByChild('student').equalTo(true)
            .once('value').then(r => r.val()).catch(e => console.log(e)));
        recipients = recipients.concat(tmpUsers);
    }
    if (types.includes('solvers')) {
        let tmpUsers = arrayFromObject(await admin.database().ref('Users')
            .orderByChild('userType').equalTo('person')
            .once('value').then(r => r.val()).catch(e => console.log(e)));
        tmpUsers = tmpUsers.concat(arrayFromObject(await admin.database().ref('Users')
            .orderByChild('userType').equalTo('company')
            .once('value').then(r => r.val()).catch(e => console.log(e))));
        recipients = recipients.concat(tmpUsers);
    }
    // ... process `recipients` and send the emails here ...
});

So I have two options: streaming, or limiting the response with startAt and endAt. But to limit the responses I need to know exactly how many objects I have, and to do that I need to download the whole collection... You see my problem now. How can I learn how many documents I have, without downloading the whole collection?

You could try paginating your query by combining limitToFirst/limitToLast and startAt/endAt.

For example, you could perform the first query with limitToFirst(1000), then obtain the last key from the returned list and use it with startAt(key) and another limitToFirst(1000), repeating until you reach the end of the collection.

In Node.js, it might look something like this (untested code):

let recipients = [];

// next() is async, so its result must be awaited
let tmpUsers = await next();
recipients = filter(recipients, tmpUsers);

// startAt is inclusive, so when this reaches the last result there will only be 1
while (tmpUsers.length > 1) {
    const lastKey = tmpUsers[tmpUsers.length - 1].key;
    tmpUsers = await next(lastKey);
    if (tmpUsers.length > 1) { // Avoid duplicating the last result
        // Drop the first entry: it is the startAt key we already have
        recipients = filter(recipients, tmpUsers.slice(1));
    }
}

// Fetch one page of up to 1000 users ordered by key,
// optionally starting at a given key
async function next(startAt) {
    let query = admin.database().ref('Users').orderByKey();
    if (startAt) {
        query = query.startAt(startAt);
    }
    const snapshot = await query.limitToFirst(1000)
            .once('value')
            .catch(e => console.log(e));
    // snapshot.val() returns a plain object and loses ordering, so build
    // an ordered array of { key, ...value } entries instead
    const users = [];
    if (snapshot) {
        snapshot.forEach(child => {
            users.push({ key: child.key, ...child.val() });
        });
    }
    return users;
}

function filter(array1, array2) {
    // TODO: Filter the results here as we can't combine orderByChild/orderByKey
    return array1.concat(array2);
}

The problem with this is that you won't be able to use database-side filtering, so you'd need to filter the results manually. That might make things worse, depending on how many items you need to keep in the recipients variable at a time.
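As a sketch, a manual replacement for the filter() stub above could look like the following. It assumes the { key, ...value } entry shape built in next(), the student/userType fields from the question's data model, and that the question's types array is in scope:

function filter(recipients, page) {
    // Keep only the user types requested in `types` (from the question's route)
    const matches = page.filter(user =>
        (types.includes('students') && user.student === true) ||
        (types.includes('solvers') &&
            (user.userType === 'person' || user.userType === 'company')));
    return recipients.concat(matches);
}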

Another option would be to process them in batches (of 1000, for example), popping them from the recipients array to free up resources before moving on to the next batch. It depends entirely on what actions you need to perform on the objects, and you'll need to weigh up whether it's actually necessary to process (and keep in memory) the entire result set in one go.
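A rough sketch of that batch-by-batch approach, reusing the next() helper above; processBatch is a hypothetical stand-in for whatever per-object work (e.g. building and sending the emails) you need to do:

let page = await next();           // first page of up to 1000 users
while (page.length > 0) {
    await processBatch(page);      // hypothetical per-batch work
    if (page.length < 1000) break; // a short page means we've reached the end
    const lastKey = page[page.length - 1].key;
    // startAt is inclusive, so drop the first entry of the next page
    page = (await next(lastKey)).slice(1);
    // the previous page is no longer referenced here, so it can be garbage collected
}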

You don't need to know the size of the collection to process it in batches.

You can do it by ordering them by key, limiting to 1000 or so, and then starting the next batch at the last key of the previous batch.

If you still want to know the size of the collection, the only good way is to maintain the count in a separate node and keep it updated whenever the collection changes.
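A minimal sketch of that, assuming a hypothetical sibling node Users_count (the counter writes would need to run everywhere users are created or deleted). transaction() keeps the counter consistent under concurrent writes:

async function addUser(user) {
    const userRef = admin.database().ref('Users').push();
    await userRef.set(user);
    // Atomically increment the counter node
    await admin.database().ref('Users_count')
        .transaction(count => (count || 0) + 1);
    return userRef.key;
}

async function removeUser(key) {
    await admin.database().ref(`Users/${key}`).remove();
    // Atomically decrement the counter node
    await admin.database().ref('Users_count')
        .transaction(count => (count || 0) - 1);
}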
