
How to keep a Firebase database in sync with BigQuery?

We are working on a project that involves a lot of data, and we recently read about Google BigQuery. How can we export our data to this platform? We have seen the sample that imports logs into Google BigQuery, but it only covers inserting data, not updating or deleting it.

Our objects can be updated, and we are limited in the number of queries we can run against the BigQuery tables. How can we synchronize our data without exceeding the BigQuery quota limits?

Our current function code:

'use strict';

// Default imports.

const functions = require('firebase-functions');
const bigQuery = require('@google-cloud/bigquery')();

// If you want to change the nodes to listen to REMEMBER TO change the constants below.
// The 'id' field is AUTOMATICALLY added to the values, so you CANNOT add it.

const ROOT_NODE = 'categories';
const VALUES = [
    'name'
];

// This function listens to the supplied root node.
// When the root node is completely empty, all of the Google BigQuery rows will be removed.
// This function should only activate when the root node is deleted.

exports.root = functions.database.ref(ROOT_NODE).onWrite(event => {
    if (event.data.exists()) {
        return;
    }

    return bigQuery.query({
        query: [
            'DELETE FROM `stampwallet.' + ROOT_NODE + '`',
            'WHERE true'
        ].join(' '),
        params: []
    });
});

// This function listens to the supplied root node, but on child added/removed/changed.
// When an object is inserted/deleted/updated the appropriate action will be taken.

exports.children = functions.database.ref(ROOT_NODE + '/{id}').onWrite(event => {
    const id = event.params.id;

    if (!event.data.exists()) {
        return bigQuery.query({
            query: [
                'DELETE FROM `stampwallet.' + ROOT_NODE + '`',
                'WHERE id = ?'
            ].join(' '),
            params: [
                id
            ]
        });
    }

    const item = event.data.val();

    if (event.data.previous.exists()) {
        let update = [];
        for (let index = 0; index < VALUES.length; index++) {
            const value = VALUES[index];

            update.push(item[value]);
        }
        update.push(id);

        return bigQuery.query({
            query: [
                'UPDATE `stampwallet.' + ROOT_NODE + '`',
                'SET ' + VALUES.join(' = ?, ') + ' = ?',
                'WHERE id = ?'
            ].join(' '),
            params: update
        });
    }

    let template = [];
    for (let index = 0; index < VALUES.length; index++) {
        template.push('?');
    }

    let create = [];
    create.push(id);
    for (let index = 0; index < VALUES.length; index++) {
        const value = VALUES[index];

        create.push(item[value]);
    }

    return bigQuery.query({
        query: [
            'INSERT INTO `stampwallet.' + ROOT_NODE + '` (id, ' + VALUES.join(', ') + ')',
            'VALUES (?, ' + template.join(', ') + ')'
        ].join(' '),
        params: create
    });
});

What would be the best way to sync Firebase to BigQuery?

BigQuery supports UPDATE and DELETE, but not frequent ones: BigQuery is an analytical database, not a transactional one.

To synchronize a transactional database with BigQuery you can use approaches like:

With Firebase, you could schedule a daily load into BigQuery from its daily backups.

"... way to sync firebase to bigquery?"

I recommend considering streaming all your data into BigQuery as historical data. You can mark each entry as new (insert), update, or delete. Then, on the BigQuery side, you can write a query that resolves the most recent values for a specific record based on whatever logic you have.
So your code can be reused almost 100%: just change the UPDATE / DELETE logic so that it issues an INSERT instead.
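The append-only idea above could be sketched roughly as follows. This is a minimal sketch, not the answer author's code: the helper names `toChangeRow` and `resolveLatest`, the columns `op` and `changed_at`, and the table name `stampwallet.categories_changes` are all assumptions made for illustration. The in-memory resolver only mirrors what the SQL string is meant to do, so the logic can be checked without a BigQuery project.

```javascript
'use strict';

// Hypothetical helper: turns one Realtime Database write into an append-only
// change row. `before` and `after` are the node values (null when absent).
// The column names `op` and `changed_at` are assumptions, not an existing schema.
function toChangeRow(id, before, after, changedAt, fields) {
    const op = after === null ? 'DELETE' : (before === null ? 'INSERT' : 'UPDATE');
    const row = { id, op, changed_at: changedAt };
    for (const field of fields) {
        // Deleted records carry no values; missing fields become null as well.
        row[field] = after !== null && after[field] !== undefined ? after[field] : null;
    }
    return row;
}

// In-memory equivalent of the "latest value per id" query: keep the newest
// change per id, then drop ids whose newest change is a DELETE.
function resolveLatest(rows) {
    const newest = new Map();
    for (const row of rows) {
        const seen = newest.get(row.id);
        if (!seen || row.changed_at >= seen.changed_at) {
            newest.set(row.id, row);
        }
    }
    return [...newest.values()].filter(row => row.op !== 'DELETE');
}

// The same resolution expressed in BigQuery Standard SQL. The table name
// `stampwallet.categories_changes` is an assumption for illustration.
const LATEST_STATE_SQL = [
    'SELECT * EXCEPT (rn) FROM (',
    '  SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY changed_at DESC) AS rn',
    '  FROM `stampwallet.categories_changes`',
    ')',
    "WHERE rn = 1 AND op != 'DELETE'"
].join('\n');

// Example: two categories are created, one is renamed, one is deleted.
const changes = [
    toChangeRow('a', null, { name: 'Food' }, 1, ['name']),
    toChangeRow('b', null, { name: 'Toys' }, 2, ['name']),
    toChangeRow('a', { name: 'Food' }, { name: 'Groceries' }, 3, ['name']),
    toChangeRow('b', { name: 'Toys' }, null, 4, ['name'])
];
const current = resolveLatest(changes);
console.log(current); // only category 'a' remains, with name 'Groceries'
```

Every write becomes an INSERT, so none of the UPDATE/DELETE limits apply to the change table; the cost moves to query time, where the latest state has to be derived.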

"// When an object is inserted/deleted/updated the appropriate action will be taken."

"So our objects are able to update their data. And we have a limited amount of queries on the BigQuery tables. How can we synchronize our data without exceeding the BigQuery quota limits?"

Yes, BigQuery supports UPDATE, DELETE, and INSERT as part of its Data Manipulation Language (DML).
General availability in BigQuery Standard SQL was announced on March 8, 2017.

Before considering this feature for syncing BigQuery with transactional data, please take a look at Quotas, Pricing, and Known Issues.

Below are some excerpts.

Quotas (excerpts)
DML statements are significantly more expensive to process than SELECT statements.
• Maximum UPDATE/DELETE statements per day per table: 96
• Maximum UPDATE/DELETE statements per day per project: 1,000
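To make these limits concrete, here is a small arithmetic sketch. It uses only the two figures quoted in the excerpt (which may have changed since it was written); the function names are hypothetical.

```javascript
'use strict';

// Quota figures as quoted in the excerpt above (they may have changed since).
const MAX_DML_PER_TABLE_PER_DAY = 96;
const MAX_DML_PER_PROJECT_PER_DAY = 1000;
const MINUTES_PER_DAY = 24 * 60;

// Shortest even spacing, in minutes, between UPDATE/DELETE statements on one
// table that stays within the per-table daily limit.
function minMinutesBetweenStatements(maxPerDay) {
    return MINUTES_PER_DAY / maxPerDay;
}

// How many tables can each use their full per-table quota before the
// per-project quota runs out.
function tablesAtFullQuota(perTable, perProject) {
    return Math.floor(perProject / perTable);
}

const spacing = minMinutesBetweenStatements(MAX_DML_PER_TABLE_PER_DAY);
const tables = tablesAtFullQuota(MAX_DML_PER_TABLE_PER_DAY, MAX_DML_PER_PROJECT_PER_DAY);

console.log(spacing); // 15
console.log(tables);  // 10
```

In other words, under these figures a per-record UPDATE or DELETE trigger like the one in the question would hit the per-table limit after 96 changes in a day; changes would have to be batched into at most one statement every 15 minutes.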

Pricing (excerpts, with extra highlighting and a comment added)
BigQuery charges for DML queries based on the number of bytes processed by the query.
The number of bytes processed is calculated as follows:

UPDATE: bytes processed = sum of bytes in referenced fields in the scanned tables + sum of bytes for all fields in the updated table at the time the UPDATE starts.
DELETE: bytes processed = sum of bytes of referenced fields in the scanned tables + sum of bytes for all fields in the modified table at the time the DELETE starts.

Comment by post author: As you can see, you will be charged for a whole-table scan even if you update just one row! I think this is the key point for decision making.
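As a rough worked example of the formula quoted above, the sketch below computes the bytes processed for a single-row UPDATE. The table size (10 GiB in total, 200 MiB of it in the referenced `id` column) is entirely hypothetical, and the function name is made up for illustration; no price per byte is assumed.

```javascript
'use strict';

const MIB = 1024 ** 2;
const GIB = 1024 ** 3;

// Bytes processed by one UPDATE or DELETE under the formula quoted above:
// referenced fields in the scanned tables + all fields of the modified table.
function dmlBytesProcessed(referencedFieldBytes, tableTotalBytes) {
    return referencedFieldBytes + tableTotalBytes;
}

// Hypothetical table: 10 GiB in total, 200 MiB of which is the `id` column
// referenced in the WHERE clause.
const perStatement = dmlBytesProcessed(200 * MIB, 10 * GIB);

// If the per-table quota of 96 statements per day were fully used:
const perDay = perStatement * 96;

console.log(perStatement / GIB); // 10.1953125
console.log(perDay / GIB);       // 978.75
```

The number of rows changed does not appear anywhere in the calculation, which is the point of the comment above: one row or a million rows, the whole table is counted.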

Known Issues (excerpts)
• DML statements cannot be used to modify tables with REQUIRED fields in their schema.
• Each DML statement initiates an implicit transaction, which means that changes made by the statement are automatically committed at the end of each successful DML statement. There is no support for multi-statement transactions.
• The following combinations of DML statements are allowed to run concurrently on a table:

  • UPDATE and INSERT
  • DELETE and INSERT
  • INSERT and INSERT

    Otherwise one of the DML statements will be aborted.
    For example, if two UPDATE statements execute simultaneously against the table then only one of them will succeed.
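The concurrency rule above can be written down as a small predicate. This is only a sketch of the rule as quoted in the excerpt; `canRunConcurrently` is a hypothetical helper, not part of any BigQuery client.

```javascript
'use strict';

// Statement pairs the excerpt above lists as allowed to run concurrently on
// one table. Keys are the two statement types, sorted and joined with '+'.
const CONCURRENT_OK = new Set(['INSERT+UPDATE', 'DELETE+INSERT', 'INSERT+INSERT']);

// Hypothetical helper: true when the two DML statement types may overlap.
function canRunConcurrently(first, second) {
    const key = [first.toUpperCase(), second.toUpperCase()].sort().join('+');
    return CONCURRENT_OK.has(key);
}

console.log(canRunConcurrently('update', 'insert')); // true
console.log(canRunConcurrently('UPDATE', 'UPDATE')); // false
console.log(canRunConcurrently('DELETE', 'UPDATE')); // false
```

This matters for a trigger on `categories/{id}` like the one in the question: two children changed at about the same time would issue two UPDATE statements against the same table, and under this rule one of them could be aborted.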

• Tables that have been written to recently via BigQuery Streaming (tabledata.insertall) cannot be modified using UPDATE or DELETE statements. To check if the table has a streaming buffer, check the tables.get response for a section named streamingBuffer. If it is absent, the table can be modified using UPDATE or DELETE statements.
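The streaming-buffer check described above might look like this. It is a minimal sketch that works on a plain object shaped like a tables.get response; the sample objects and their values are made up, and how the metadata is fetched (for example through the Node.js client's `table.getMetadata()`) is left out and should be treated as an assumption.

```javascript
'use strict';

// Hypothetical helper: `metadata` is the table resource from tables.get.
// A `streamingBuffer` section means recently streamed rows are still
// buffered, so UPDATE and DELETE statements would be rejected for now.
function hasStreamingBuffer(metadata) {
    return metadata != null && metadata.streamingBuffer != null;
}

// Made-up sample responses for illustration only.
const recentlyStreamed = {
    id: 'project:stampwallet.categories',
    streamingBuffer: { estimatedRows: '12', estimatedBytes: '840' }
};
const settled = { id: 'project:stampwallet.categories' };

console.log(hasStreamingBuffer(recentlyStreamed)); // true
console.log(hasStreamingBuffer(settled));          // false
```

A sync job that mixes streaming inserts with DML would need a check like this before each UPDATE or DELETE, and would have to retry later when the buffer is still present.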

The reason you didn't find update and delete functions in BigQuery is that BigQuery does not support them. BigQuery has only append and truncate operations. If you want to update or delete a row in BigQuery, you'll need to delete the whole database and write it again with the modified row or without it. That is not a good idea.

BigQuery is used to store large amounts of data and provide quick access to it; for example, it is good for collecting data from different sensors. But for your customer database you need to use MySQL or a NoSQL database.

