
Firebase data to Google BigQuery

Firebase offers private backups on Google Cloud Storage. One of the featured use cases is "Ingestion into Analytics Products":

Private Backups provides a perfect pipeline into cloud analytics products such as Google’s BigQuery. Cloud Analytics products often prefer to ingest data through Cloud Storage buckets rather than directly from the application.

I have a lot of data in Firebase (more than 1 GB when exported to a Cloud Storage bucket) and, as described in the Firebase offering, I wanted to load that data into BigQuery.

But is it really possible to write a table schema that fits Firebase raw data? Let's take as an example the dinosaur-facts database from the Firebase documentation. The JSON looks like this:

{
  "dinosaurs" : {
    "bruhathkayosaurus" : {
      "appeared" : -70000000,
      "height" : 25
    },
    "lambeosaurus" : {
      "appeared" : -76000000,
      "height" : 2.1
    }
  },
  "scores" : {
    "bruhathkayosaurus" : 55,
    "lambeosaurus" : 21
  }
}

To list all dinosaurs, I suppose the only way would be to use a RECORD field in the BigQuery schema. But a REPEATED RECORD in BigQuery usually corresponds to an array in the imported JSON, and there is no array here in Firebase, just an object with the dinosaur names as keys.
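For illustration, here is a minimal sketch (not from the original post) of the reshaping a REPEATED RECORD would need: the keyed object becomes an array, and each key moves into a field of its record. The helper name `keyedObjectToArray` and the field name `name` are hypothetical choices, and the helper assumes each child is itself an object (as in `dinosaurs`, but not in `scores`).

```javascript
// Hypothetical helper: turns Firebase's "names as keys" object into an array
// of records, copying each key into a field (default field name: "name").
// Assumes every child value is an object; primitive children would lose their value.
function keyedObjectToArray(obj, keyField = 'name') {
  return Object.entries(obj).map(([key, value]) => ({ [keyField]: key, ...value }));
}

const dinosaurs = {
  bruhathkayosaurus: { appeared: -70000000, height: 25 },
  lambeosaurus: { appeared: -76000000, height: 2.1 },
};

console.log(keyedObjectToArray(dinosaurs));
```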

So a BigQuery table schema like this doesn't work:

[
    {
        "name": "dinosaurs",
        "type": "RECORD",
        "mode": "REQUIRED",
        "fields": [
            {
                "name": "dinosaur",
                "type": "RECORD",
                "mode": "REPEATED",
                "fields": [
                    {
                        "name": "appeared",
                        "type": "INTEGER"
                    },
                    {
                        "name": "height",
                        "type": "INTEGER"
                    },
                    {
                        "name": "length",
                        "type": "INTEGER"
                    },
                    {
                        "name": "order",
                        "type": "STRING"
                    },
                    {
                        "name": "vanished",
                        "type": "INTEGER"
                    },
                    {
                        "name": "weight",
                        "type": "INTEGER"
                    }
                ]
            },
            {
                "name": "scores",
                "type": "RECORD",
                "mode": "REPEATED",
                "fields": [
                    {
                        "name": "dinosaur",
                        "type": "INTEGER"
                    }
                ]
            }
        ]
    }
]

Is it possible to write a table schema that fits Firebase raw data? Or should we first prepare the data to make it compatible with BigQuery?

Since the data above is just JSON, you should be able to get it to work with Firebase. However, I think it would be much easier to prepare the data after the backup.

You mentioned that there were no arrays in the Firebase data. Firebase does support arrays, but they have to meet certain criteria.

// we send this
['a', 'b', 'c', 'd', 'e']
// Firebase stores this
{0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e'}
// since the keys are numeric and sequential,
// if we query the data, we get this
['a', 'b', 'c', 'd', 'e']
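The criterion can be approximated as follows. This sketch assumes the heuristic Firebase has described for the Realtime Database: an object comes back as an array when all of its keys are non-negative integers and more than half of the indices from 0 to the largest key have values. The function names are hypothetical, this is not the SDK's code, and filling missing indices with `null` is an assumption.

```javascript
// Approximation of Firebase's array heuristic (assumption, not SDK code):
// every key is a non-negative integer and more than half of 0..maxKey is filled.
function looksLikeArray(obj) {
  const keys = Object.keys(obj);
  if (keys.length === 0) return false;
  if (!keys.every((k) => /^(0|[1-9]\d*)$/.test(k))) return false;
  const maxKey = keys.reduce((m, k) => Math.max(m, Number(k)), -1);
  return keys.length > (maxKey + 1) / 2;
}

// Hypothetical helper mimicking what a query would return.
// Missing indices are filled with null here (an assumption).
function toQueryResult(obj) {
  if (!looksLikeArray(obj)) return obj;
  const maxKey = Object.keys(obj).reduce((m, k) => Math.max(m, Number(k)), -1);
  const arr = new Array(maxKey + 1).fill(null);
  for (const [k, v] of Object.entries(obj)) arr[Number(k)] = v;
  return arr;
}

const stored = { 0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e' };
console.log(toQueryResult(stored));
console.log(toQueryResult({ 0: 'a', 5: 'f' })); // stays an object: only 2 of 6 slots filled
```

The second call shows why sparse numeric keys are risky: the same kind of data can come back as an object or as an array depending on how many indices are filled.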

Even though it looks like an object in the Firebase database, it will come back as an array when queried.

So it is feasible to create your schema in your Firebase database, but it would likely create a lot of overhead for your application.

As of this writing (03/2017), I can confirm that there is no real integration between the Firebase Realtime Database and BigQuery. Only Firebase Analytics data can be imported easily into BigQuery. None of this is clearly explained in the Firebase documentation either...

We ended up writing our own solution, but you can check out this GitHub repo, which has some 400+ stars, so I assume a few people have found it useful...

In fact, for JSON input BigQuery only supports newline-delimited JSON (also known as JSON Lines or JSONL): https://cloud.google.com/bigquery/preparing-data-for-bigquery

http://jsonlines.org/

JSON Lines is a convenient format for storing structured data that may be processed one record at a time.
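A minimal sketch of writing and reading JSON Lines in plain JavaScript (the helper names `toJsonLines` and `fromJsonLines` are hypothetical). It relies on `JSON.stringify` escaping any newline inside a string value, so each record always occupies exactly one line.

```javascript
// Minimal JSON Lines helpers: one JSON value per line.
// JSON.stringify escapes newlines inside strings, so each record stays on one line.
function toJsonLines(records) {
  return records.map((r) => JSON.stringify(r) + '\n').join('');
}

function fromJsonLines(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '') // skip the trailing empty line
    .map((line) => JSON.parse(line));
}

const jsonl = toJsonLines([
  { appeared: -70000000, height: 25 },
  { appeared: -76000000, height: 2.1 },
]);
console.log(jsonl);
console.log(fromJsonLines(jsonl).length);
```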

To prepare Firebase data for import into BigQuery, we just need to:

  1. Get the JSON from Firebase (or from a Cloud Storage bucket in the case of a private backup).
  2. Parse it to get a JS object.
  3. Loop through each record, stringify it, and append a line separator:

         // `dinosaurs` is the parsed object from step 2
         var dataForBigQuery = '';
         for (var i in dinosaurs) {
           // one line per dinosaur; note that the key i (the name) is not written out
           dataForBigQuery += JSON.stringify(dinosaurs[i]) + '\n';
         }

  4. Save the result to a new file. It will then be ready for import into BigQuery.
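Putting the four steps together, a minimal Node.js sketch might look like this. It assumes the Firebase export has already been downloaded from the Cloud Storage bucket to a local file. Unlike the loop in step 3, it keeps each child key in a `name` field so the dinosaur names are not lost. `firebaseExportToJsonl`, `convertFile`, and the `name`/`value` field names are hypothetical choices, not part of any Firebase or BigQuery API.

```javascript
const fs = require('fs');

// Hypothetical helper: converts one top-level collection of a Firebase JSON
// export (e.g. "dinosaurs") into newline-delimited JSON, keeping each child
// key in a "name" field.
function firebaseExportToJsonl(exportText, collection) {
  const data = JSON.parse(exportText); // step 2: parse the export
  const children = data[collection] || {};
  let out = '';
  for (const [key, value] of Object.entries(children)) { // step 3: one line per child
    const record =
      value !== null && typeof value === 'object'
        ? { name: key, ...value } // object children, e.g. dinosaurs
        : { name: key, value }; // primitive children, e.g. scores
    out += JSON.stringify(record) + '\n';
  }
  return out;
}

// Steps 1 and 4: read the downloaded export, write the JSONL file.
// File paths are placeholders.
function convertFile(inputPath, outputPath, collection) {
  const text = fs.readFileSync(inputPath, 'utf8');
  fs.writeFileSync(outputPath, firebaseExportToJsonl(text, collection));
}

// Example (paths are placeholders):
// convertFile('firebase-export.json', 'dinosaurs.jsonl', 'dinosaurs');
```

The resulting file can then be loaded with, for example, `bq load --source_format=NEWLINE_DELIMITED_JSON --autodetect mydataset.dinosaurs dinosaurs.jsonl`, where the dataset, table, and file names are placeholders.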

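Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site or the original URL. For any questions, contact: yoyou2525@163.com.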

 