
Aggregate Many Firebase Datasets in BigQuery into Single Dataset

For starters, I am relatively new to Firebase and BigQuery, so I may be a bit naive.


Background:

I am working on a project where we have many distinct mobile apps, one per country per OS (iOS, Android), with all of the raw event data getting pushed into BigQuery datasets that live in the same project. The rationale for having distinct apps by country relates to regulatory requirements.

For dashboarding purposes, I had hoped to combine all datasets into a single aggregate dataset (with tables by year) that is partitioned by date via a "partition_date" column that I added. My goal has been to take advantage of BigQuery nested structures and to maintain the original nested fields from the raw Firebase tables.

I've been searching around over the past month or so, but I haven't come across any other use cases where people are dealing with many datasets.


Questions:

  • Does it make sense to take this approach (aggregate all datasets into a table partitioned by date, with a nested field based on app name)?
  • As an alternative, I tried denormalizing the tables, with tables sharded by date. The resulting tables are (not surprisingly) much larger because the nested structures are flattened, which makes me think this approach is less than ideal. Should I actually consider this approach?
  • Should I consider just making a relatively simple aggregate table of KPIs? I've avoided this approach because it limits my ability to use date range filters that calculate unique users on the fly (using COUNT_DISTINCT(user_dim.app_info.app_instance_id) in Data Studio).
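
To illustrate the concern in the last question, here is a minimal Python sketch with made-up data (the dates and IDs are hypothetical, not from any real app). It shows why per-day unique-user counts stored in a KPI table cannot simply be summed when a dashboard user picks a multi-day range: users who return on several days are counted more than once.

```python
# Hypothetical sample data: which app instance IDs were seen on each day.
daily_users = {
    "2018-01-01": {"a", "b", "c"},
    "2018-01-02": {"b", "c", "d"},
    "2018-01-03": {"c", "e"},
}


def sum_of_daily_uniques(days):
    """What a pre-aggregated KPI table yields when daily counts are summed."""
    return sum(len(daily_users[d]) for d in days)


def uniques_over_range(days):
    """What a distinct count over the raw rows yields for the same range."""
    seen = set()
    for d in days:
        seen |= daily_users[d]
    return len(seen)


days = sorted(daily_users)
print(sum_of_daily_uniques(days))  # 8
print(uniques_over_range(days))    # 5
```

The gap between the two numbers is the reason a distinct count needs user-level rows (or at least one row per user per day) rather than daily totals.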

The end goal is to empower any user (e.g. someone who doesn't know SQL) to answer simple questions across all apps (e.g. how many users opened the app yesterday), and to let end users make use of date range filters in the dashboard interface.

I've been able to write ad hoc queries that get at the answers by querying across all datasets, but I have not found a good solution that makes this easy for non-technical users within the dashboard.

Also, this is my first Stack Overflow question, so please let me know if I am being too vague, including too many questions, or otherwise abusing the platform.

Thanks in advance for any thoughts.

If your end goal is to empower users to answer simple questions, aggregating KPI data makes sense to me. I would probably approach this by setting up a scheduled daily job that selects the relevant data from all the datasets and loads it into a new dataset, which can then be used in Data Studio. The tables in the new dataset could use the default Firebase date table suffix to support date range filters.
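
As a rough sketch of what such a daily job could run, the Python below builds one standard SQL string that unions a single day's users from every per-app dataset. Several things here are assumptions rather than anything stated in the question: the project and dataset names are hypothetical, the tables are assumed to follow the Firebase export naming `app_events_YYYYMMDD`, and "relevant data" is interpreted as one row per app instance per day. The `partition_date` and `user_dim.app_info.app_instance_id` names are taken from the question.

```python
from datetime import date

# Hypothetical names; replace with the real project and per-app datasets.
PROJECT = "my-project"
DATASETS = ["com_example_app_us_ANDROID", "com_example_app_us_IOS"]


def build_daily_query(day, datasets=DATASETS, project=PROJECT):
    """Return a SQL string that unions one day's users from every dataset."""
    suffix = day.strftime("%Y%m%d")  # assumed table suffix format
    parts = []
    for name in datasets:
        parts.append(
            f"SELECT DISTINCT\n"
            f"  DATE '{day.isoformat()}' AS partition_date,\n"
            f"  '{name}' AS app_name,\n"
            f"  user_dim.app_info.app_instance_id AS app_instance_id\n"
            f"FROM `{project}.{name}.app_events_{suffix}`"
        )
    # One SELECT per source dataset, stacked into a single result.
    return "\nUNION ALL\n".join(parts)


print(build_daily_query(date(2018, 1, 1)))
```

The resulting string could be used as the body of a scheduled query or submitted from a client library, with the output appended to the combined dataset. Keeping one row per user per app per day, instead of a daily total, is a deliberate choice in this sketch: it keeps the combined table small while still allowing distinct user counts over any date range.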

I am also relatively new to BigQuery and Firebase though, so maybe there is a better way.

You can find more information about scheduling in BigQuery here: Schedule query in BigQuery
