Aggregate Many Firebase Datasets in BigQuery into Single Dataset
For starters, I am relatively new to Firebase and BigQuery, so I may be a bit naive.
Background:
I am working on a project where we have many distinct mobile apps, one for each country and each OS (iOS, Android), with all of the raw event data getting pushed into BigQuery datasets that live in the same project. The rationale for having distinct apps by country relates to regulatory requirements.
For dashboarding purposes, I had hoped to be able to combine all datasets into a single aggregate dataset (with tables by year) that is partitioned by date via a "partition_date" column that I added. My goal has been to take advantage of BigQuery's nested structures and to maintain the original nested fields from the raw Firebase tables.
I've been searching around over the past month or so, but I haven't come across any other use cases where people are dealing with many datasets.
Questions:
I have avoided this approach because it limits my ability to use date range filters that dynamically calculate unique users (using COUNT_DISTINCT(user_dim.app_info.app_instance_id) in Data Studio).
The end goal is to empower any user (e.g. someone who doesn't know SQL) to answer simple questions across all apps (e.g. how many users opened the app yesterday), and to let end users make use of date range filters in the dashboard interface.
I've been able to write ad hoc queries to get at the answers by querying across all datasets, but I have not found a good solution that makes this easy for non-technical users within the dashboard.
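For illustration, an ad hoc query of this kind could be generated rather than written by hand. The following is only a minimal sketch, assuming the legacy Firebase export layout (one `app_events_YYYYMMDD` table per day in each dataset) and standard SQL; the function name `distinct_users_query` and the project and dataset names in the usage note are hypothetical.

```python
from datetime import date
from typing import Sequence


def distinct_users_query(project: str, datasets: Sequence[str], day: date) -> str:
    """Build a standard SQL query that counts distinct app instances
    across several Firebase datasets for a single day.

    Assumption: every dataset has a table named app_events_YYYYMMDD
    with the nested field user_dim.app_info.app_instance_id.
    """
    suffix = day.strftime("%Y%m%d")
    # One SELECT per dataset, pulling only the ID needed for the count.
    selects = [
        f"SELECT user_dim.app_info.app_instance_id AS app_instance_id\n"
        f"  FROM `{project}.{ds}.app_events_{suffix}`"
        for ds in datasets
    ]
    union = "\n  UNION ALL\n  ".join(selects)
    # Deduplicate once, over the combined rows from all datasets.
    return (
        "SELECT COUNT(DISTINCT app_instance_id) AS users\n"
        f"FROM (\n  {union}\n)"
    )
```

Calling `distinct_users_query("my-project", ["app_us_ANDROID", "app_us_IOS"], date(2018, 1, 2))` returns a SQL string that can be pasted into the BigQuery console. It does not run anything itself, and it does nothing for the dashboard problem beyond saving some typing.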
Also, this is my first Stack Overflow question. Please let me know if I am being too vague, including too many questions, or otherwise abusing the platform.
Thanks in advance for any thoughts.
If your end goal is to empower users to answer simple questions, aggregating KPI data makes sense to me. I would probably approach this by setting up a scheduled daily job that selects the relevant data from all the datasets and loads it into a new dataset, which can then be used in Data Studio. The tables in the new dataset could use the default Firebase date suffix so that date range filters keep working.
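As a rough sketch of what such a daily job might run, the helper below builds one `UNION ALL` query over a given day's table in every source dataset and names a destination table with the same date suffix. It assumes the legacy `app_events_YYYYMMDD` table naming and that every source table has an identical schema (`UNION ALL` matches columns by position). The names `daily_union_query`, `destination_table`, `source_dataset`, and the example project and datasets are all hypothetical.

```python
from datetime import date
from typing import Sequence


def daily_union_query(project: str, datasets: Sequence[str], day: date) -> str:
    """Build a standard SQL query that stacks one day's rows from every dataset.

    Each row keeps its original nested fields (t.*) and gains two columns:
    source_dataset (which app it came from) and partition_date.
    Assumption: all source tables share the same schema.
    """
    suffix = day.strftime("%Y%m%d")
    parts = [
        f"SELECT '{ds}' AS source_dataset, "
        f"DATE '{day.isoformat()}' AS partition_date, t.*\n"
        f"FROM `{project}.{ds}.app_events_{suffix}` AS t"
        for ds in datasets
    ]
    return "\nUNION ALL\n".join(parts)


def destination_table(project: str, dataset: str, day: date) -> str:
    """Name the output table with the Firebase-style date suffix."""
    return f"{project}.{dataset}.app_events_{day.strftime('%Y%m%d')}"
```

The resulting SQL could be used as the body of a scheduled query, or run from a small script that writes the result to the table named by `destination_table`. If the schemas differ between apps, the `t.*` shortcut will not work and the columns would need to be listed explicitly.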
I am also relatively new to BigQuery and Firebase though, so maybe there is a better way.
You can find more information about scheduling in BigQuery here: Schedule query in BigQuery