
How to write a BigQuery query in the new schema, replacing event_dim from the old Firebase Analytics schema?

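The script below, written for the old BigQuery Export schema, works. It is given below. However, we changed our BigQuery schema, so I want to reproduce this code and rewrite it for the new export schema. Please help: in the new BigQuery Export schema I could not find any record that corresponds to event_dim (event_dim belongs to the old BigQuery Export schema).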

Here is the link to the BigQuery Export schema: click here

 SELECT user_dim.app_info.app_instance_id
          , (SELECT MIN(timestamp_micros) FROM UNNEST(event_dim)) min_time
          , (SELECT MAX(timestamp_micros) FROM UNNEST(event_dim)) max_time,
                event.name,
                params.value.int_value engagement_time
        FROM `xxx.app_events_*`,
        UNNEST(event_dim) as event,
        UNNEST(event.params) as params,
        UNNEST(user_dim.user_properties) as user_params
        where (event.name = "user_engagement" and params.key = "engagement_time_msec")
        and
                (user_params.key = "access" and user_params.value.value.string_value = "true") and
                PARSE_DATE('%Y%m%d', event.date) >= date_sub("{{upto_date (yyyy-mm-dd)}}", interval {{last n days}} day) and
                PARSE_DATE('%Y%m%d', event.date) <= "{{upto_date (yyyy-mm-dd)}}"

I tried the query below, but I want a single SELECT statement that returns app_instance, min_time, max_time, event_name and engagement_time. When I use GROUP BY, I can't get all of them (app_instance, min_time, max_time, event_name, engagement_time) at once. Please help.

 SELECT user_pseudo_id
     , MIN(event_timestamp) AS min_time
      ,MAX(event_timestamp) AS max_time
    FROM `xxx.app_events_*` as T,
       T.event_params,
       T.user_properties,
       T.event_timestamp
    where (event_name = "user_engagement" and event_params.key = "engagement_time_msec")
    and
            (user_properties.key = "access" and user_properties.value.string_value = "true") and
            PARSE_DATE('%Y%m%d', event_date) >= date_sub("{{upto_date (yyyy-mm-dd)}}", interval {{last n days}} day) and
            PARSE_DATE('%Y%m%d', event_date) <= "{{upto_date (yyyy-mm-dd)}}"
    group by 1

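Indeed, there was a schema change in the Google Analytics for Firebase BigQuery Export. There is no explicit mapping from the old fields to the new ones. However, the SQL query provided in the documentation for migrating existing BQ datasets from the old schema to the new one gives some hints about how these fields changed.

I have shared the migration_script.sql query below for reference. Here are the changes most relevant to your use case: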

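  • event_dim is aliased as event in the SQL query, but it has no final representation in the new schema, because events are no longer nested inside an event_dim array: UNNEST(event_dim) AS event
  • event_dim.timestamp_micros is mapped to event_timestamp: event.timestamp_micros AS event_timestamp
  • event_dim.name is mapped to event_name: event.name AS event_name
  • event_dim.params.value.int_value (aliased event_param in the script) is mapped to event_params.value.int_value: event_param.value.int_value AS int_value
  • user_dim.user_properties is mapped to user_properties. Its nested values keep the same structure, except that one value level is dropped (user_property.value.value.string_value becomes user_properties.value.string_value): UNNEST(user_dim.user_properties) AS user_property) AS user_properties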

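In short, the schema change focused on un-nesting several fields for simplicity. For example, you no longer access event_dim.name, which requires an UNNEST and complicates the query. Instead, you can query the event_name field directly.

With this in mind, I believe you will be able to adapt your query to the new schema. It will probably look simpler, since you won't have to unnest as many fields.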
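Putting those mappings together, values that used to need UNNEST(event.params) or UNNEST(user_dim.user_properties) are now read from the top-level event_params and user_properties arrays. Below is a minimal sketch of one way to do that, assuming DATASET.events_YYYYMMDD is a placeholder for your table and reusing the access property from the question. It reads the parameter with a correlated subquery and checks the user property with EXISTS, so each event stays a single row. This is an alternative to cross-join UNNEST, not the only approach.

```sql
-- Sketch (new schema): one row per user_engagement event.
-- DATASET.events_YYYYMMDD is a placeholder table name.
SELECT
  user_pseudo_id,
  event_timestamp,  -- old: event_dim.timestamp_micros
  event_name,       -- old: event_dim.name
  -- old: event_dim.params -> new: event_params (same key/value layout)
  (SELECT p.value.int_value
   FROM UNNEST(event_params) AS p
   WHERE p.key = "engagement_time_msec") AS engagement_time
FROM
  `DATASET.events_YYYYMMDD`
WHERE
  event_name = "user_engagement"
  -- old: user_params.value.value.string_value -> new: value.string_value
  AND EXISTS (
    SELECT 1
    FROM UNNEST(user_properties) AS up
    WHERE up.key = "access" AND up.value.string_value = "true")
```

If a key can appear more than once in an event's event_params, the scalar subquery raises an error. In that case, use the UNNEST join form instead.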


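To clarify, here are some sample BQ queries comparing the old and new schemas. They use the public Firebase tables, so you should be able to run them out of the box: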

# Old Schema - UNNEST() required because there are nested fields
SELECT
  user_dim.app_info.app_instance_id,
  MIN(event.timestamp_micros) AS min_time,
  MAX(event.timestamp_micros) AS max_time,
  event.name
FROM
  `firebase-public-project.com_firebase_demo_ANDROID.app_events_20180503`,
  UNNEST(event_dim) AS event
WHERE
  event.name = "user_engagement"
GROUP BY
  user_dim.app_info.app_instance_id,
  event.name

Compared to:

# New Schema - UNNEST() not required because the event fields are no longer nested
SELECT
  user_pseudo_id,
  MIN(event_timestamp) AS min_time,
  MAX(event_timestamp) AS max_time,
  event_name
FROM
  `firebase-public-project.analytics_153293282.events_20180815`
WHERE
  event_name = "user_engagement"
GROUP BY
  user_pseudo_id,
  event_name

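These queries are logically equivalent, but they reference tables with the old and new schema respectively. They also query different dates, so don't expect identical results. Note that for more complex queries you may still need some UNNEST() calls to access the fields that remain nested in the table, such as event_params and user_properties.

Additionally, you may want to look at these examples, which can help you understand how to write queries using the new schema.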


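EDIT 2

My understanding is that a query like the one below should let you query all the fields in a single statement. I'm grouping by all the non-aggregated/filtered fields. Depending on your use case, you may want a different strategy for the non-grouped fields (for example, MIN/MAX filters). That is something you will need to work out yourself.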

SELECT
  user_pseudo_id,
  MIN(event_timestamp) AS min_time,
  MAX(event_timestamp) AS max_time,
  event_name,
  par.value.int_value AS engagement_time
FROM
  `firebase-public-project.analytics_153293282.events_20180815`,
  UNNEST(event_params) as par
WHERE
  event_name = "user_engagement" AND par.key = "engagement_time_msec"
GROUP BY
  user_pseudo_id,
  event_name,
  par.value.int_value
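If you need each user's minimum and maximum timestamps next to every event's engagement time, one possible strategy is window functions. Grouping by engagement_time, as above, is the alternative. This is a minimal sketch against the same public table, not tested against your data. MIN/MAX OVER (PARTITION BY user_pseudo_id) computes the range per user while every matching event keeps its own row, so no GROUP BY is needed.

```sql
-- Sketch: per-user MIN/MAX attached to each event row via window functions.
SELECT
  user_pseudo_id,
  MIN(event_timestamp) OVER (PARTITION BY user_pseudo_id) AS min_time,
  MAX(event_timestamp) OVER (PARTITION BY user_pseudo_id) AS max_time,
  event_name,
  par.value.int_value AS engagement_time
FROM
  `firebase-public-project.analytics_153293282.events_20180815`,
  UNNEST(event_params) AS par
WHERE
  event_name = "user_engagement" AND par.key = "engagement_time_msec"
ORDER BY
  user_pseudo_id
```

Window functions are evaluated after the WHERE clause. As a result, min_time and max_time here only cover the user_engagement rows that carry engagement_time_msec. To span all of a user's events, compute the window in a subquery before filtering.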

Attachment

migration_script.sql

SELECT
  @date AS event_date,
  event.timestamp_micros AS event_timestamp,
  event.previous_timestamp_micros AS event_previous_timestamp,
  event.name AS event_name,
  event.value_in_usd AS event_value_in_usd,
  user_dim.bundle_info.bundle_sequence_id AS event_bundle_sequence_id,
  user_dim.bundle_info.server_timestamp_offset_micros AS event_server_timestamp_offset,
  (
  SELECT
    ARRAY_AGG(STRUCT(event_param.key AS key,
        STRUCT(event_param.value.string_value AS string_value,
          event_param.value.int_value AS int_value,
          event_param.value.double_value AS double_value, 
          event_param.value.float_value AS float_value) AS value))
  FROM
    UNNEST(event.params) AS event_param) AS event_params,
  user_dim.first_open_timestamp_micros AS user_first_touch_timestamp,
  user_dim.user_id AS user_id,
  user_dim.app_info.app_instance_id AS user_pseudo_id,
  "" AS stream_id,
  user_dim.app_info.app_platform AS platform,
  STRUCT( user_dim.ltv_info.revenue AS revenue,
    user_dim.ltv_info.currency AS currency ) AS user_ltv,
  STRUCT( user_dim.traffic_source.user_acquired_campaign AS name,
      user_dim.traffic_source.user_acquired_medium AS medium,
      user_dim.traffic_source.user_acquired_source AS source ) AS traffic_source,
  STRUCT( user_dim.geo_info.continent AS continent,
    user_dim.geo_info.country AS country,
    user_dim.geo_info.region AS region,
    user_dim.geo_info.city AS city ) AS geo,
  STRUCT( user_dim.device_info.device_category AS category,
    user_dim.device_info.mobile_brand_name,
    user_dim.device_info.mobile_model_name,
    user_dim.device_info.mobile_marketing_name,
    user_dim.device_info.device_model AS mobile_os_hardware_model,
    @platform AS operating_system,
    user_dim.device_info.platform_version AS operating_system_version,
    user_dim.device_info.device_id AS vendor_id,
    user_dim.device_info.resettable_device_id AS advertising_id,
    user_dim.device_info.user_default_language AS language,
    user_dim.device_info.device_time_zone_offset_seconds AS time_zone_offset_seconds,
    IF(user_dim.device_info.limited_ad_tracking, "Yes", "No") AS is_limited_ad_tracking ) AS device,
  STRUCT( user_dim.app_info.app_id AS id,
    @firebase_app_id  AS firebase_app_id,
    user_dim.app_info.app_version AS version,
    user_dim.app_info.app_store AS install_source ) AS app_info,
  (
  SELECT
    ARRAY_AGG(STRUCT(user_property.key AS key,
        STRUCT(user_property.value.value.string_value AS string_value,
          user_property.value.value.int_value AS int_value,
          user_property.value.value.double_value AS double_value,
          user_property.value.value.float_value AS float_value,
          user_property.value.set_timestamp_usec AS set_timestamp_micros ) AS value))
  FROM
    UNNEST(user_dim.user_properties) AS user_property) AS user_properties
FROM
  `SCRIPT_GENERATED_TABLE_NAME`,
  UNNEST(event_dim) AS event

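I believe my previous answer gave the community some general ideas, so I'll keep it and write a new one that is more specific to your use case.

First, adapting a query (as you are asking us to do) requires a clear understanding of the query's statements, goal, expected results and data. That understanding is missing here, which makes the query hard to work with. Some parts of the query are also ambiguous. For example, to get "min_time" and "max_time" for each event, you take the minimum and maximum over multiple events, and the intent of that is not clear to me. Depending on your use case, I would suggest either providing more details or working on the query yourself.

In addition, the new schema "flattens" events, so each event is written as a separate row. You can easily check this by running SELECT COUNT(*) FROM `table_with_old_schema` and comparing it with SELECT COUNT(*) FROM `table_with_new_schema`; the second one has more rows. Your query therefore no longer makes sense as written. Events are no longer grouped, so you cannot take the minimum and maximum across nested fields.

With that clarified, I removed some fields that cannot be adapted directly to the new schema. You may be able to adapt them on your side, but that takes extra effort and an understanding of what those fields meant in your previous query. Here are two queries that return exactly the same results when run against the same table exported in the two schemas: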
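To see the flattening for yourself, here is a sketch of the count comparison described above. DATASET.app_events_YYYYMMDD and DATASET.events_YYYYMMDD are placeholders for the same day exported in each schema. In the old schema, each row carries a batch of events in the event_dim array. Counting the array elements rather than the rows should therefore come much closer to the new schema's row count. Treat that as an expectation to check, not a guarantee.

```sql
-- Sketch: compare rows vs. events across the two export schemas.
-- Table names are placeholders for the same day in each schema.

-- Old schema: events are nested in the event_dim array of each row.
SELECT
  COUNT(*) AS row_count,
  SUM(ARRAY_LENGTH(event_dim)) AS event_count
FROM `DATASET.app_events_YYYYMMDD`;

-- New schema: one row per event.
SELECT
  COUNT(*) AS row_count
FROM `DATASET.events_YYYYMMDD`;
```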

Query on the table with the old schema:

SELECT
  user_dim.app_info.app_instance_id,
  event.name,
  params.value.int_value engagement_time
FROM
  `DATASET.app_events_YYYYMMDD`,
  UNNEST(event_dim) AS event,
  UNNEST(event.params) AS params,
  UNNEST(user_dim.user_properties) AS user_params
WHERE
  (event.name = "user_engagement"
    AND params.key = "engagement_time_msec")
  AND (user_params.key = "plays_quickplay"
    AND user_params.value.value.string_value = "true")
ORDER BY 1, 2, 3

Query on the same table with the new schema:

SELECT
  user_pseudo_id,
  event_name,
  params.value.int_value engagement_time
FROM
  `DATASET.events_YYYYMMDD`,
  UNNEST(event_params) AS params,
  UNNEST(user_properties) AS user_params
WHERE
  (event_name = "user_engagement"
    AND params.key = "engagement_time_msec")
  AND (user_params.key = "plays_quickplay"
    AND user_params.value.string_value = "true")
ORDER BY 1, 2, 3

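Again, for this I used the following table from the public dataset: firebase-public-project.com_firebase_demo_ANDROID.app_events_YYYYMMDD. I had to change some filters and remove others so the query would return reasonable results against that table. Feel free to modify or add whatever you need to make it useful for your use case.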
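The date-range filter from your original query can be brought back over a wildcard table. Below is a minimal sketch, assuming your daily tables follow the events_YYYYMMDD naming used above and that DATASET stands for your dataset. @upto_date (DATE) and @last_n_days (INT64) are hypothetical named query parameters standing in for the {{...}} placeholders in the question. The sketch filters on _TABLE_SUFFIX instead of PARSE_DATE(event_date), so BigQuery can skip daily tables outside the range rather than scanning every table the wildcard matches.

```sql
-- Sketch: new-schema query restricted to a date range via _TABLE_SUFFIX.
-- @upto_date and @last_n_days are hypothetical query parameters.
SELECT
  user_pseudo_id,
  event_name,
  params.value.int_value AS engagement_time
FROM
  `DATASET.events_*`,
  UNNEST(event_params) AS params,
  UNNEST(user_properties) AS user_params
WHERE
  -- Suffixes are 'YYYYMMDD' strings, so a string BETWEEN works.
  _TABLE_SUFFIX BETWEEN
    FORMAT_DATE('%Y%m%d', DATE_SUB(@upto_date, INTERVAL @last_n_days DAY))
    AND FORMAT_DATE('%Y%m%d', @upto_date)
  AND event_name = "user_engagement"
  AND params.key = "engagement_time_msec"
  AND user_params.key = "access"
  AND user_params.value.string_value = "true"
ORDER BY 1, 2, 3
```

The events_* wildcard also matches events_intraday_YYYYMMDD tables. Their suffix starts with "intraday_", which sorts after any digit string, so this digit-only range excludes them.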


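Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you reproduce them, please credit this site or link to the original post. For any questions, contact: yoyou2525@163.com.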

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM