
How to write a BigQuery query for the new schema, replacing event_dim from the old Firebase Analytics schema?
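The script written for the old BigQuery Export schema is running; it is given below. I want to replicate this code and rewrite it for the new export schema, because the BigQuery schema has changed. Please help: in the new BigQuery Export schema I could not find any record corresponding to event_dim (event_dim belongs to the old BigQuery Export schema).
Here is the link to the BigQuery Export schema: click here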
SELECT user_dim.app_info.app_instance_id
, (SELECT MIN(timestamp_micros) FROM UNNEST(event_dim)) min_time
, (SELECT MAX(timestamp_micros) FROM UNNEST(event_dim)) max_time,
event.name,
params.value.int_value engagement_time
FROM `xxx.app_events_*`,
UNNEST(event_dim) as event,
UNNEST(event.params) as params,
UNNEST(user_dim.user_properties) as user_params
where (event.name = "user_engagement" and params.key = "engagement_time_msec")
and
(user_params.key = "access" and user_params.value.value.string_value = "true") and
PARSE_DATE('%Y%m%d', event.date) >= date_sub("{{upto_date (yyyy-mm-dd)}}", interval {{last n days}} day) and
PARSE_DATE('%Y%m%d', event.date) <= "{{upto_date (yyyy-mm-dd)}}"
I tried the query below, but I want app_instance, min_time, max_time, event_name and engagement_time in a single SELECT statement. When I use 'group by', I cannot get all of them (app_instance, min_time, max_time, event_name, engagement_time) at once. Please help.
SELECT user_pseudo_id
, MIN(event_timestamp) AS min_time
,MAX(event_timestamp) AS max_time
FROM `xxx.app_events_*` as T,
T.event_params,
T.user_properties,
T.event_timestamp
where (event_name = "user_engagement" and event_params.key = "engagement_time_msec")
and
(user_properties.key = "access" and user_properties.value.string_value = "true") and
PARSE_DATE('%Y%m%d', event_date) >= date_sub("{{upto_date (yyyy-mm-dd)}}", interval {{last n days}} day) and
PARSE_DATE('%Y%m%d', event_date) <= "{{upto_date (yyyy-mm-dd)}}"
group by 1
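Indeed, there has been a schema change in the Google Analytics for Firebase BigQuery Export. Although there is no explicit mapping of the old fields to the new ones, the SQL query that the documentation provides for migrating existing BQ datasets from the old schema to the new one gives some hints about how the fields have changed.
I am sharing the migration_script.sql SQL query below for reference, but let me point out the most relevant changes for your use case: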
UNNEST(event_dim) AS event
event.timestamp_micros AS event_timestamp
event.name AS event_name
event_param.value.int_value AS int_value
UNNEST(user_dim.user_properties) AS user_property) AS user_properties
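So, in summary, the schema change has focused on un-nesting several fields for simplicity. For example, instead of having to access event_dim.name (which requires un-nesting and complicates the query), you can query the field event_name directly.
With this in mind, I am sure you will be able to adapt your query to the new schema, and it will probably look simpler, since you will not have to un-nest so many fields.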
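The renames listed above can be tried out without touching any real dataset. The sketch below builds a single hypothetical row shaped like a small part of the old schema (the `old_schema` CTE, its values, and the reduced `value` struct are made up for illustration) and flattens it into the new field names in the same way the migration script does:

```sql
-- Hypothetical one-row table with a few fields of the old export schema.
-- The real schema has many more fields; the values here are invented.
WITH old_schema AS (
  SELECT
    STRUCT(STRUCT('instance-1' AS app_instance_id) AS app_info) AS user_dim,
    [
      STRUCT(
        'user_engagement' AS name,
        1000 AS timestamp_micros,
        [STRUCT('engagement_time_msec' AS key,
                STRUCT(1500 AS int_value) AS value)] AS params),
      STRUCT(
        'user_engagement' AS name,
        2000 AS timestamp_micros,
        [STRUCT('engagement_time_msec' AS key,
                STRUCT(2500 AS int_value) AS value)] AS params)
    ] AS event_dim
)
SELECT
  user_dim.app_info.app_instance_id AS user_pseudo_id,  -- old name -> new name
  event.name AS event_name,
  event.timestamp_micros AS event_timestamp,
  event.params AS event_params                          -- still a nested array
FROM
  old_schema,
  UNNEST(event_dim) AS event
```

Each element of event_dim becomes its own output row, while event_params stays nested, which is why queries on the new schema may still need UNNEST(event_params).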
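To clarify, let me share a couple of sample BQ queries that compare the old and new schemas (they use public Firebase tables, so you should be able to run them out of the box):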
# Old Schema - UNNEST() required because there are nested fields
SELECT
user_dim.app_info.app_instance_id,
MIN(event.timestamp_micros) AS min_time,
MAX(event.timestamp_micros) AS max_time,
event.name
FROM
`firebase-public-project.com_firebase_demo_ANDROID.app_events_20180503`,
UNNEST(event_dim) AS event
WHERE
event.name = "user_engagement"
GROUP BY
user_dim.app_info.app_instance_id,
event.name
Compared to:
# New Schema - UNNEST() not required because the fields queried here are not nested
SELECT
user_pseudo_id,
MIN(event_timestamp) AS min_time,
MAX(event_timestamp) AS max_time,
event_name
FROM
`firebase-public-project.analytics_153293282.events_20180815`
WHERE
event_name = "user_engagement"
GROUP BY
user_pseudo_id,
event_name
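These queries are equivalent, but they reference tables with the old and the new schema respectively. Note that, because your query is more complex, you may need to add some UNNEST() calls in order to access the nested fields that remain in the table.
Also, you may want to have a look at these samples, which can give you ideas on how to write queries against the new schema.
EDIT 2
My understanding is that a query like the one below should let you retrieve all the fields in a single statement. I am grouping by all the non-aggregated/filtered fields, but depending on your use case (and this is definitely something you will need to work out yourself), you may want to apply a different strategy in order to query the non-grouped fields (e.g. applying MIN/MAX to them).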
SELECT
user_pseudo_id,
MIN(event_timestamp) AS min_time,
MAX(event_timestamp) AS max_time,
event_name,
par.value.int_value AS engagement_time
FROM
`firebase-public-project.analytics_153293282.events_20180815`,
UNNEST(event_params) as par
WHERE
event_name = "user_engagement" AND par.key = "engagement_time_msec"
GROUP BY
user_pseudo_id,
event_name,
par.value.int_value
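As one possible alternative to grouping by every selected field, window functions keep one output row per engagement event while still exposing the per-user minimum and maximum timestamps. This is only a sketch under assumptions: the `new_schema` CTE and its values are hypothetical inline data, and the `value` struct is reduced to `int_value`.

```sql
-- Hypothetical rows shaped like a small part of the new export schema.
WITH new_schema AS (
  SELECT
    'instance-1' AS user_pseudo_id,
    'user_engagement' AS event_name,
    1000 AS event_timestamp,
    [STRUCT('engagement_time_msec' AS key,
            STRUCT(1500 AS int_value) AS value)] AS event_params
  UNION ALL
  SELECT
    'instance-1',
    'user_engagement',
    2000,
    [STRUCT('engagement_time_msec' AS key,
            STRUCT(2500 AS int_value) AS value)]
)
SELECT
  user_pseudo_id,
  -- Computed over all filtered rows of the same user, without collapsing them.
  MIN(event_timestamp) OVER (PARTITION BY user_pseudo_id) AS min_time,
  MAX(event_timestamp) OVER (PARTITION BY user_pseudo_id) AS max_time,
  event_name,
  par.value.int_value AS engagement_time
FROM
  new_schema,
  UNNEST(event_params) AS par
WHERE
  event_name = 'user_engagement'
  AND par.key = 'engagement_time_msec'
```

With this sample data the query returns two rows, both with min_time 1000 and max_time 2000, and engagement_time 1500 and 2500 respectively. Whether per-user or per-event granularity is the right choice depends on what min_time and max_time are meant to represent.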
Annex: migration_script.sql
SELECT
@date AS event_date,
event.timestamp_micros AS event_timestamp,
event.previous_timestamp_micros AS event_previous_timestamp,
event.name AS event_name,
event.value_in_usd AS event_value_in_usd,
user_dim.bundle_info.bundle_sequence_id AS event_bundle_sequence_id,
user_dim.bundle_info.server_timestamp_offset_micros as event_server_timestamp_offset,
(
SELECT
ARRAY_AGG(STRUCT(event_param.key AS key,
STRUCT(event_param.value.string_value AS string_value,
event_param.value.int_value AS int_value,
event_param.value.double_value AS double_value,
event_param.value.float_value AS float_value) AS value))
FROM
UNNEST(event.params) AS event_param) AS event_params,
user_dim.first_open_timestamp_micros AS user_first_touch_timestamp,
user_dim.user_id AS user_id,
user_dim.app_info.app_instance_id AS user_pseudo_id,
"" AS stream_id,
user_dim.app_info.app_platform AS platform,
STRUCT( user_dim.ltv_info.revenue AS revenue,
user_dim.ltv_info.currency AS currency ) AS user_ltv,
STRUCT( user_dim.traffic_source.user_acquired_campaign AS name,
user_dim.traffic_source.user_acquired_medium AS medium,
user_dim.traffic_source.user_acquired_source AS source ) AS traffic_source,
STRUCT( user_dim.geo_info.continent AS continent,
user_dim.geo_info.country AS country,
user_dim.geo_info.region AS region,
user_dim.geo_info.city AS city ) AS geo,
STRUCT( user_dim.device_info.device_category AS category,
user_dim.device_info.mobile_brand_name,
user_dim.device_info.mobile_model_name,
user_dim.device_info.mobile_marketing_name,
user_dim.device_info.device_model AS mobile_os_hardware_model,
@platform AS operating_system,
user_dim.device_info.platform_version AS operating_system_version,
user_dim.device_info.device_id AS vendor_id,
user_dim.device_info.resettable_device_id AS advertising_id,
user_dim.device_info.user_default_language AS language,
user_dim.device_info.device_time_zone_offset_seconds AS time_zone_offset_seconds,
IF(user_dim.device_info.limited_ad_tracking, "Yes", "No") AS is_limited_ad_tracking ) AS device,
STRUCT( user_dim.app_info.app_id AS id,
@firebase_app_id AS firebase_app_id,
user_dim.app_info.app_version AS version,
user_dim.app_info.app_store AS install_source ) AS app_info,
(
SELECT
ARRAY_AGG(STRUCT(user_property.key AS key,
STRUCT(user_property.value.value.string_value AS string_value,
user_property.value.value.int_value AS int_value,
user_property.value.value.double_value AS double_value,
user_property.value.value.float_value AS float_value,
user_property.value.set_timestamp_usec AS set_timestamp_micros ) AS value))
FROM
UNNEST(user_dim.user_properties) AS user_property) AS user_properties
FROM
`SCRIPT_GENERATED_TABLE_NAME`,
UNNEST(event_dim) AS event
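I believe my previous answer gives the community some general ideas, so I will keep it and write a new one that is more specific to your use case.
First of all, I would like to clarify that adapting a query (as you are asking us to do) requires a clear understanding of the statement, its objective, the expected results and the data it runs against. That is not the case here, so it is hard to work with, even more so considering that some functionality in the query is unclear. For example, in order to get the "min_time" and "max_time" of each event, you are taking the minimum and maximum across multiple events, which does not make clear sense to me (it may for your use case, so I would suggest either providing more details or working on the query yourself). Additionally, the new schema "flattens" events, so each event is written to its own row. You can easily check this by running SELECT COUNT(*) FROM `table_with_old_schema` and comparing the result to SELECT COUNT(*) FROM `table_with_new_schema`; you will see that the second table has many more rows. Your query therefore no longer makes sense as written: events are no longer grouped, so you cannot pick a minimum and maximum among nested fields.
Having clarified this, and having removed some fields that cannot be adapted directly to the new schema (you may be able to adapt them on your side, but that takes some additional effort and an understanding of what those fields meant in your previous query), here are two queries that provide exactly the same results when run against the same table under the two schemas: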
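The row-count difference can be reproduced on a small scale. In the sketch below, `old_schema` is a hypothetical one-row table whose event_dim array holds three events (app_instance_id is placed at the top level only to keep the example short; in the real old schema it lives under user_dim.app_info):

```sql
-- One row in the old layout: a single bundle containing three events.
WITH old_schema AS (
  SELECT
    'instance-1' AS app_instance_id,
    [STRUCT('user_engagement' AS name, 1000 AS timestamp_micros),
     STRUCT('user_engagement' AS name, 2000 AS timestamp_micros),
     STRUCT('user_engagement' AS name, 3000 AS timestamp_micros)] AS event_dim
)
SELECT
  -- Rows as stored under the old schema.
  (SELECT COUNT(*) FROM old_schema) AS old_schema_rows,
  -- Rows after flattening, which is how the new schema stores events.
  (SELECT COUNT(*) FROM old_schema, UNNEST(event_dim)) AS flattened_rows
```

This returns old_schema_rows = 1 and flattened_rows = 3: a per-row MIN/MAX over UNNEST(event_dim) in the old schema has no direct per-row counterpart once every event is its own row.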
Query against a table with the old schema:
SELECT
user_dim.app_info.app_instance_id,
event.name,
params.value.int_value engagement_time
FROM
`DATASET.app_events_YYYYMMDD`,
UNNEST(event_dim) AS event,
UNNEST(event.params) AS params,
UNNEST(user_dim.user_properties) AS user_params
WHERE
(event.name = "user_engagement"
AND params.key = "engagement_time_msec")
AND (user_params.key = "plays_quickplay"
AND user_params.value.value.string_value = "true")
ORDER BY 1, 2, 3
Query against the same table with the new schema:
SELECT
user_pseudo_id,
event_name,
params.value.int_value engagement_time
FROM
`DATASET.events_YYYYMMDD`,
UNNEST(event_params) AS params,
UNNEST(user_properties) AS user_params
WHERE
(event_name = "user_engagement"
AND params.key = "engagement_time_msec")
AND (user_params.key = "plays_quickplay"
AND user_params.value.string_value = "true")
ORDER BY 1, 2, 3
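Again, for this I am using the following table from the public dataset: firebase-public-project.com_firebase_demo_ANDROID.app_events_YYYYMMDD, so I had to change some filters and remove others in order to retrieve sensible results from that table. Feel free to modify the queries or add whatever you need to make them useful for your use case.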
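If the date range and the "access" user property from the question need to be added back, one way to do it is sketched below. It is an assumption-laden variant, not part of the answer above: `new_schema` is hypothetical inline data, the literal dates stand in for the question's {{upto_date}} and {{last n days}} template parameters, and the nested fields are read with subqueries instead of cross joins so that the number of event rows does not change.

```sql
-- Hypothetical row shaped like a small part of the new export schema.
WITH new_schema AS (
  SELECT
    'instance-1' AS user_pseudo_id,
    'user_engagement' AS event_name,
    '20180815' AS event_date,
    [STRUCT('engagement_time_msec' AS key,
            STRUCT(1500 AS int_value) AS value)] AS event_params,
    [STRUCT('access' AS key,
            STRUCT('true' AS string_value) AS value)] AS user_properties
)
SELECT
  user_pseudo_id,
  event_name,
  -- Scalar subquery: assumes the key appears at most once per event.
  (SELECT p.value.int_value
   FROM UNNEST(event_params) AS p
   WHERE p.key = 'engagement_time_msec') AS engagement_time
FROM
  new_schema
WHERE
  event_name = 'user_engagement'
  -- Keep the event only if the user property matches.
  AND EXISTS (
    SELECT 1
    FROM UNNEST(user_properties) AS up
    WHERE up.key = 'access' AND up.value.string_value = 'true')
  -- Placeholder dates standing in for the question's template parameters.
  AND PARSE_DATE('%Y%m%d', event_date)
      BETWEEN DATE_SUB(DATE '2018-08-15', INTERVAL 7 DAY) AND DATE '2018-08-15'
```

Unlike the cross-join form, an event that lacks the engagement_time_msec parameter is kept with a NULL engagement_time rather than dropped, so add an IS NOT NULL filter if that matters for your use case.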