
GA4 traffic source data does not match BigQuery

I tried to export traffic source and event attribution data from BigQuery and match it against GA4 (session_source and session_medium). I extract the event params (source and medium) in BigQuery, but there is a big gap between the two data sources.

Is there a way to solve this?

I tried the SQL below:


with prep as (
select
    user_pseudo_id,
    -- ga_session_id is only unique per user, so it is paired with user_pseudo_id when counting
    (select value.int_value from unnest(event_params) where key = 'ga_session_id') as session_id,
    -- max() keeps a non-null value from the events of each session
    max((select value.string_value from unnest(event_params) where key = 'source')) as source,
    max((select value.string_value from unnest(event_params) where key = 'medium')) as medium,
    -- the campaign name is exported under the 'campaign' key, not 'name'
    max((select value.string_value from unnest(event_params) where key = 'campaign')) as campaign,
    max((select value.string_value from unnest(event_params) where key = 'term')) as term,
    max((select value.string_value from unnest(event_params) where key = 'content')) as content,
    platform
from `XXX`
group by
    user_pseudo_id,
    session_id,
    platform
)

select
    -- session medium (dimension | the value of a medium associated with a session)
    platform,
    coalesce(source,'(none)') as source_session,
    coalesce(medium,'(none)') as medium_session,
    coalesce(campaign,'(none)') as campaign_session,
    coalesce(content,'(none)') as content,
    coalesce(term,'(none)') as term,
    count(distinct concat(user_pseudo_id,session_id)) as sessions
from
    prep
group by
    platform,
    source_session,
    medium_session,
    campaign_session,
    content,
    term
order by
    sessions desc

I'm also trying to figure out why BigQuery can't correctly match the source and medium to the event. The first issue I found is that it assigns the source/medium as google/organic even though there is a gclid parameter in the link. The second issue is that traffic that should be recognized as direct is badly under-reported: in those cases the events do not carry these parameters at all.

The values are valid, but only for the source and medium that acquired the user.

When I compare data in UA and GA4, session attribution is correct, so it looks like a problem with the export to BigQuery. I reported this to support and am waiting for a response.
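Until the export behaves as expected, one possible stopgap for the gclid case is to override the medium yourself. The query below is only a sketch: the `events` CTE is made-up sample data in an already flattened shape, `page_location` is assumed to have been pulled out of event_params the same way as source and medium, and treating any gclid as google/cpc (and missing parameters as (direct)/(none)) is my assumption, not a documented GA4 attribution rule.

```sql
-- Made-up sample rows; replace this CTE with your own flattened export data.
with events as (
  select 'u1' as user_pseudo_id, 101 as session_id,
         'google' as source, 'organic' as medium,
         'https://example.com/?gclid=abc123' as page_location
  union all
  select 'u2', 202, 'google', 'organic', 'https://example.com/pricing'
  union all
  select 'u3', 303, cast(null as string), cast(null as string), 'https://example.com/'
)

select
  user_pseudo_id,
  session_id,
  -- Assumption: a gclid in the URL means the visit came from a Google Ads click.
  if(regexp_contains(page_location, r'[?&]gclid='),
     'google', coalesce(source, '(direct)')) as source_adjusted,
  if(regexp_contains(page_location, r'[?&]gclid='),
     'cpc', coalesce(medium, '(none)')) as medium_adjusted
from events
order by user_pseudo_id
```

With the sample rows, u1 becomes google/cpc, u2 stays google/organic, and u3 falls back to (direct)/(none). Whether this gets you closer to the GA4 interface depends on your traffic, so compare a few known sessions first.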

I have also noticed that source/medium does not align between BigQuery and GA4 and, as Justyna commented, a lot of my source/medium values come through as google/organic even when they are not. I am hoping Justyna will post here when there is a solution.

Looking at your code, I can see two other areas that would cause discrepancies:

1)

count(distinct concat(user_pseudo_id,session_id)) as sessions

This only counts events that have both a valid user_pseudo_id and a session_id, which is the correct way to count. In my data, though, there tend to be a few events where one of the ids is null, so your session count leaves them out while GA4 appears to include them. Use your preferred method of counting nulls to work out whether this is an issue for you.

2) You are also doing an exact count, which again is correct, but GA4 uses an approximate count. See the link below for details.

https://developers.google.com/analytics/blog/2022/hll#using_bigquery_hll_functions_with_google_analytics_event_data
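Both checks above can be run side by side. This is a minimal sketch, not the exact method GA4 uses: the `events` CTE is made-up sample data in a flattened shape, and the precision value 12 passed to `HLL_COUNT.INIT` is an assumption from my reading of the linked post, so confirm it there before relying on it.

```sql
-- Made-up sample rows in a flattened shape; replace with your own data.
with events as (
  select 'u1' as user_pseudo_id, 101 as session_id
  union all select 'u1', 101
  union all select 'u2', 202
  union all select 'u3', cast(null as int64)   -- event without a ga_session_id
  union all select cast(null as string), 404   -- event without a user_pseudo_id
)

select
  -- Events dropped by count(distinct concat(...)), because CONCAT returns NULL
  -- when either id is NULL.
  countif(user_pseudo_id is null or session_id is null) as events_missing_ids,
  -- Exact session count, as in the question.
  count(distinct concat(user_pseudo_id, cast(session_id as string))) as sessions_exact,
  -- HyperLogLog++ estimate; the precision of 12 is an assumption, see above.
  hll_count.extract(
    hll_count.init(concat(user_pseudo_id, cast(session_id as string)), 12)
  ) as sessions_approx
from events
```

On the sample rows, events_missing_ids is 2 and sessions_exact is 2. On real data the approximate figure can differ slightly from the exact one, and that small difference is part of the gap against the GA4 interface.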

Using the two techniques above I can get a lot closer to the GA4 number of sessions, but the sessions are still not attributed correctly.
