
How can I avoid duplicates using UNNEST and SPLIT in BigQuery SQL?

I have the following data

Id Historical_UTMs
1 a,b,c,d;e,f,g,h;
2 i,j,k,l;
3 m,n,o,p;q,r,s,t;u,v,w,x;

And I want to end up with the following

Id utm_Type utm_Timestamp utm_Web_Page utm_Referrer
1 a b c d
1 e f g h
2 i j k l
3 m n o p
3 q r s t
3 u v w x

I want to split the content of the Historical_UTMs field into separate rows (delimited by ;), each keeping the Id field, and I also want to split each new row into its individual values (delimited by ,).

I have the following script that creates a table with the correct information. The problem is that all the records are duplicated.

Is there anyone that can help me understand why this script is creating duplicate rows, and how to fix it?

with Expanded as (
  select 
    Lead.Id,
    Lead.Historical_UTMs
  from
    `dataset.GS_UTMs` AS Lead,
    unnest(split(Historical_UTMs,';')) AS History_UTMs
)

select
  Expanded.Id,
  split(Expanded.Historical_UTMs,',')[safe_offset(0)] as utm_Type,
  split(Expanded.Historical_UTMs,',')[safe_offset(1)] as utm_Timestamp,
  split(Expanded.Historical_UTMs,',')[safe_offset(2)] as utm_Web_Page,
  split(Expanded.Historical_UTMs,',')[safe_offset(3)] as utm_Referrer,

from
  Expanded

Consider the query below:

select Id, 
  UTM[offset(0)] as utm_Type,
  UTM[offset(1)] as utm_Timestamp,
  UTM[offset(2)] as utm_Web_Page,
  UTM[offset(3)] as utm_Referrer
from `project.dataset.GS_UTMs`,
unnest(split(trim(Historical_UTMs, ';'), ';')) Historical_UTM,
unnest([struct(split(Historical_UTM) as UTM)])        

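If applied to the sample data in your question, the output is the six rows you are after: one row per ;-delimited entry, with its four values split into the utm_Type, utm_Timestamp, utm_Web_Page and utm_Referrer columns.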
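To try the query above without access to the real table, here is a minimal sketch that inlines the sample rows from the question in a CTE. The CTE name GS_UTMs and the final order by are assumptions added for illustration; the rest follows the query above.

```sql
-- Sample rows from the question, inlined so the query runs without a real table.
-- (The CTE name GS_UTMs is only a stand-in for the actual table.)
with GS_UTMs as (
  select 1 as Id, 'a,b,c,d;e,f,g,h;' as Historical_UTMs union all
  select 2, 'i,j,k,l;' union all
  select 3, 'm,n,o,p;q,r,s,t;u,v,w,x;'
)
select Id,
  UTM[offset(0)] as utm_Type,
  UTM[offset(1)] as utm_Timestamp,
  UTM[offset(2)] as utm_Web_Page,
  UTM[offset(3)] as utm_Referrer
from GS_UTMs,
-- trim() strips the trailing ';' so split() does not produce an empty last entry
unnest(split(trim(Historical_UTMs, ';'), ';')) Historical_UTM,
-- split() defaults to ',' for strings; wrapping the array in a one-element
-- struct array lets it be referenced as UTM without repeating the split
unnest([struct(split(Historical_UTM) as UTM)])
order by Id, utm_Type
```

Note that offset() raises an error if an entry has fewer than four values. If that can happen in your data, safe_offset() returns NULL instead.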

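If I understand correctly, the issue is that the CTE unnests the split values under one name but then selects the original Historical_UTMs column, so every unnested row carries the full, unsplit string and the outer query splits that same string each time. You are using the wrong one of the two. Perhaps something like this will work: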

with Expanded as (
      -- select the unnested element (Historical_UTM), not the original column
      select l.Id, Historical_UTM
      from `stormgeo-bigquery.Data_to_send_to_BigQuery_from_Google_Sheet.GS_UTMs` l cross join
           unnest(split(l.Historical_UTMs, ';')) AS Historical_UTM
          )
select e.Id,
       split(e.Historical_UTM, ',')[safe_offset(0)] as utm_Type,
       split(e.Historical_UTM, ',')[safe_offset(1)] as utm_Timestamp,
       split(e.Historical_UTM, ',')[safe_offset(2)] as utm_Web_Page,
       split(e.Historical_UTM, ',')[safe_offset(3)] as utm_Referrer
from Expanded e;
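One more thing to watch for with the CTE approach: because each Historical_UTMs value in the sample data ends with a ;, split() returns a final empty string, which would show up as an extra row with an empty utm_Type and NULL in the other three columns. Here is a sketch that filters that element out, using an inlined sample row. The GS_UTMs CTE and the where clause are my additions for illustration.

```sql
-- One sample row from the question, inlined so the query is self-contained.
with GS_UTMs as (
  select 1 as Id, 'a,b,c,d;e,f,g,h;' as Historical_UTMs
),
Expanded as (
  select l.Id, Historical_UTM
  from GS_UTMs l cross join
       unnest(split(l.Historical_UTMs, ';')) as Historical_UTM
  -- the trailing ';' makes split() return a final empty string; drop it
  where Historical_UTM != ''
)
select e.Id,
       split(e.Historical_UTM, ',')[safe_offset(0)] as utm_Type,
       split(e.Historical_UTM, ',')[safe_offset(1)] as utm_Timestamp,
       split(e.Historical_UTM, ',')[safe_offset(2)] as utm_Web_Page,
       split(e.Historical_UTM, ',')[safe_offset(3)] as utm_Referrer
from Expanded e;
```

For the inlined row this should give two rows for Id 1 (a, b, c, d and e, f, g, h). The same trailing empty element is also part of why the original script looked like it was duplicating records: it produced one row per split element, but each row still held the whole unsplit string.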

