Finding part of a string and extracting data between delimiters using BigQuery SQL

I have a column like this:

String_to_Extract
A~S1_B~S2_C~S11
A~S1_B~S3_C~S12
C~S13_A~S11_B~S4

The part before the "~" should be the column name, and the part after the "~" should be the row value. The key~value pairs are separated by a "_". Therefore, the result should look like this:

String_to_Extract    A     B     C
A~S1_B~S2_C~S11      S1    S2    S11
A~S1_B~S3_C~S12      S1    S3    S12
C~S13_A~S11_B~S4     S11   S4    S13

Here is my approach:

SELECT
    String_to_Extract,
    SUBSTRING(String_to_Extract, INSTR(String_to_Extract, "A~")+2, ?) AS A,
    SUBSTRING(String_to_Extract, INSTR(String_to_Extract, "B~")+2, ?) AS B,
    SUBSTRING(String_to_Extract, INSTR(String_to_Extract, "C~")+2, ?) AS C
FROM Table

How do I get the part between the ~ and the next _ for each column?

I would be glad for any help!

One approach uses REGEXP_EXTRACT:

SELECT
    REGEXP_EXTRACT(String_to_Extract, r"(?:^|_)A~([^_]+)") AS A,
    REGEXP_EXTRACT(String_to_Extract, r"(?:^|_)B~([^_]+)") AS B,
    REGEXP_EXTRACT(String_to_Extract, r"(?:^|_)C~([^_]+)") AS C
FROM yourTable;
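
To try the REGEXP_EXTRACT approach without a real table, the sample rows from the question can be inlined as a CTE. This is only a minimal sketch: the CTE name `sample` is hypothetical, and every pattern here captures everything up to the next "_".

```sql
-- Hypothetical inline sample data; replace `sample` with your own table.
WITH sample AS (
  SELECT 'A~S1_B~S2_C~S11' AS String_to_Extract UNION ALL
  SELECT 'A~S1_B~S3_C~S12' UNION ALL
  SELECT 'C~S13_A~S11_B~S4'
)
SELECT
  String_to_Extract,
  -- (?:^|_) requires the key to sit at the start of the string or right
  -- after a "_", so a key such as "AA~" is not mistaken for "A~".
  -- ([^_]+) captures the value: everything up to the next "_" or the end.
  REGEXP_EXTRACT(String_to_Extract, r'(?:^|_)A~([^_]+)') AS A,
  REGEXP_EXTRACT(String_to_Extract, r'(?:^|_)B~([^_]+)') AS B,
  REGEXP_EXTRACT(String_to_Extract, r'(?:^|_)C~([^_]+)') AS C
FROM sample;
```

Because each key is matched by name rather than by position, the order of the pairs inside the string does not matter, and a key that is missing from a row yields NULL for that column.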

Consider the approach below (BigQuery):

select * from (
  select String_to_Extract, col_val[offset(0)] as col, col_val[offset(1)] as val
  from your_table, unnest(split(String_to_Extract, '_')) kv,
  unnest([struct(split(kv, '~') as col_val)])
)
pivot (any_value(val) for col in ('A', 'B', 'C'))   

If applied to the sample data in your question, the output is:

[image: query result]
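
To see what the pivot step works on, the inner part of the pivot approach can be run on its own. The sketch below is an assumption-laden illustration, not the answer's exact query: `sample` is a hypothetical inline CTE, and SAFE_OFFSET is used instead of OFFSET so that a malformed pair returns NULL rather than raising an error.

```sql
-- Hypothetical inline sample data; replace `sample` with your own table.
WITH sample AS (
  SELECT 'A~S1_B~S2_C~S11' AS String_to_Extract UNION ALL
  SELECT 'C~S13_A~S11_B~S4'
)
SELECT
  String_to_Extract,
  SPLIT(kv, '~')[SAFE_OFFSET(0)] AS col,  -- part before "~" (future column name)
  SPLIT(kv, '~')[SAFE_OFFSET(1)] AS val   -- part after "~" (future cell value)
-- The comma join with UNNEST emits one row per "_"-separated pair.
FROM sample, UNNEST(SPLIT(String_to_Extract, '_')) AS kv;
```

Each input string yields one row per key~value pair. PIVOT then turns the distinct `col` values listed in its IN clause into columns, grouping by the remaining column (`String_to_Extract`).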

You can also use this approach, which orders the split items first and then picks the values:


select 
   split(ordered[safe_offset(0)], '~')[safe_offset(1)] as A,
   split(ordered[safe_offset(1)], '~')[safe_offset(1)] as B,
   split(ordered[safe_offset(2)], '~')[safe_offset(1)] as C
 from (
    select 
        array(select kv from unnest(split(String_to_Extract, '_')) as kv order by kv) as ordered
    from dataset.table
)
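
The intermediate array that the ordering approach builds can be inspected directly. A minimal sketch, assuming a hypothetical inline CTE named `sample` that holds the third row from the question:

```sql
-- Hypothetical inline sample row; replace `sample` with your own table.
WITH sample AS (
  SELECT 'C~S13_A~S11_B~S4' AS String_to_Extract
)
SELECT
  String_to_Extract,
  -- Sorting the "key~value" pieces as strings puts them in key order.
  ARRAY(
    SELECT kv
    FROM UNNEST(SPLIT(String_to_Extract, '_')) AS kv
    ORDER BY kv
  ) AS ordered
FROM sample;
-- ordered is ['A~S11', 'B~S4', 'C~S13']
```

Picking values by position only works if every row contains exactly the keys A, B and C. If a key is missing or an extra key is present, the positions shift and values land in the wrong columns, so matching by key name is the safer choice for irregular data.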

Note: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please cite this site's URL or the original source.