简体   繁体   中英

How can I split string using regular expression in bigquery

I want to split the field in big query data table.

I give you sample case..

==case 1== Source Filed= "idx1-cnt1-name1,idx2-cnt2-name2... same pattern"

Result table
idx | cnt | name |
idx1,cnt1,name1
idx2,cnt2,name2,....

in query:

select                      
    regexp_extract(split_col, r'([\d]*)-') as ItemIdx,  
    regexp_extract(split_col, r'-([\d]*)-') as Cnt,
    regexp_extract(split_col, r'-([\d]*)$') as TitleIdx
From (
Select pid,now, split(source field, ',') split_col from (       
SELECT * FROM table ))

but I can't make a code this case

this case string has multiple saparator.

==case 2==

Source String = "item1-name1-type1-value1,.... same pattern"

Result Table
name | type
name1, type1
name2,type2

field count is different.

but just I need second, third field value.

how can I make a query..

I see you are using BigQuery legacy sql - so below example in it (note - it is quite recommended to use BigQuery Standard SQL whenever you can - so consider migrating) - below is simplified to make logic better readable for you so you can easily extend it to potentially more similar cases

Case 1 / example

#legacySQL
SELECT                      
    REGEXP_EXTRACT(split_col, r'^(.*?)-.*?-.*?$') AS idx,  
    REGEXP_EXTRACT(split_col, r'^.*?-(.*?)-.*?$') AS cnt,
    REGEXP_EXTRACT(split_col, r'^.*?-.*?-(.*?$)') AS name
FROM (
  SELECT SPLIT(source_field, ',') split_col 
  FROM (SELECT "idx1-cnt1-name1,idx2-cnt2-name2" source_field)
)

result:

Row idx     cnt     name     
1   idx1    cnt1    name1    
2   idx2    cnt2    name2    

Case 2 / example

#legacySQL
SELECT                      
    REGEXP_EXTRACT(split_col, r'^.*?-(.*?)-.*?') AS name,  
    REGEXP_EXTRACT(split_col, r'^.*?-.*?-(.*?)-') AS type
FROM (
  SELECT SPLIT(source_string, ',') split_col 
  FROM (SELECT "item1-name1-type1-value1, item2-name2-type2-value2" source_string)
)  

result:

Row name    type     
1   name1   type1    
2   name2   type2    

Below is example of same but for BigQuery Standard SQL (just case 2 as they really similar)

#standardSQL
WITH `project.dataset.table` AS (
  SELECT "item1-name1-type1-value1, item2-name2-type2-value2" source_string
)
SELECT 
  REGEXP_EXTRACT(split_col, r'^.*?-(.*?)-.*?') AS name,  
  REGEXP_EXTRACT(split_col, r'^.*?-.*?-(.*?)-') AS type
FROM `project.dataset.table`, UNNEST(SPLIT(source_string, ',')) split_col

obviously - same result

Row name    type     
1   name1   type1    
2   name2   type2    

Yet another option would be -

#standardSQL
WITH `project.dataset.table` AS (
  SELECT "item1-name1-type1-value1, item2-name2-type2-value2" source_string
)
SELECT 
  SPLIT(split_col, '-')[SAFE_OFFSET(1)] AS name,  
  SPLIT(split_col, '-')[SAFE_OFFSET(2)] AS type
FROM `project.dataset.table`, UNNEST(SPLIT(source_string, ',')) split_col

and so on ...

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM