
Regular expression for getting the file path up to a given slash number

I have a log table with a column 'path' holding values like root/home/desktop/parent/child/grandchild. I want to group by this column based on an integer input 'n', where n is the number of slashes up to which the substring should be extracted. For example, with n = 1 I would want to group by 'root/', and with n = 3 I would want to group by 'root/home/desktop/'. How can I achieve this in BigQuery? Can I use a regex for this, or is there a better way? I would appreciate a bit of explanation of whatever approach is suggested. Thanks!

I'm not sure the example below really needs any extra explanation:

select *, 
  split(path, '/')[safe_offset(0)],
  split(path, '/')[safe_offset(1)],
  split(path, '/')[safe_offset(2)],
  split(path, '/')[safe_offset(3)],
  split(path, '/')[safe_offset(4)],
  split(path, '/')[safe_offset(5)]
from your_table    

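The output has the original columns plus one column per path segment; for the example path these are root, home, desktop, parent, child, and grandchild.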

I would like to have the splits combined in the form of a string until the last slash...

To get the partial path from the beginning, use the example below:

create temp function get_path(path string, n int64) as ((
  -- split the path into segments, keep the first n, and re-join them with '/'
  select string_agg(part, '/' order by offset)
  from unnest(split(path, '/')) part with offset
  where offset < n
));
select  
  get_path(path, 1) n1,
  get_path(path, 2) n2,
  get_path(path, 3) n3,
  get_path(path, 4) n4,
  get_path(path, 5) n5,
  get_path(path, 6) n6
from your_table

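For the example path, the output is root for n1, root/home for n2, root/home/desktop for n3, and so on up to the full path for n6. Note that this version does not keep a trailing slash.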
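
The question asks for a GROUP BY on the extracted prefix, which the examples above stop short of showing. Below is a minimal sketch of that last step, reusing the same get_path function. The three sample rows and the row_count alias are made up for illustration, and the depth 3 is hard-coded where a real query would supply its own n.

```sql
create temp function get_path(path string, n int64) as ((
  select string_agg(part, '/' order by offset)
  from unnest(split(path, '/')) part with offset
  where offset < n
));

-- hypothetical sample data standing in for the real log table
with your_table as (
  select 'root/home/desktop/parent/child/grandchild' as path union all
  select 'root/home/desktop/other/file' union all
  select 'root/home/documents/report'
)
select
  get_path(path, 3) as prefix,  -- first 3 segments, without a trailing slash
  count(*) as row_count
from your_table
group by prefix
order by prefix;
```

With these sample rows the query should return two groups: root/home/desktop with 2 rows and root/home/documents with 1 row.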

If you want to use a regexp instead, consider the version below:

create temp function get_path(path string, n int64) as ((
  -- from the start of the path, match n repetitions of "segment plus optional slash"
  regexp_extract(path, r'(^(?:[^/]+/?){' || n || '})')
));
with your_table as (
  select 'root/home/desktop/parent/child/grandchild' path
)
select  
  get_path(path, 1) n1,
  get_path(path, 2) n2,
  get_path(path, 3) n3,
  get_path(path, 4) n4,
  get_path(path, 5) n5,
  get_path(path, 6) n6,
from your_table    

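For the example path, the output is root/ for n1, root/home/ for n2, root/home/desktop/ for n3, and so on; n6 returns the full path, which has no trailing slash.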
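
Because the slash in the pattern above is optional, a path with fewer than n slashes can still match. If the intent is strictly "everything up to and including the nth slash", one possible variant is to make the slash mandatory. This is only a sketch: the name get_path_strict is hypothetical, and the explicit cast of n to a string is a defensive choice rather than something the original answer uses.

```sql
-- hypothetical stricter variant: every counted segment must end with '/'
create temp function get_path_strict(path string, n int64) as ((
  regexp_extract(path, concat(r'^((?:[^/]+/){', cast(n as string), '})'))
));

with your_table as (
  select 'root/home/desktop/parent/child/grandchild' as path
)
select
  get_path_strict(path, 1) as n1,  -- 'root/'
  get_path_strict(path, 3) as n3,  -- 'root/home/desktop/'
  get_path_strict(path, 6) as n6   -- NULL: the path contains only 5 slashes
from your_table;
```

Whether NULL or the full path is the better result for short paths depends on how such rows should be grouped, so treat this as an alternative rather than a replacement.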

