
Regexp_extract from URLs as strings (SQL BigQuery)

I'm trying to extract a string from multiple URLs that all have one thing in common even though they are built differently. Let me give you a few examples:

/cz/category/79478/productname
/https://www.store.net/de/category/49448/productname
/https://www.store.net/category/62448/productname
/category/79455/productname

I'm using BigQuery and can write a REGEXP_EXTRACT expression for each individual example, but I'm looking for a single expression that extracts the number (as a string) after category/ (79478 in the first URL). All the paths contain /category/, so it should be doable from my point of view.

Here's the expression that I've been trying to use:

regexp_extract(page_path, '[^category/]+/([^/]+)/')

But it doesn't work. Any idea what I'm doing wrong here?

Use a non-capturing group for the leading /category/?

regexp_extract(page_path, '(?:/category/)([^/]+)')

Demo: https://regex101.com/r/WSIT77/1
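
On why the pattern in the question fails: [^category/] is a negated character class, so it matches any single character that is not one of c, a, t, e, g, o, r, y or /. It does not mean "text other than category/". Also, since REGEXP_EXTRACT returns the substring matched by the single capturing group, the non-capturing group is optional and the literal prefix can be written directly. A minimal sketch using one path from the question (the column aliases are illustrative, not from the original post):

```sql
-- Both expressions anchor on the literal '/category/' and capture
-- everything up to the next slash.
SELECT
  REGEXP_EXTRACT('/cz/category/79478/productname', r'(?:/category/)([^/]+)') AS with_noncapture_group,
  REGEXP_EXTRACT('/cz/category/79478/productname', r'/category/([^/]+)') AS literal_prefix_only;
-- both columns return '79478'
```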

Consider the approach below:

select page_path, regexp_extract(page_path, r'/category/(\d+)/') number
from your_table    

If applied to the sample data in your question, the output is:

[image: query output for the sample data]
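
To try this without a real table, the four sample paths from the question can be inlined with UNNEST. This is only a sketch: the CTE stands in for your_table, and the alias category_id is an illustrative name, not part of the original answer.

```sql
-- Stand-in for your_table, built from the sample paths in the question.
WITH your_table AS (
  SELECT page_path
  FROM UNNEST([
    '/cz/category/79478/productname',
    '/https://www.store.net/de/category/49448/productname',
    '/https://www.store.net/category/62448/productname',
    '/category/79455/productname'
  ]) AS page_path
)
SELECT
  page_path,
  -- capture the digits between '/category/' and the following '/'
  REGEXP_EXTRACT(page_path, r'/category/(\d+)/') AS category_id
FROM your_table;
-- category_id per row: '79478', '49448', '62448', '79455'
```

Note that this pattern requires a slash after the number, so a path that ends right after the digits (for example /category/79455) would return NULL. Dropping the trailing / from the pattern covers that case.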
