简体   繁体   中英

How to use regex to extract string in SQL?

Documentation says:

regexp_extract(string, pattern)varchar Returns the first substring matched by the regular expression pattern in string https://prestodb.io/docs/current/functions/regexp.html

I have the following query

select regexp_extract('sssshttps://jira.domain.com/browse/PR-6835hhhh',
'/(https.*\/browse\/)(\w+\-\d+)/g')

This return null when it should return https://jira.domain.com/browse/PR-6835 The regexp seems OK it works in many regexp tools that I checked. Why I can't extract the substring of the url link?

Note that forward slash and dash are not regex metacharacters, and so do not need to be escaped, at least not in Presto regex. Consider this version:

select regexp_extract('sssshttps://jira.domain.com/browse/PR-6835hhhh',
                      'https.*?/browse/\w+-\d+')

This should return:

https://jira.domain.com/browse/PR-6835

Your current regex appears to be coming from some other language, such as JavaScript or PHP, where the regex literal has a delimiter / , therefore requiring that / needs to be escaped.

I just edited my answer to use lazy dot .*? when matching content between https and the first /browse . This should fix the edge case mentioned in your comment where .* is matching across multiple URLs.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM