
Extracting a string between two characters or strings using MySQL REGEXP_SUBSTR

I need help extracting the following string. I have tried many solutions, and this one is the closest, but it is still not what I require. Any help is appreciated.

Sample URL: 'https://mywebsite/path/?utm_source=google&utm_medium=cpc&gclid=123abc'

Required Result:

utm_source   utm_medium   gclid
google       cpc          123abc

The following example for gclid gives me gclid=123abc as a result, whereas I need to extract only 123abc:

SELECT l.url, REGEXP_SUBSTR(l.url, 'gclid=([^&]*)') as data
FROM mydatabase.mytable AS l
WHERE Date(l.registration_date) >= '2021-06-15'
AND REGEXP_SUBSTR(l.url, 'gclid=([^&]*)') is not null

I also need to parse the other two fields, utm_source and utm_medium.

SET @URL := 'https://mywebsite/path/?utm_source=google&utm_medium=cpc&gclid=12345';
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(@URL, 'utm_source=', -1), '&', 1) utm_source,
       SUBSTRING_INDEX(SUBSTRING_INDEX(@URL, 'utm_medium=', -1), '&', 1) utm_medium,
       SUBSTRING_INDEX(SUBSTRING_INDEX(@URL, 'gclid=', -1), '&', 1) gclid;

utm_source   utm_medium   gclid
google       cpc          12345

db<>fiddle here
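
One caveat with the nested SUBSTRING_INDEX approach: when the key is missing from the URL, the inner call returns the whole string, so the result is the URL up to the first & rather than NULL. Below is a minimal sketch of a guard. It assumes a plain LOCATE check is good enough for your data (it does not verify that the key is preceded by ? or &), and the URL without gclid is a made-up example.

```sql
-- Hypothetical URL that has no gclid parameter
SET @URL := 'https://mywebsite/path/?utm_source=google&utm_medium=cpc';

SELECT
  -- Unguarded: 'gclid=' is not found, so this falls back to the text before the first '&'
  SUBSTRING_INDEX(SUBSTRING_INDEX(@URL, 'gclid=', -1), '&', 1) AS gclid_unguarded,
  -- Guarded: NULL when 'gclid=' does not occur anywhere in the URL
  IF(LOCATE('gclid=', @URL) > 0,
     SUBSTRING_INDEX(SUBSTRING_INDEX(@URL, 'gclid=', -1), '&', 1),
     NULL) AS gclid_guarded;
```

With this URL, gclid_unguarded should come back as the part of the URL before the first &, while gclid_guarded is NULL.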

You can use lookbehinds here to exclude the static text from your matches:

REGEXP_SUBSTR(l.url, '(?<=[?&]gclid=)[^&#]+')
REGEXP_SUBSTR(l.url, '(?<=[?&]utm_source=)[^&#]+')
REGEXP_SUBSTR(l.url, '(?<=[?&]utm_medium=)[^&#]+') 

See a sample regex demo.

Details:

  • (?<=[?&]gclid=) - a positive lookbehind that matches a location that is immediately preceded by ? or & (this makes sure we only match the full query param key) and then gclid=
  • [^&#]+ - one or more chars other than & and # (consumed and returned as a match value).
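
Dropped into the query from the question, the three expressions might look like the sketch below. It assumes MySQL 8.0 or later, where REGEXP_SUBSTR exists and the regex engine accepts lookbehinds. The table and column names are taken from the question.

```sql
SELECT l.url,
       -- Each expression returns only the value after "key=", or NULL when the key is absent
       REGEXP_SUBSTR(l.url, '(?<=[?&]utm_source=)[^&#]+') AS utm_source,
       REGEXP_SUBSTR(l.url, '(?<=[?&]utm_medium=)[^&#]+') AS utm_medium,
       REGEXP_SUBSTR(l.url, '(?<=[?&]gclid=)[^&#]+')      AS gclid
FROM mydatabase.mytable AS l
WHERE DATE(l.registration_date) >= '2021-06-15'
  -- Keep only rows whose URL carries a non-empty gclid value
  AND REGEXP_SUBSTR(l.url, '(?<=[?&]gclid=)[^&#]+') IS NOT NULL;
```

Because the pattern ends in +, a parameter with an empty value (for example gclid= followed directly by &) yields NULL rather than an empty string.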
