Trouble Capturing URLs With Campaign Tracking Parameters in Redshift
I am trying to capture URLs that have tracking parameters in a query to a website's homepage. In some cases the URL has a forward slash before the query string begins. Here are two examples that should match:
https://test.com/?utm_campaign=email
https://test.com?utm_campaign=email
Here are two examples that should not match:
https://test.com/blog
https://test.com/blog?utm_campaign=email
Here is an example query:
SELECT t.url,COUNT(t.id) AS pageviews
FROM db.table AS t
WHERE t.url ~ '^https*:\\/\\/test\\.com\\?'
GROUP BY 1
ORDER BY 2 DESC
Note that the Redshift documentation states:
To search for strings that include metacharacters, such as '. * | ?', and so on, escape the character using two backslashes ('\\')
I have tried both single and double backslashes. The single-backslash version returns a lot more than I expect, whereas the double-backslash version does not return any results. I'm more accustomed to writing regex in JavaScript, so I assume I'm having trouble translating between the two; any help is much appreciated.
The / symbol is not a special regex metacharacter, so you should not escape it. Besides, to avoid issues with escaping . or ?, you may put them into bracket expressions:
WHERE t.url ~ '^https?://test[.]com[?]'
It will match:
^ - start of string
https?://test[.]com[?] - http://test.com? or https://test.com?
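As a quick sanity check, the patterns can be run against the four sample URLs from the question. This is only a sketch: it uses Python's re module as a stand-in for Redshift's ~ operator, on the assumption that the constructs involved (^, ?, *, and bracket expressions) behave the same way in both. Two of the patterns are not from the answer and are hypothetical: OPTIONAL_SLASH_PATTERN adds /? so that the https://test.com/? form is also covered, and UNESCAPED_PATTERN is what the single-backslash attempt would reduce to if the SQL string literal consumes the backslashes (an assumption about why that version over-matched).

```python
import re

# The four sample URLs from the question, in the order they were listed.
SAMPLE_URLS = [
    "https://test.com/?utm_campaign=email",      # should match
    "https://test.com?utm_campaign=email",       # should match
    "https://test.com/blog",                     # should not match
    "https://test.com/blog?utm_campaign=email",  # should not match
]

# Pattern from the answer: scheme, host, then a literal "?".
ANSWER_PATTERN = r"^https?://test[.]com[?]"

# Hypothetical variant (not in the answer): "/?" permits one optional
# slash between the host and the literal "?".
OPTIONAL_SLASH_PATTERN = r"^https?://test[.]com/?[?]"

# Assumption: if the string literal swallows single backslashes, the
# question's pattern reaches the regex engine like this, where "." means
# "any character" and "m?" means "an optional m".
UNESCAPED_PATTERN = r"^https*://test.com?"


def matched_urls(pattern):
    """Return the sample URLs that the given pattern matches."""
    regex = re.compile(pattern)
    return [url for url in SAMPLE_URLS if regex.search(url)]


if __name__ == "__main__":
    for name, pattern in [
        ("answer", ANSWER_PATTERN),
        ("optional slash", OPTIONAL_SLASH_PATTERN),
        ("unescaped", UNESCAPED_PATTERN),
    ]:
        print(name, pattern, matched_urls(pattern))
```

In this check the answer's pattern matches only the https://test.com? form, the optional-slash variant matches both homepage URLs and neither /blog URL, and the unescaped pattern matches all four. If the same holds in Redshift, the equivalent condition would be WHERE t.url ~ '^https?://test[.]com/?[?]', but that is worth confirming against real data.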