Trouble Capturing URLs With Campaign Tracking Parameters in Redshift

I am trying to capture URLs that have tracking parameters in the query string of a request to a website's homepage. In some cases the URL has a forward slash before the query string begins. Here are two examples that should match:

https://test.com/?utm_campaign=email
https://test.com?utm_campaign=email

Here are two examples that should not match:

https://test.com/blog
https://test.com/blog?utm_campaign=email

Here is an example query:

SELECT t.url,COUNT(t.id) AS pageviews
FROM db.table AS t
WHERE t.url ~ '^https*:\\/\\/test\\.com\\?'
GROUP BY 1
ORDER BY 2 DESC

Note that the Redshift documentation states:

To search for strings that include metacharacters, such as '. * | ?', and so on, escape the character using two backslashes ('\\').

I have tried both single and double backslashes. The single-backslash version returns a lot more than I expect, whereas the double-backslash version does not return any results. I'm more accustomed to writing regex in JavaScript, so I assume I'm having trouble translating between the two; any help is much appreciated.

The / symbol is not a special regex metacharacter, so you should not escape it. Besides, to avoid issues with escaping . or ?, you may put them into bracket expressions:

WHERE t.url ~ '^https?://test[.]com[?]'

It will match:

  • ^ - start of string
  • https?://test[.]com[?] - http://test.com? or https://test.com?
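
As a quick sanity check, the pattern above can be run against the four sample URLs from the question. The sketch below uses Python's `re` module rather than Redshift itself, on the assumption that the two engines agree for these simple constructs (anchors, `?` quantifiers, bracket expressions). It suggests that the pattern as written covers only the form without a trailing slash. `OPTIONAL_SLASH_PATTERN` and `matching_urls` are hypothetical names, not part of the original answer; the variant adds `/?` before the literal question mark, which appears to cover both homepage forms.

```python
import re

# Pattern from the answer: "?" must directly follow the host name.
ANSWER_PATTERN = r"^https?://test[.]com[?]"

# Hypothetical variant (an assumption, not in the original answer):
# "/?" allows one optional slash before the literal "?".
OPTIONAL_SLASH_PATTERN = r"^https?://test[.]com/?[?]"

# The four sample URLs given in the question.
URLS = [
    "https://test.com/?utm_campaign=email",      # should match
    "https://test.com?utm_campaign=email",       # should match
    "https://test.com/blog",                     # should not match
    "https://test.com/blog?utm_campaign=email",  # should not match
]


def matching_urls(pattern, urls):
    """Return the URLs in which the pattern finds a match."""
    compiled = re.compile(pattern)
    return [url for url in urls if compiled.search(url)]


if __name__ == "__main__":
    for name, pattern in [
        ("answer", ANSWER_PATTERN),
        ("optional slash", OPTIONAL_SLASH_PATTERN),
    ]:
        print(name, matching_urls(pattern, URLS))
```

If the same behaviour holds in Redshift, the corresponding filter would be `WHERE t.url ~ '^https?://test[.]com/?[?]'`; it is worth confirming this against a few known rows before relying on it.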
