
How to extract sub-directories from the URL using 'REGEXP_EXTRACT' in Data Studio

I'm trying to use REGEXP_EXTRACT to extract the product name that sits between two slashes in the URL. For example, I want to extract ace-5 from the URLs below:

www.abc.com/products/phones/ace-5/
www.abc.com/products/phones/ace-5/?cid=dm66363&bid
www.abc.com/products/phones/ace-5/?fbclid=iwar30dpnmmpwppnla7
www.abc.com/products/phones/ace-5/?et_cid=em_367029&et_rid=130

I have a RegEx that extracts the domain name, but that is not what I'm actually looking for. Here is the RegEx:

REGEXP_EXTRACT(page,'^[^.]+.([^.]+)')

It returns: abc
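To see why this pattern returns the domain rather than the product, here is a minimal Python sketch. It assumes a hypothetical helper `regexp_extract` that mimics REGEXP_EXTRACT by returning the first capture group of the first match (or None when nothing matches). Data Studio uses RE2 syntax; this pattern behaves the same way under Python's `re` module.

```python
import re
from typing import Optional


def regexp_extract(text: str, pattern: str) -> Optional[str]:
    # Hypothetical stand-in for REGEXP_EXTRACT: first capture group or None.
    match = re.search(pattern, text)
    return match.group(1) if match else None


urls = [
    "www.abc.com/products/phones/ace-5/",
    "www.abc.com/products/phones/ace-5/?cid=dm66363&bid",
]

# '^[^.]+' consumes "www"; the unescaped '.' matches any single character
# (here the first dot); '([^.]+)' then captures everything up to the next dot.
for url in urls:
    print(regexp_extract(url, r"^[^.]+.([^.]+)"))  # abc
```

The capture stops at the first dot after `abc`, so it never reaches the path segments.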

Assuming the product name is always the fourth slash-separated component of the URL (counting the domain as the first), we can try:

REGEXP_EXTRACT(page, '(?:[^\/]+\/){3}([^\/]+).*')

Or, if the above does not work:

REGEXP_EXTRACT(page, '[^\/]+\/[^\/]+\/[^\/]+\/([^\/]+).*')

Here is a demo for the above: Demo
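A quick way to check both patterns against the sample URLs from the question is a small Python sketch. `regexp_extract` is a hypothetical helper approximating REGEXP_EXTRACT (first capture group, or None when there is no match), not a Data Studio API; the patterns use only constructs that Python's `re` interprets the same way as RE2.

```python
import re
from typing import Optional


def regexp_extract(text: str, pattern: str) -> Optional[str]:
    # Hypothetical stand-in for REGEXP_EXTRACT: first capture group or None.
    match = re.search(pattern, text)
    return match.group(1) if match else None


QUANTIFIED = r"(?:[^\/]+\/){3}([^\/]+).*"
SPELLED_OUT = r"[^\/]+\/[^\/]+\/[^\/]+\/([^\/]+).*"

urls = [
    "www.abc.com/products/phones/ace-5/",
    "www.abc.com/products/phones/ace-5/?cid=dm66363&bid",
    "www.abc.com/products/phones/ace-5/?fbclid=iwar30dpnmmpwppnla7",
    "www.abc.com/products/phones/ace-5/?et_cid=em_367029&et_rid=130",
]

for url in urls:
    # Both patterns skip three "segment/" pairs and capture the fourth segment;
    # the trailing ".*" swallows the rest, including any query string.
    print(regexp_extract(url, QUANTIFIED), regexp_extract(url, SPELLED_OUT))  # ace-5 ace-5
```

The `{3}` quantifier and the spelled-out version are equivalent; the second only exists as a fallback for engines that reject the first.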

I don't have the same pages in my GDS (Google Data Studio), so I recreated the setup with my own data source, i.e. page paths from Google Analytics.

You can use the formula below, which returns the path segment that follows the first two segments, as you require.

REGEXP_EXTRACT(Page,'[^/]+/[^/]+/([^/]+)')

Create a calculated column with this formula. Once it exists, you may need to add a filter that removes rows where its value is null.

Example: for Page "/products/phones/ace-5/", the calculated column value will be "ace-5".

Note that this regex only returns the word after phones/; if nothing follows it, the result is null.
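The behaviour described above, including the nulls and the filter step, can be sketched in Python. The `pages` values are hypothetical Google Analytics page paths, and `regexp_extract` is a hypothetical approximation of REGEXP_EXTRACT (first capture group, or None). The last line also shows that on a full URL with the host, such as those in the question, the same pattern captures `phones` instead.

```python
import re
from typing import Optional


def regexp_extract(text: str, pattern: str) -> Optional[str]:
    # Hypothetical stand-in for REGEXP_EXTRACT: first capture group or None.
    match = re.search(pattern, text)
    return match.group(1) if match else None


PATTERN = r"[^/]+/[^/]+/([^/]+)"

# Hypothetical Google Analytics "Page" values (paths without the host).
pages = [
    "/products/phones/ace-5/",
    "/products/phones/ace-5/?cid=dm66363&bid",
    "/products/phones/",
    "/",
]

# Mimic the calculated column, then drop rows where it is null.
product_names = [regexp_extract(page, PATTERN) for page in pages]
non_null = [name for name in product_names if name is not None]
print(product_names)  # ['ace-5', 'ace-5', None, None]
print(non_null)  # ['ace-5', 'ace-5']

# With the host included, the first two segments are the domain and "products".
print(regexp_extract("www.abc.com/products/phones/ace-5/", PATTERN))  # phones
```

So this pattern fits host-less page paths; for full URLs like those in the question, a fourth-component pattern is the better fit.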

The REGEXP_EXTRACT calculated field below does the trick: it extracts all characters after the third / up to the next /:

REGEXP_EXTRACT(Page, "^(?:[^/]+/){3}([^/]+)")

Google Data Studio Report and a GIF to elaborate
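A minimal Python sketch of the anchored pattern, again using a hypothetical `regexp_extract` helper that approximates REGEXP_EXTRACT. Because of the leading `^`, the match must start at the beginning of the string, so it works on full URLs like those in the question but returns None for a host-less path that begins with `/`.

```python
import re
from typing import Optional


def regexp_extract(text: str, pattern: str) -> Optional[str]:
    # Hypothetical stand-in for REGEXP_EXTRACT: first capture group or None.
    match = re.search(pattern, text)
    return match.group(1) if match else None


ANCHORED = r"^(?:[^/]+/){3}([^/]+)"

# Full URL: "www.abc.com/", "products/", "phones/" are skipped, "ace-5" is captured.
print(regexp_extract("www.abc.com/products/phones/ace-5/?fbclid=iwar30dpnmmpwppnla7", ANCHORED))  # ace-5

# Host-less path starts with "/", so '^[^/]+' cannot match at position 0.
print(regexp_extract("/products/phones/ace-5/", ANCHORED))  # None
```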

