How to extract sub-directories from the URL using 'REGEXP_EXTRACT' in Data Studio
I'm trying to extract the product name from the URL between the 2 slashes using REGEXP_EXTRACT. For example, I want to extract ace-5 from the URLs below:
www.abc.com/products/phones/ace-5/
www.abc.com/products/phones/ace-5/?cid=dm66363&bid
www.abc.com/products/phones/ace-5/?fbclid=iwar30dpnmmpwppnla7
www.abc.com/products/phones/ace-5/?et_cid=em_367029&et_rid=130
I have a RegEx that extracts the domain name, but that is not what I'm actually looking for. Below is the RegEx:
REGEXP_EXTRACT(page,'^[^.]+.([^.]+)')
It gives the following result: abc
Assuming that the product name would always be the fixed fourth path element, we can try:
REGEXP_EXTRACT(page, '(?:[^\/]+\/){3}([^\/]+).*')
or, if the above would not work:
REGEXP_EXTRACT(page, '[^\/]+\/[^\/]+\/[^\/]+\/([^\/]+).*')
Here is a demo for the above:
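The behaviour of the first pattern can be checked offline with a small Python sketch. This is only an approximation: Data Studio is assumed to use RE2 syntax, while Python's re module is a different engine, although this pattern uses only constructs that both accept. The helper name regexp_extract is hypothetical and simply mimics returning the first capture group.

```python
import re

# Same pattern as in the answer (the escaped slashes are written plainly here).
PATTERN = re.compile(r"(?:[^/]+/){3}([^/]+).*")

def regexp_extract(text, pattern=PATTERN):
    """Hypothetical stand-in for REGEXP_EXTRACT: first capture group or None."""
    match = pattern.search(text)
    return match.group(1) if match else None

# Sample URLs taken from the question.
urls = [
    "www.abc.com/products/phones/ace-5/",
    "www.abc.com/products/phones/ace-5/?cid=dm66363&bid",
    "www.abc.com/products/phones/ace-5/?fbclid=iwar30dpnmmpwppnla7",
    "www.abc.com/products/phones/ace-5/?et_cid=em_367029&et_rid=130",
]

results = [regexp_extract(u) for u in urls]
print(results)  # ['ace-5', 'ace-5', 'ace-5', 'ace-5']
```

The query string never reaches the capture group because [^/]+ stops at the slash that follows ace-5.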
I do not have the same pages in my GDS, so I tried to recreate this with my own data source, i.e. pages from Google Analytics.
You may use the formula below, which returns the third path segment (the part that follows /products/phones/ in your example), as per your requirement.
REGEXP_EXTRACT(Page,'[^/]+/[^/]+/([^/]+)')
You need to create a calculated column with this formula. Once you have created this calculated column, you might need to add an additional filter to remove the rows with a null value.
Example Page: "/products/phones/ace-5/". The calculated column value will be "ace-5".
Note that this regex only returns the word that follows the second path segment (phones/ in this example); if a record has nothing after that, it returns null.
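A rough Python sketch of the same pattern shows the null case and why the shape of the Page value matters. It assumes, as in the example above, that Page holds only the path with a leading slash; the helper name third_segment is made up for illustration, and Python's re module stands in for Data Studio's regex engine.

```python
import re

# Unanchored pattern from the formula above: two segments, then capture the third.
PATH_PATTERN = re.compile(r"[^/]+/[^/]+/([^/]+)")

def third_segment(page):
    """Hypothetical helper: captured third segment, or None when nothing matches."""
    match = PATH_PATTERN.search(page)
    return match.group(1) if match else None

print(third_segment("/products/phones/ace-5/"))             # ace-5
print(third_segment("/products/phones/"))                   # None
print(third_segment("www.abc.com/products/phones/ace-5/"))  # phones

# Dropping the None results mirrors the extra filter on null values.
pages = ["/products/phones/ace-5/", "/products/phones/", "/products/"]
extracted = [s for s in map(third_segment, pages) if s is not None]
print(extracted)  # ['ace-5']
```

The third call illustrates that if the field also contains the hostname, this pattern captures phones instead of the product name, so one more path element would be needed.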
The REGEXP_EXTRACT Calculated Field below does the trick, extracting all characters after the 3rd / till the next instance of /:
REGEXP_EXTRACT(Page, "^(?:[^/]+/){3}([^/]+)")
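The effect of the leading ^ anchor can be sketched in Python as follows. This is an approximation under the assumption that the pattern behaves the same in Data Studio's engine; the helper name extract_anchored is hypothetical.

```python
import re

# Anchored pattern from the formula above: exactly three "segment/" groups
# from the start of the string, then capture the next segment.
ANCHORED = re.compile(r"^(?:[^/]+/){3}([^/]+)")

def extract_anchored(page):
    """Hypothetical helper: first capture group, or None when there is no match."""
    match = ANCHORED.search(page)
    return match.group(1) if match else None

with_host = extract_anchored(
    "www.abc.com/products/phones/ace-5/?et_cid=em_367029&et_rid=130"
)
path_only = extract_anchored("/products/phones/ace-5/")
too_short = extract_anchored("www.abc.com/products/")

print(with_host)  # ace-5
print(path_only)  # None
print(too_short)  # None
```

Because of the anchor, the value has to start with a non-slash character such as the hostname; a path that begins with / yields no match with this pattern.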
Google Data Studio Report and a GIF to elaborate