How to extract file name from URL?
I have file names in a URL and want to strip out the preceding URL and file path, as well as the version that appears after the ?
I am trying to use a regex to pull out CaptialForecasting_Datasheet.pdf
REGEXP_EXTRACT in Google Data Studio seems to be unique. I tried the suggestion but kept getting a "could not parse" error. I was able to strip out the first part of the URL with the formula below. Event Label is the field where I store the URL of the downloaded PDF.
The URL:
https://www.dudesolutions.com/Portals/0/Documents/HC_Brochure_Digital.pdf?ver=2018-03-18-110927-033
REGEXP_EXTRACT( Event Label , 'Documents/([^&]+)' )
The result:
HC_Brochure_Digital.pdf?ver=2018-03-18-110927-033
Now I am trying to work out how to drop everything after the ?, where the version data is, so as to extract just Filename.pdf.
You could try:
This will match CaptialForecasting_Datasheet.pdf even if there is a question mark in the path. For example, the regex will succeed in both of these cases:
https://www.dudesolutions.com/somepath/CaptialForecasting_Datasheet.pdf?ver
https://www.dudesolutions.com/somepath?/CaptialForecasting_Datasheet.pdf?ver
Assuming that the name appears right after the last / and ends at the ?, the regular expression below will leave the name in group 1, where you can get it with \1 or whatever syntax the tool you are using supports.
.*\/(.*)\?
It basically says: take everything between the last / and the ? that follows it, and put it in group 1.
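As a rough check of the capture-group approach, here is a minimal sketch in Python. Python's re module is only a stand-in for whatever tool you are using, and the helper name extract_name is made up for this illustration.

```python
import re

# Pattern from the answer: greedy prefix, a slash, the captured name, a literal '?'.
PATTERN = re.compile(r".*\/(.*)\?")

URLS = [
    "https://www.dudesolutions.com/somepath/CaptialForecasting_Datasheet.pdf?ver",
    "https://www.dudesolutions.com/somepath?/CaptialForecasting_Datasheet.pdf?ver",
]

def extract_name(url):
    """Return group 1 of the pattern, or None when the URL contains no '?'."""
    m = PATTERN.match(url)
    return m.group(1) if m else None

for url in URLS:
    print(extract_name(url))
```

Both sample URLs give the same file name. A URL with no ? at all does not match, so the helper returns None in that case.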
Another regular expression that matches only the file name you want, but is more complex, is:
(?<=\/)[^\/]*(?=\?)
It matches all non-/ characters, [^\/]*, immediately preceded by /, (?<=\/), and immediately followed by ?, (?=\?). The first parenthesized expression is a positive lookbehind, and the second is a positive lookahead.
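The lookaround version can be tried the same way. This is a sketch using Python's re module as a stand-in (your tool may not support lookbehind), and file_name is a hypothetical helper name.

```python
import re

# Lookbehind for '/', a run of non-slash characters, lookahead for '?'.
LOOKAROUND = re.compile(r"(?<=\/)[^\/]*(?=\?)")

def file_name(url):
    """Return the first whole match, or None if nothing matches."""
    m = LOOKAROUND.search(url)
    return m.group(0) if m else None

question_url = (
    "https://www.dudesolutions.com/Portals/0/Documents/"
    "HC_Brochure_Digital.pdf?ver=2018-03-18-110927-033"
)
print(file_name(question_url))

# Caveat: the first match wins, so a '?' earlier in the path is picked up first.
odd_url = "https://www.dudesolutions.com/somepath?/CaptialForecasting_Datasheet.pdf?ver"
print(file_name(odd_url))
```

For the URL from the question this returns the file name alone, with no capture group needed. For the second URL the first match is somepath, because that path segment is also directly followed by a ?, so this pattern only suits URLs where the ? appears solely after the file name.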
This REGEXP_EXTRACT formula captures the characters a-zA-Z0-9_. between / and ?
REGEXP_EXTRACT(Event Label, "/([\\w\\.]+)\\?")
Google Data Studio Report to demonstrate.
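To see what this pattern does outside Data Studio, here is a small sketch in Python. The function regexp_extract is a hypothetical stand-in written for this example, not the Data Studio function, and Python's regex engine may differ from Data Studio's in details. A Python raw string needs only single backslashes, unlike the doubled ones in the formula.

```python
import re

# Same pattern as the formula; re.ASCII limits \w to a-zA-Z0-9_.
NAME_BEFORE_QUERY = re.compile(r"/([\w\.]+)\?", re.ASCII)

def regexp_extract(value, pattern=NAME_BEFORE_QUERY):
    """Hypothetical stand-in: return the first capture group, or None."""
    m = pattern.search(value)
    return m.group(1) if m else None

event_label = (
    "https://www.dudesolutions.com/Portals/0/Documents/"
    "HC_Brochure_Digital.pdf?ver=2018-03-18-110927-033"
)
print(regexp_extract(event_label))

# A hyphen is outside the character class, so this made-up URL yields None.
print(regexp_extract("https://example.com/docs/my-file.pdf?ver=1"))
```

The character class is deliberately narrow, so file names containing hyphens, spaces or percent-encoded characters are not captured. If your file names use such characters, the class would need to be widened.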
Please try the following regex: [A-Za-z\_]*.pdf
I have tried it online at https://regexr.com/. Attaching the screenshot for reference.
Please note that this only works for .pdf files.
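A quick way to check this pattern against your own URLs is sketched below in Python. The helper pdf_name and the second URL are invented for illustration.

```python
import re

# Pattern from the answer: letters and underscores, any one character, then "pdf".
PDF_NAME = re.compile(r"[A-Za-z\_]*.pdf")

def pdf_name(url):
    """Return the first match of the pattern, or None."""
    m = PDF_NAME.search(url)
    return m.group(0) if m else None

print(pdf_name(
    "https://www.dudesolutions.com/Portals/0/Documents/"
    "HC_Brochure_Digital.pdf?ver=2018-03-18-110927-033"
))

# Digits are not in the character class, so only the extension is matched here.
print(pdf_name("https://example.com/docs/Report2018.pdf?ver=1"))
```

The pattern works for the URL in the question, but a file name containing digits or hyphens is cut short, since those characters are not in the class. The dot is also unescaped, so it matches any character and not just a literal period.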