How to extract the file name from a URL?

Question

I have file names in a URL and want to strip out the preceding URL and file path, as well as the version that appears after the ?.

Sample URL

I'm trying to use a regex to pull out CaptialForecasting_Datasheet.pdf.

REGEXP_EXTRACT in Google Data Studio seems to have its own quirks. I tried the suggestion but kept getting a "could not parse" error. I was able to strip out the first part of the URL with the formula below. Event Label is the field where I store the URL of the downloaded PDF.

The URL:

https://www.dudesolutions.com/Portals/0/Documents/HC_Brochure_Digital.pdf?ver=2018-03-18-110927-033

REGEXP_EXTRACT( Event Label , 'Documents/([^&]+)' )

The result:

HC_Brochure_Digital.pdf?ver=2018-03-18-110927-033

Now I'm trying to work out how to drop everything from the ? onward, where the version data is, so that only Filename.pdf is extracted.
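A minimal Python sketch of why the formula above keeps the version string, assuming re.search(...).group(1) behaves like REGEXP_EXTRACT for these simple patterns: [^&]+ only stops at &, so it runs straight through the ?. The second pattern, [^?&]+, is a hypothetical variation (not from the question) that also excludes ?, so the capture should end where the query string begins.

```python
import re

# Sample URL from the question.
url = "https://www.dudesolutions.com/Portals/0/Documents/HC_Brochure_Digital.pdf?ver=2018-03-18-110927-033"

# The question's pattern: [^&]+ stops only at '&', so '?ver=...' is kept.
original = re.search(r"Documents/([^&]+)", url).group(1)

# Hypothetical variation: exclude '?' as well, so the capture stops before the query string.
trimmed = re.search(r"Documents/([^?&]+)", url).group(1)

print(original)  # HC_Brochure_Digital.pdf?ver=2018-03-18-110927-033
print(trimmed)   # HC_Brochure_Digital.pdf
```

This variation still assumes the file sits directly under Documents/; it uses no lookaround or backslashes, so it should also be usable as a plain string in a REGEXP_EXTRACT formula, though that is untested here.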

Answer 1

You could try:

[^\/]+(?=\?[^\/]*$)

This will match CaptialForecasting_Datasheet.pdf even if there is a question mark earlier in the path. For example, the regex succeeds in both of these cases:
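A quick check of this pattern in Python's re module, written as a raw regex with single backslashes; the third URL is the one from the question. One caveat, to the best of my knowledge: Google Data Studio documents its regex functions as using RE2 syntax, and RE2 does not support lookaround such as (?=...), so this pattern may be rejected there (possibly the source of a "could not parse" error) even though it works in engines like Python's.

```python
import re

# [^\/]+          a run of characters that are not '/'
# (?=\?[^\/]*$)   lookahead: followed by '?' and then only non-'/' chars to the end
PATTERN = re.compile(r"[^\/]+(?=\?[^\/]*$)")

urls = [
    "https://www.dudesolutions.com/somepath/CaptialForecasting_Datasheet.pdf?ver",
    "https://www.dudesolutions.com/somepath?/CaptialForecasting_Datasheet.pdf?ver",
    "https://www.dudesolutions.com/Portals/0/Documents/HC_Brochure_Digital.pdf?ver=2018-03-18-110927-033",
]

def extract(url):
    m = PATTERN.search(url)
    return m.group(0) if m else None  # the whole match is the file name

results = [extract(u) for u in urls]
print(results)
```

In the second URL, "somepath" is followed by "?", but the rest of the string contains a "/", so the lookahead fails there and the search moves on to the last path segment.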

https://www.dudesolutions.com/somepath/CaptialForecasting_Datasheet.pdf?ver
https://www.dudesolutions.com/somepath?/CaptialForecasting_Datasheet.pdf?ver

Answer 2

Assuming that the name appears right after the last / and ends at the ?, the regular expression below will leave the name in group 1, where you can get it with \1 or whatever syntax your tool supports.

.*\/(.*)\?

It basically says: get everything between the last / and the ? that follows it, and put it in group 1. Because (.*) is greedy, if more than one ? follows the last /, the capture runs up to the last one.
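A minimal Python sketch of this group-1 approach, using re.search(...).group(1) in place of whatever backreference syntax your tool uses. The file.pdf URL is hypothetical and only illustrates the greedy behaviour; the [^?]* variant at the end is not part of the answer and is shown only as one way to stop at the first ?.

```python
import re

# Greedy ".*\/" backtracks to the last '/'; greedy "(.*)\?" then captures
# everything up to the last '?' after it.
PATTERN = re.compile(r".*\/(.*)\?")

def file_name(url):
    m = PATTERN.search(url)
    return m.group(1) if m else None  # group 1 holds the name

name = file_name("https://www.dudesolutions.com/Portals/0/Documents/HC_Brochure_Digital.pdf?ver=2018-03-18-110927-033")

# Hypothetical URL with two '?' after the last '/': the greedy group keeps "?ver=1".
two_q = file_name("https://www.dudesolutions.com/docs/file.pdf?ver=1?x=2")

# Variation (not from the answer): [^?]* stops at the first '?'.
first_q = re.search(r".*\/([^?]*)\?", "https://www.dudesolutions.com/docs/file.pdf?ver=1?x=2").group(1)

print(name, two_q, first_q)
```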

Another regular expression, which matches only the file name you want but is more complex, is:

(?<=\/)[^\/]*(?=\?)

It matches a run of non-/ characters, [^\/]*, that is immediately preceded by /, (?<=\/), and immediately followed by ?, (?=\?). The first parenthesized expression is a positive lookbehind, and the second is a positive lookahead.
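A short Python check of the lookaround version. Because the lookbehind and lookahead are zero-width, the whole match is the file name and no capture group is needed. The second URL is hypothetical and shows that, since [^\/]* can match zero characters, a URL whose last / is directly followed by ? yields an empty string. Like any lookaround pattern, this may not be accepted by RE2-based tools.

```python
import re

# (?<=\/)  positive lookbehind: preceded by '/'
# [^\/]*   a run of non-'/' characters (the file name)
# (?=\?)   positive lookahead: followed by '?'
PATTERN = re.compile(r"(?<=\/)[^\/]*(?=\?)")

def extract(url):
    m = PATTERN.search(url)
    return m.group(0) if m else None

name = extract("https://www.dudesolutions.com/Portals/0/Documents/HC_Brochure_Digital.pdf?ver=2018-03-18-110927-033")

# Hypothetical edge case: no file name between the last '/' and '?'.
empty = extract("https://www.dudesolutions.com/?x")

print(repr(name), repr(empty))
```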

Answer 3

This REGEXP_EXTRACT formula captures the run of characters from the set a-zA-Z0-9_ plus the dot (.) between a / and the ?:

REGEXP_EXTRACT(Event Label, "/([\\w\\.]+)\\?")

Google Data Studio report to demonstrate.
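A rough Python emulation of this formula, assuming REGEXP_EXTRACT returns the first capture group of the first match (which is how the answers here use it); regexp_extract below is a hypothetical stand-in, not a real library function. In a Python raw string the doubled backslashes of the Data Studio string literal become single ones. Python's \w is Unicode-aware by default, so re.ASCII is passed to keep it to a-zA-Z0-9_.

```python
import re

def regexp_extract(text, pattern):
    """Hypothetical stand-in for REGEXP_EXTRACT: first capture group of the first match, or None."""
    m = re.search(pattern, text, flags=re.ASCII)
    return m.group(1) if m else None

# Raw-string equivalent of "/([\\w\\.]+)\\?"; a '.' inside [...] needs no escape.
PATTERN = r"/([\w.]+)\?"

event_label = "https://www.dudesolutions.com/Portals/0/Documents/HC_Brochure_Digital.pdf?ver=2018-03-18-110927-033"
name = regexp_extract(event_label, PATTERN)

# Hypothetical file name with a hyphen: '-' is not in [\w.], so nothing matches.
hyphenated = regexp_extract("https://www.dudesolutions.com/Portals/0/Documents/HC-Brochure.pdf?ver=1", PATTERN)

print(name, hyphenated)
```

The hyphen case is worth keeping in mind: if file names can contain characters outside [\w.], the character class has to be widened.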

Answer 4

Please try the following regex:
[A-Za-z_]*\.pdf

I tested it online at https://regexr.com/.

Please note that this only works for .pdf files, and only when the name consists of letters and underscores (no digits or hyphens).
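A small Python check of this pattern, with the dot escaped (\.pdf) so it only matches a literal dot. The Report2018.pdf URL is hypothetical and shows the limitation noted above: the character class cannot cross the digits, so the search falls back to matching just ".pdf".

```python
import re

# Letters/underscores followed by a literal ".pdf".
PATTERN = re.compile(r"[A-Za-z_]*\.pdf")

def find_pdf(url):
    m = PATTERN.search(url)
    return m.group(0) if m else None

name = find_pdf("https://www.dudesolutions.com/somepath/CaptialForecasting_Datasheet.pdf?ver")

# Hypothetical name containing digits: only ".pdf" is returned.
with_digits = find_pdf("https://www.dudesolutions.com/Portals/0/Documents/Report2018.pdf?ver=1")

print(name, with_digits)
```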

Answer 5

The following regex will extract a file name with a .pdf extension:

(?:[^\/][\d\w\.]+)(?<=(?:\.pdf))

You can add more extensions like this:

(?:[^\/][\d\w\.]+)(?<=(?:\.pdf)|(?:\.jpg))

Demo
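A minimal Python sketch of this extension-lookbehind approach, with the dots in the lookbehind escaped (\.pdf, \.jpg) so only a literal dot matches. Python requires fixed-width lookbehind; both alternatives are four characters long, so the combined pattern compiles. The site_photo.jpg URL is hypothetical.

```python
import re

# A non-'/' char followed by word chars/dots, where the match must end in the extension.
PDF_ONLY = re.compile(r"(?:[^\/][\d\w\.]+)(?<=(?:\.pdf))")
PDF_OR_JPG = re.compile(r"(?:[^\/][\d\w\.]+)(?<=(?:\.pdf)|(?:\.jpg))")

pdf_url = "https://www.dudesolutions.com/Portals/0/Documents/HC_Brochure_Digital.pdf?ver=2018-03-18-110927-033"
jpg_url = "https://www.dudesolutions.com/Portals/0/Documents/site_photo.jpg?ver=1"  # hypothetical

pdf_name = PDF_ONLY.search(pdf_url).group(0)
jpg_name = PDF_OR_JPG.search(jpg_url).group(0)

print(pdf_name, jpg_name, PDF_ONLY.search(jpg_url))
```

Note that PDF_ONLY finds nothing in the .jpg URL; each extra extension has to be added as another same-length alternative (or the lookbehind restructured) for engines that require fixed-width lookbehind.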

5 answers (score, date posted):

Answer 1: 1, 2018-05-04 04:03:57
Answer 2: 0, 2018-05-04 03:50:52
Answer 3: 0, 2020-02-25 06:39:18
Answer 4: 0, 2020-02-25 11:56:35
Answer 5: -1, 2018-05-04 00:16:30
