Extracting a filename with a regular expression

Question

I want to create a regular expression to extract the filename from a URL:

https://example.net/img/src/img.jpg

I want to extract img.jpg.

I use urlparse from Python, but it extracts the path like this:

/img/src/img.jpg

How can I extract the file name with a regular expression?

Answer 1

Using str.split and negative indexing:

url = "https://example.net/img/src/img.jpg"
print(url.split("/")[-1])

Output:

img.jpg

Or using os.path.basename:

import os
from urllib.parse import urlparse  # Python 3; in Python 2 this was "import urlparse"

url = "https://example.net/img/src/img.jpg"
a = urlparse(url)                   # a.path is "/img/src/img.jpg"
print(os.path.basename(a.path))     # ---> img.jpg
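
The two approaches above differ once the URL carries a query string or fragment: a plain split on / would keep "?w=200" in the result, while parsing the URL first drops it. A minimal sketch of that combination, assuming Python 3; the helper name filename_from_url and the "?w=200" query are hypothetical, and posixpath is swapped in for os.path because URL paths always use forward slashes regardless of the operating system:

```python
import posixpath
from urllib.parse import urlparse, unquote


def filename_from_url(url):
    """Return the last path segment of url, ignoring query string and fragment."""
    path = urlparse(url).path                  # e.g. "/img/src/img.jpg"
    # basename returns everything after the final "/", or "" if the path ends in "/".
    return unquote(posixpath.basename(path))   # unquote turns "%20" back into " "


print(filename_from_url("https://example.net/img/src/img.jpg"))        # img.jpg
print(filename_from_url("https://example.net/img/src/img.jpg?w=200"))  # img.jpg
```

A URL whose path ends in a slash has no filename, so this sketch returns an empty string in that case.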

Answer 2

If your URL pattern is static, you can use a positive lookahead:

import re

# One or more word characters that are immediately followed by ".jpg"
pattern = r'\w+(?=\.jpg)'

text = "https://example.net/img/src/img.jpg"

print(re.findall(pattern, text)[0])

Output (the lookahead leaves the extension out of the match, so only the stem is returned):

img
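
If the extension is not always .jpg, the same lookahead idea can list alternatives. The following is only a sketch: the extension list (jpg, png, gif) and the second URL are illustrative assumptions, not part of the question, and the $ inside the lookahead is an added constraint so that the match must sit at the end of the string.

```python
import re

# Stem of a file whose extension is one of a few known image types.
# The extension list is an illustrative assumption.
pattern = re.compile(r'\w+(?=\.(?:jpg|png|gif)$)')

for url in ("https://example.net/img/src/img.jpg",
            "https://example.net/img/src/logo.png"):
    match = pattern.search(url)
    # search returns None when no filename with a listed extension is found
    print(match.group(0) if match else None)
```

This prints img and then logo; a URL that does not end in one of the listed extensions yields None.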

Answer 3

You can either split on / and select the last element of the returned list (the best solution in my opinion),

or, if you really want to use a regex, you can use the following one:

(?<=\/)(?:(?:\w+\.)*\w+)$

Note that only filenames made of \w characters (letters, digits, underscores) separated by dots are accepted (see the DEMO).

You can adapt the \w to accept other characters if necessary.

Explanations:

(?<=\/) is a positive lookbehind on /, and $ adds the constraint that the filename must be the last element of the path.
(?:(?:\w+\.)*\w+) matches words made of letters, digits, and possibly underscores, each followed by a dot; this group can repeat as many times as necessary (for an xxx.tar.gz file, for example) and is then followed by the final extension.
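
To see how the regex above behaves in Python, here is a small sketch. The helper name extract_filename and the second and third URLs are hypothetical test inputs; the pattern is the one from this answer, with \/ written as / because Python does not require the slash to be escaped and with the redundant outer group dropped.

```python
import re

# Lookbehind for "/", then dot-separated runs of \w characters up to the end.
FILENAME_RE = re.compile(r'(?<=/)(?:\w+\.)*\w+$')


def extract_filename(url):
    """Return the trailing filename, or None if it contains non-\\w characters."""
    match = FILENAME_RE.search(url)
    return match.group(0) if match else None


print(extract_filename("https://example.net/img/src/img.jpg"))   # img.jpg
print(extract_filename("https://example.net/files/xxx.tar.gz"))  # xxx.tar.gz
print(extract_filename("https://example.net/img/my-file.jpg"))   # None
```

The last call shows the limitation mentioned above: a hyphen is not matched by \w, so the name is rejected unless the character class is widened, for instance to [\w-].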

3 answers

Answer 1
3 2018-05-09 04:16:54

Answer 2
0 2018-05-09 04:33:44

Answer 3
0 Accepted 2018-05-09 05:22:33
