Extract file name with a regular expression
I want to create a regular expression to extract the filename from a URL:
https://example.net/img/src/img.jpg
I want to extract img.jpg.
I use urlparse from Python, but it extracts the path like this:
/img/src/img.jpg
How can I extract the file name with a regular expression?
Using str.split and negative indexing:
url = "https://example.net/img/src/img.jpg"
print(url.split("/")[-1])
Output:
img.jpg
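A plain split keeps any query string or fragment attached to the last segment. A minimal sketch, assuming a hypothetical URL with a ?size=large query, that splits only the path component returned by urllib.parse.urlparse (filename_from_url is a hypothetical helper name):

```python
from urllib.parse import urlparse


def filename_from_url(url: str) -> str:
    """Return the last path segment of a URL (hypothetical helper)."""
    # urlparse separates the query (?...) and fragment (#...) from the path,
    # so they never end up in the returned name.
    path = urlparse(url).path
    # rsplit with maxsplit=1 cuts only at the last "/".
    return path.rsplit("/", 1)[-1]


url = "https://example.net/img/src/img.jpg?size=large"  # hypothetical query string
print(url.split("/")[-1])      # img.jpg?size=large
print(filename_from_url(url))  # img.jpg
```

A URL ending in "/" yields an empty string, which callers may want to treat as "no filename".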
Or using os.path.basename:
import os
from urllib.parse import urlparse  # Python 3; in Python 2 this was the urlparse module

url = "https://example.net/img/src/img.jpg"
a = urlparse(url)
print(os.path.basename(a.path))  # ---> img.jpg
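os.path follows the conventions of the operating system the script runs on; on Windows it also treats backslashes as separators. Since URL paths always use "/", a sketch could use posixpath.basename (the POSIX flavour of os.path, importable on every platform) and urllib.parse.unquote to decode percent-escapes. The helper name url_basename and the encoded filename are hypothetical examples:

```python
import posixpath
from urllib.parse import unquote, urlparse


def url_basename(url: str) -> str:
    # posixpath.basename splits on "/" only, regardless of the host OS.
    name = posixpath.basename(urlparse(url).path)
    # Decode percent-escapes such as %20 (a space).
    return unquote(name)


print(url_basename("https://example.net/img/src/img.jpg"))         # img.jpg
print(url_basename("https://example.net/img/src/my%20photo.jpg"))  # my photo.jpg
print(repr(url_basename("https://example.net/img/src/")))          # ''
```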
If your URL pattern is static, you can use a positive lookahead:
import re

pattern = r'\w+(?=\.jpg)'
text = "https://example.net/img/src/img.jpg"
print(re.findall(pattern, text)[0])
Output (the name without the .jpg extension):
img
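re.findall(...)[0] raises IndexError when nothing matches, and the lookahead is not anchored to the end of the string. A sketch under those observations, with a hypothetical helper stem_before_ext that takes the extension as a parameter, anchors it with $, and returns None on no match. Note that it swaps \w+ for [^/]+ (anything except a slash), so names with hyphens also work:

```python
import re
from typing import Optional


def stem_before_ext(url: str, ext: str) -> Optional[str]:
    """Return the name in front of "." + ext at the end of the URL, or None."""
    # re.escape makes the dots in ext (e.g. "tar.gz") literal; the trailing $
    # anchors the lookahead to the end of the string.
    pattern = r"[^/]+(?=" + re.escape("." + ext) + r"$)"
    match = re.search(pattern, url)
    return match.group(0) if match else None


print(stem_before_ext("https://example.net/img/src/img.jpg", "jpg"))      # img
print(stem_before_ext("https://example.net/files/xxx.tar.gz", "tar.gz"))  # xxx
print(stem_before_ext("https://example.net/img/src/img.png", "jpg"))      # None
```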
You can either use split on / and select the last element of the returned list (the best solution in my opinion)
or, if you really want to use a regex, you can use the following one:
(?<=\/)(?:(?:\w+\.)*\w+)$
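Applied from Python, the regex above could be used with re.search, which returns None when nothing matches. A minimal sketch (extract_filename is a hypothetical helper; the backslash before "/" is not needed in Python, so the slash is written plainly):

```python
import re

# Lookbehind on "/", dotted word chunks, final word, end of string.
FILENAME_RE = re.compile(r"(?<=/)(?:(?:\w+\.)*\w+)$")


def extract_filename(url):
    match = FILENAME_RE.search(url)
    return match.group(0) if match else None


print(extract_filename("https://example.net/img/src/img.jpg"))   # img.jpg
print(extract_filename("https://example.net/files/xxx.tar.gz"))  # xxx.tar.gz
print(extract_filename("https://example.net/img/my-photo.jpg"))  # None ("-" is not \w)
```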
Note that only filenames made of word characters (letters, digits and underscores) separated by dots are accepted.
You can adapt the \w to accept other characters if necessary.
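As one hypothetical adaptation, replacing each \w with the character class [\w-] would also accept hyphens. A sketch (extract_filename_loose is an illustrative name, not from the answer):

```python
import re

# Hypothetical adaptation: [\w-] accepts letters, digits, underscores and "-".
LOOSE_FILENAME_RE = re.compile(r"(?<=/)(?:(?:[\w-]+\.)*[\w-]+)$")


def extract_filename_loose(url):
    match = LOOSE_FILENAME_RE.search(url)
    return match.group(0) if match else None


print(extract_filename_loose("https://example.net/img/my-photo.jpg"))  # my-photo.jpg
```

Further characters (for example "~") can be added to the class in the same way.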
Explanation:
(?<=\/) is a positive lookbehind on /, and $ adds the constraint that the filename string is the last element of the path.
(?:(?:\w+\.)*\w+) matches words made of letters, digits and possibly underscores, each followed by a dot; this group can be repeated as many times as necessary (an xxx.tar.gz file, for example) and is then followed by the final extension.
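The explanation can also live inside the pattern itself. A sketch of the same regex written with re.VERBOSE, where whitespace and # comments in the pattern are ignored:

```python
import re

FILENAME_RE = re.compile(
    r"""
    (?<=/)          # positive lookbehind: must be preceded by a slash
    (?:
        (?:\w+\.)*  # zero or more "word." chunks, e.g. "xxx." and "tar."
        \w+         # the final extension (or the whole name if it has no dot)
    )
    $               # end of string: the filename is the last path element
    """,
    re.VERBOSE,
)

print(FILENAME_RE.search("https://example.net/img/src/img.jpg").group(0))  # img.jpg
```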