
Extract file name with a regular expression

I want to create a regular expression to extract the file name from a URL:

https://example.net/img/src/img.jpg

I want to extract img.jpg

I use urlparse from Python, but it extracts the path like this:

img/src/img.jpg

How can I extract the file name with a regular expression?

Using str.split and negative indexing:

url = "https://example.net/img/src/img.jpg"
print(url.split("/")[-1])

Output:

img.jpg

Or using os.path.basename:

import os
from urllib.parse import urlparse  # Python 3; in Python 2 this was the urlparse module

url = "https://example.net/img/src/img.jpg"
a = urlparse(url)
print(os.path.basename(a.path))   # ---> img.jpg
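
As a rough sketch of how the two approaches above differ: splitting the raw URL keeps any query string or fragment in the result, while parsing the URL first drops them. The helper name `filename_from_url` and the `?size=large#top` suffix are made up for illustration, and the code assumes Python 3. It uses posixpath rather than os.path only because URL paths always use "/".

```python
import posixpath
from urllib.parse import urlparse


def filename_from_url(url):
    """Return the last path segment of a URL (hypothetical helper)."""
    # urlparse separates the path from the query string and fragment
    path = urlparse(url).path
    # URL paths always use "/", so posixpath is used instead of os.path
    return posixpath.basename(path)


plain = "https://example.net/img/src/img.jpg"
with_query = "https://example.net/img/src/img.jpg?size=large#top"

print(filename_from_url(plain))        # img.jpg
print(filename_from_url(with_query))   # img.jpg
print(with_query.split("/")[-1])       # img.jpg?size=large#top
```

If your URLs never carry a query string, the plain split is enough.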

If your URL pattern is static, you can use a positive lookahead:

import re

# \w+ matches the name; (?=\.jpg) requires ".jpg" right after it without consuming it
pattern = r'\w+(?=\.jpg)'
text = "https://example.net/img/src/img.jpg"

print(re.findall(pattern, text)[0])

Output:

img
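
Note that the lookahead returns the name without its extension and only matches names followed by .jpg. A minimal sketch of one way to loosen that, assuming you want to allow a few other extensions (the extension list and the extra sample URLs are made up for illustration):

```python
import re

# \w+ is the base name; the lookahead requires one of the listed
# extensions at the very end of the string without consuming it
pattern = re.compile(r"\w+(?=\.(?:jpg|png|gif)$)")

urls = [
    "https://example.net/img/src/img.jpg",
    "https://example.net/img/src/logo.png",
    "https://example.net/img/src/readme.txt",
]

for url in urls:
    match = pattern.search(url)
    # search returns None when the extension is not in the list
    print(match.group(0) if match else None)
```

This prints img, logo and None.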

You can either split on / and select the last element of the returned list (the best solution in my opinion)

or, if you really want to use a regex, you can use the following one:

(?<=\/)(?:(?:\w+\.)*\w+)$

Note that only filenames made up of word characters (letters, digits, underscores) separated by dots are accepted.

You can adapt the \w to accept other characters if necessary.

Explanations:

  • (?<=\/) is a positive lookbehind on /; together with $ it adds the constraint that the filename is the last element of the path
  • (?:(?:\w+\.)*\w+) matches words made of letters, digits and possibly underscores, each followed by a dot; this group can be repeated as many times as necessary (an xxx.tar.gz file, for example) and is then followed by the final extension.
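
To see how the regex behaves in Python, here is a small sketch. The sample URLs other than the one from the question are made up, and the widened pattern with [\w-] is only one possible adaptation, assuming you also want to allow hyphens in file names.

```python
import re

# Pattern from the answer: a "/" just before, then dot-separated
# runs of word characters up to the end of the string
strict = re.compile(r"(?<=\/)(?:(?:\w+\.)*\w+)$")

# Hypothetical adaptation: [\w-] also accepts hyphens
relaxed = re.compile(r"(?<=\/)(?:(?:[\w-]+\.)*[\w-]+)$")


def extract(pattern, url):
    """Return the matched file name, or None if nothing matches."""
    match = pattern.search(url)
    return match.group(0) if match else None


print(extract(strict, "https://example.net/img/src/img.jpg"))      # img.jpg
print(extract(strict, "https://example.net/files/xxx.tar.gz"))     # xxx.tar.gz
print(extract(strict, "https://example.net/img/src/my-img.jpg"))   # None
print(extract(relaxed, "https://example.net/img/src/my-img.jpg"))  # my-img.jpg
```

The third call returns None because "-" is not a word character, which is the limitation the note above refers to.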
