Regular expression in Python to get the last occurrence of a file extension in a URL or path

Given a long URL or path, how do I get the last file extension in it? For example, consider these two strings:

url = 'https://image.freepik.com/free-vector/vector-chickens-full-emotions_75487-787.jpg?x=2'
path = './image.freepik.com/free-vector/vector-chickens-full-emotions_75487-787.abc.jpg'

The last extension is jpg: it comes after the last . and before the following non-alphanumeric character or the end of the string.

There are similar questions to mine, but I can't find an exact match.

re.search(r'\.(\w+)(?!.*\.)', url).group(1)

This uses a negative lookahead to find a match that has no dot anywhere after it.
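To see how that pattern behaves on the two strings from the question, here is a minimal sketch. The helper name last_extension_regex is made up for illustration, and returning None when nothing matches is an assumption about the desired behaviour, not part of the original answer.

```python
import re

# Pattern from the answer: a dot, a run of word characters, and then
# no further dot anywhere in the rest of the string.
LAST_EXT = re.compile(r'\.(\w+)(?!.*\.)')


def last_extension_regex(text):
    """Return the last dot-delimited word in text, or None if there is none.

    Hypothetical helper name, used only for this example.
    """
    match = LAST_EXT.search(text)
    return match.group(1) if match else None


url = 'https://image.freepik.com/free-vector/vector-chickens-full-emotions_75487-787.jpg?x=2'
path = './image.freepik.com/free-vector/vector-chickens-full-emotions_75487-787.abc.jpg'

print(last_extension_regex(url))   # jpg
print(last_extension_regex(path))  # jpg
```

Note that the pattern keys on the last dot in the whole string, so a dot inside a query string (for example ?v=1.2) would be picked up instead of the file extension.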

The parsing rules for filenames and URLs are different, so don't try to build a single regex that handles both; it's not simple and not worth your time.

Instead, write a test of some sort to determine what type of object you are looking at, i.e. whether this is or is not a URL. This could be as simple as: if it starts with http://, it is a URL; if not, it is not a URL.

Then apply the specific rule to the specific type.

Always make use of standard tools; they have often already figured out the corner cases or the things you will forget.

The URL parser: https://docs.python.org/3/library/urllib.parse.html

Then, for files, use os.path.splitext(path) from the standard Python library: https://docs.python.org/3/library/os.path.html
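Putting those pieces together, one possible sketch looks like the following. The function name extension_of is hypothetical, and the URL test used here (the string parses with both a scheme and a network location) is an assumption that is slightly broader than checking for a leading http://.

```python
import os.path
from urllib.parse import urlsplit


def extension_of(text):
    """Return the extension (without the dot) of a URL or file path.

    Hypothetical helper for illustration. Returns '' when there is no
    extension.
    """
    parts = urlsplit(text)
    # Assumption: anything with a scheme and a network location is a URL.
    if parts.scheme and parts.netloc:
        # Use only the path component, so the query string and fragment
        # never reach the extension logic.
        target = parts.path
    else:
        target = text
    # splitext returns (root, ext), where ext is '' or starts with a dot.
    ext = os.path.splitext(target)[1]
    return ext[1:]


url = 'https://image.freepik.com/free-vector/vector-chickens-full-emotions_75487-787.jpg?x=2'
path = './image.freepik.com/free-vector/vector-chickens-full-emotions_75487-787.abc.jpg'

print(extension_of(url))   # jpg
print(extension_of(path))  # jpg
```

Because os.path.splitext only looks at the final path component, a dot in a directory name (as in https://example.com/dir.v2/readme) is not mistaken for an extension, which is one of the corner cases a single regex tends to miss.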
