
Regular expression in Python to get the last occurrence of a file extension in a URL or path

Given a long URL or path, how do I get the last file extension in it? For example, consider these two strings:

url = 'https://image.freepik.com/free-vector/vector-chickens-full-emotions_75487-787.jpg?x=2'
path = './image.freepik.com/free-vector/vector-chickens-full-emotions_75487-787.abc.jpg'

The last extension is jpg and comes after the last . and before the following non-alphanumerics or end-of-string.

There are similar questions to mine but I can't find an exact match.

re.search(r'\.(\w+)(?!.*\.)', url).group(1)

This uses a negative lookahead to find a match that has no further dot anywhere after it.
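
To check the pattern against both strings from the question, here is a minimal sketch. The helper name last_extension_regex is made up for illustration, and returning None when nothing matches is an assumption about the desired behavior.

```python
import re

# The pattern from the answer above, written as a raw string.
LAST_EXT = re.compile(r'\.(\w+)(?!.*\.)')

def last_extension_regex(text):
    """Return the word characters after the last dot, or None if nothing matches.

    Hypothetical helper, not part of the original answer.
    """
    match = LAST_EXT.search(text)
    return match.group(1) if match else None

url = 'https://image.freepik.com/free-vector/vector-chickens-full-emotions_75487-787.jpg?x=2'
path = './image.freepik.com/free-vector/vector-chickens-full-emotions_75487-787.abc.jpg'

print(last_extension_regex(url))   # jpg
print(last_extension_regex(path))  # jpg
```

One limitation: the pattern looks at the whole string, so a dot in the query string (for example ?v=1.2) is treated as the last dot and the result is no longer the file extension.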

The parsing rules for filenames and URLs are different, so don't write a single regex to handle both; it's not simple and not worth your time.

Instead, make a test of some sort to determine what type of object you are looking at, i.e. whether it is or is not a URL. This could be as simple as: if it starts with http:// or https://, it is a URL; if not, it is not a URL.

Then apply the specific rule to the specific type.

Always make use of standard tools; they have often already handled the corner cases and the things you will forget.

The URL parser: https://docs.python.org/3/library/urllib.parse.html

Then, for files, use os.path.splitext(path) from the Python standard library: https://docs.python.org/3/library/os.path.html
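
Putting the two standard tools together might look like the sketch below. The function name last_extension and the '://' check are assumptions made for illustration; the check is only a rough stand-in for the "is this a URL" test described above.

```python
import os.path
from urllib.parse import urlsplit

def last_extension(text):
    """Return the last extension without its dot, or '' if there is none.

    Hypothetical helper, not part of the original answer.
    """
    # Assumption: anything containing '://' (http://, https://, ...) is a URL.
    if '://' in text:
        # urlsplit separates the path from the query string and fragment,
        # so '?x=2' never reaches splitext.
        text = urlsplit(text).path
    # splitext splits at the last dot of the final path component.
    return os.path.splitext(text)[1].lstrip('.')

url = 'https://image.freepik.com/free-vector/vector-chickens-full-emotions_75487-787.jpg?x=2'
path = './image.freepik.com/free-vector/vector-chickens-full-emotions_75487-787.abc.jpg'

print(last_extension(url))   # jpg
print(last_extension(path))  # jpg
```

Because the query string is removed before the extension is taken, a URL such as https://example.com/a/b.tar.gz?v=1.2 gives gz here, where a regex over the whole string would be misled by the dot in the query.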


 