Remove part of a string before the last forward slash

Question

The program I am currently working on retrieves URLs from a website and puts them into a list. What I want to get is the last section of each URL.

So, if the first element in my list of URLs is "https://docs.python.org/3.4/tutorial/interpreter.html", I want to remove everything before "interpreter.html".

Is there a function, library, or regex I could use to make this happen? I've looked at other Stack Overflow posts, but the solutions don't seem to work.

These are two of my several attempts:

for link in link_list:
   file_names.append(link.replace('/[^/]*$',''))
print(file_names)

and

for link in link_list:
   file_names.append(link.rpartition('//')[-1])
print(file_names)

Answer 1

Have a look at str.rsplit.

>>> s = 'https://docs.python.org/3.4/tutorial/interpreter.html'
>>> s.rsplit('/',1)
['https://docs.python.org/3.4/tutorial', 'interpreter.html']
>>> s.rsplit('/',1)[1]
'interpreter.html'

Or, using a regex:

>>> import re
>>> re.search(r'(.*)/(.*)',s).group(2)
'interpreter.html'

The second group matches everything between the last / and the end of the string. This relies on greedy matching: the first .* consumes as much as it can, so the / in the pattern lines up with the last slash.
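
To make the greedy behaviour concrete, here is a small sketch that compares the pattern from this answer with a non-greedy variant. The non-greedy pattern and the variable names greedy and lazy are added only for illustration and are not part of the original answer.

```python
import re

s = 'https://docs.python.org/3.4/tutorial/interpreter.html'

# Greedy: the first (.*) grabs as much as it can, so the '/' in the
# pattern ends up matching the LAST slash in the string.
greedy = re.search(r'(.*)/(.*)', s)
print(greedy.group(1))  # https://docs.python.org/3.4/tutorial
print(greedy.group(2))  # interpreter.html

# Non-greedy: (.*?) grabs as little as it can, so the '/' matches the
# FIRST slash instead, and group 2 holds everything after it.
lazy = re.search(r'(.*?)/(.*)', s)
print(lazy.group(1))  # https:
print(lazy.group(2))  # /docs.python.org/3.4/tutorial/interpreter.html
```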

Small note: the problem with link.rpartition('//')[-1] in your code is that you are trying to match // and not /. Remove the extra /, as in link.rpartition('/')[-1].
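
The difference between the two separators can be checked directly. In this sketch, link_list and file_names mirror the names used in the question, and the second URL in the list is only a sample value chosen for illustration.

```python
link = 'https://docs.python.org/3.4/tutorial/interpreter.html'

# '//' only occurs right after the scheme, so everything after
# 'https://' comes back, not just the file name.
print(link.rpartition('//')[-1])  # docs.python.org/3.4/tutorial/interpreter.html

# With a single '/', the split happens at the last slash.
print(link.rpartition('/')[-1])   # interpreter.html

# The same fix applied to a whole list (sample data, not from the question).
link_list = [link, 'https://docs.python.org/3.4/tutorial/index.html']
file_names = [url.rpartition('/')[-1] for url in link_list]
print(file_names)  # ['interpreter.html', 'index.html']
```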

Answer 2

That doesn't need a regex.

import os

for link in link_list:
    file_names.append(os.path.basename(link))
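
One caveat, offered as a possible refinement rather than part of this answer: a basename call on the raw URL keeps any query string or fragment. A sketch that parses the URL first might look like the following. The helper name file_name_from_url and the example.com URL are hypothetical; urlsplit and posixpath.basename are standard-library functions.

```python
import posixpath
from urllib.parse import urlsplit

def file_name_from_url(url):
    """Return the last path segment of a URL, ignoring query and fragment."""
    path = urlsplit(url).path        # e.g. '/3.4/tutorial/interpreter.html'
    return posixpath.basename(path)  # posixpath always splits on '/'

plain = 'https://docs.python.org/3.4/tutorial/interpreter.html'
with_query = 'https://example.com/files/report.html?page=2#top'  # made-up URL

print(file_name_from_url(plain))        # interpreter.html
print(file_name_from_url(with_query))   # report.html

# For comparison, basename on the raw string keeps the query and fragment.
print(posixpath.basename(with_query))   # report.html?page=2#top
```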

Answer 3

You can use rpartition():

>>> s = 'https://docs.python.org/3.4/tutorial/interpreter.html'
>>> s.rpartition('/')
('https://docs.python.org/3.4/tutorial', '/', 'interpreter.html')

And take the last part of the 3-element tuple that is returned:

>>> s.rpartition('/')[2]
'interpreter.html'
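
As a side note, rpartition and rsplit behave differently when the string contains no slash at all. The sketch below uses a bare file name as a made-up input to show the difference.

```python
name = 'interpreter.html'  # no '/' at all

# rpartition always returns a 3-tuple; when the separator is not found,
# the first two items are empty and the whole string is the last item.
print(name.rpartition('/'))     # ('', '', 'interpreter.html')
print(name.rpartition('/')[2])  # interpreter.html

# rsplit returns a one-element list here, so index 1 does not exist.
parts = name.rsplit('/', 1)
print(parts)                    # ['interpreter.html']
try:
    parts[1]
except IndexError:
    print('rsplit(...)[1] raises IndexError when there is no slash')

# Indexing with -1 is safe in both cases.
print(parts[-1])                # interpreter.html
```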

Answer 4

Just use str.split:

url = "/some/url/with/a/file.html"

print(url.split("/")[-1])

# Result should be "file.html"

split gives you a list of the strings that were separated by "/". The [-1] gives you the last element of that list, which is what you want.
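
One case to watch for with this approach is a URL that ends in a slash, where the last element of the list is an empty string. The path below is a made-up example, and stripping trailing slashes first is just one possible way to handle it.

```python
url = "/some/url/with/a/dir/"

# A trailing slash leaves an empty string as the last element.
pieces = url.split("/")
print(pieces)            # ['', 'some', 'url', 'with', 'a', 'dir', '']
print(repr(pieces[-1]))  # ''

# Stripping trailing slashes first returns the last non-empty segment.
last = url.rstrip("/").split("/")[-1]
print(last)              # dir
```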

Answer 5

This should work if you plan to use a regex. The pattern has to go through re.sub, because str.replace treats its argument as literal text:

import re

for link in link_list:
    file_names.append(re.sub(r'.*/', '', link))
print(file_names)
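
A regex pattern only has an effect when it is passed to a function from the re module; str.replace searches for its first argument character by character. The short check below illustrates this on the URL from the question (the variable names are chosen here for illustration).

```python
import re

link = 'https://docs.python.org/3.4/tutorial/interpreter.html'

# str.replace looks for the literal text '.*/', which is not in the URL,
# so the string comes back unchanged.
literal_result = link.replace('.*/', '')
print(literal_result == link)  # True

# re.sub treats '.*/' as a pattern: the greedy '.*' runs up to the last '/'.
regex_result = re.sub(r'.*/', '', link)
print(regex_result)  # interpreter.html
```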

Answer 6

Here's a more general, regex-based way of doing this:

    re.sub(r'^.+/([^/]+)$', r'\1', "http://test.org/3/files/interpreter.html")
    'interpreter.html'
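
It may help to see how this anchored pattern behaves on inputs it does not match. The sketch below wraps it in a hypothetical helper, last_segment, and compiles the pattern once; the extra inputs are made up for illustration.

```python
import re

# Compile once when the same pattern is applied to many URLs.
LAST_SEGMENT = re.compile(r'^.+/([^/]+)$')

def last_segment(url):
    """Replace the whole string with its last path segment, if the pattern matches."""
    return LAST_SEGMENT.sub(r'\1', url)

print(last_segment("http://test.org/3/files/interpreter.html"))  # interpreter.html

# No slash: the pattern does not match, so the input is returned unchanged.
print(last_segment("interpreter.html"))  # interpreter.html

# Trailing slash: [^/]+ needs at least one character after the last slash,
# so there is no match and the input is again returned unchanged.
print(last_segment("http://test.org/3/files/"))  # http://test.org/3/files/
```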

Answer 1: score 20, accepted, 2015-04-15 17:57:30
Answer 2: score 7, 2015-04-15 17:58:01
Answer 3: score 3, 2015-04-15 18:02:32
Answer 4: score 1, 2015-04-15 18:00:39
Answer 5: score 0, 2015-04-15 18:02:19
Answer 6: score 0, 2018-04-12 14:38:21
