Deleting all occurrences of '/' after its 2nd occurrence in Python
I have a URL string which is https://example.com/about/hello/
I want to split the string into 'https://example.com', 'about', 'hello'.
How can I do this?
Use urlparse to correctly parse a URL:
from urllib.parse import urlparse  # Python 3; in Python 2 this was `import urlparse`

url = 'https://example.com/about/hello/'
parts = urlparse(url)
# Drop the empty strings produced by the leading and trailing '/'
paths = [p for p in parts.path.split('/') if p]
print('Scheme:', parts.scheme)  # https
print('Host:', parts.netloc)    # example.com
print('Path:', parts.path)      # /about/hello/
print('Paths:', paths)          # ['about', 'hello']
In the end, the information you want is in parts.scheme, parts.netloc, and paths.
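To get the exact list asked for in the question, those pieces can be joined back together. The following is a minimal sketch, assuming Python 3; split_url is a hypothetical helper name used only for illustration, not something urllib provides:

```python
from urllib.parse import urlparse


def split_url(url):
    """Return ['scheme://host', segment, ...] for a URL (hypothetical helper)."""
    parts = urlparse(url)
    # Rebuild the 'scheme://host' prefix from the parsed components
    base = parts.scheme + '://' + parts.netloc
    # Skip the empty strings produced by leading/trailing slashes
    segments = [p for p in parts.path.split('/') if p]
    return [base] + segments


print(split_url('https://example.com/about/hello/'))
```

Because it relies on the parsed path rather than on fixed positions, this sketch should work for any number of path segments.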
You can do this:
Code:
text = "https://example.com/about/hello/"
# groups is ['https:', '', 'example.com', 'about', 'hello', '']
groups = text.split('/')
print("/".join(groups[:3]), groups[3], groups[4])
Output:
https://example.com about hello
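Note that the indices 3 and 4 are hard-coded, so this only works for URLs with exactly two path segments. A rough sketch of a more general variant, assuming the URL always begins with a scheme followed by '//' (split_after_host is a hypothetical name):

```python
def split_after_host(text):
    """Keep the first two '/' intact and split on every later one (hypothetical helper)."""
    groups = text.split('/')
    # The first three pieces are 'https:', '' and the host; rejoin them with '/'
    head = '/'.join(groups[:3])
    # Everything after the host is a path segment; drop empty strings
    tail = [g for g in groups[3:] if g]
    return [head] + tail


print(split_after_host("https://example.com/about/hello/"))
```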
There are lots of ways to do this. You could use re.split() to split on a regular expression, for instance.
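>>> import re
>>> url = 'https://example.com/about/hello/'
>>> # \b/\b does not match the trailing '/', so strip it before splitting
>>> re.split(r'\b/\b', url.rstrip('/'))
['https://example.com', 'about', 'hello']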
re is part of the standard library, documented here: https://docs.python.org/3/library/re.html#re.split
The regex itself uses \b, which means a boundary between a "word" character and a "non-word" character. The pattern \b/\b therefore only matches a '/' that has a word character on both sides, which is why the '//' after the scheme is left alone.
You can use regex101 to explore how it works: https://regex101.com/r/mY8fV8/1
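To see which slashes the pattern actually matches, the positions can be listed with re.finditer. This is only an illustrative sketch; the variable names are made up for the example:

```python
import re

url = 'https://example.com/about/hello/'

# Index of every '/' in the string
all_slashes = [i for i, ch in enumerate(url) if ch == '/']
# Index of every '/' that has a word character on both sides
matched = [m.start() for m in re.finditer(r'\b/\b', url)]

print(all_slashes)  # [6, 7, 19, 25, 31]
print(matched)      # [19, 25]
```

The two slashes after 'https:' and the final slash are skipped because at least one of their neighbours is not a word character (or is the end of the string), so a trailing '/' has to be stripped separately if it is not wanted.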