How can I split a URL string into separate parts in Python?

Question

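I decided to learn Python tonight :) I know C pretty well (I wrote an operating system in it), so I'm not a programming newbie, and everything in Python looks quite simple. But I don't know how to solve this problem: let's say I have this address: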

http://example.com/random/folder/path.html

Now how do I create two strings from it? One should contain the "base" name of the server, which in this example would be

http://example.com/

and the other should contain everything except the last file name, which in this example would be

http://example.com/random/folder/

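Of course I know I could find the third slash and the last slash respectively, but is there a better way?

It would also be cool to have a trailing slash in both cases, but I don't really care, since it can easily be added. So is there a good, fast, efficient solution? Or is there only "my" solution, finding the slashes?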

Answer 1

The urlparse module in Python 2.x (or urllib.parse in Python 3.x) is the way to do this.

>>> from urllib.parse import urlparse
>>> url = 'http://example.com/random/folder/path.html'
>>> parse_object = urlparse(url)
>>> parse_object.netloc
'example.com'
>>> parse_object.path
'/random/folder/path.html'
>>> parse_object.scheme
'http'
>>>

If you want to do more work with the path of the file in the URL, you can use the posixpath module:

>>> from posixpath import basename, dirname
>>> basename(parse_object.path)
'path.html'
>>> dirname(parse_object.path)
'/random/folder'

After that, you can use posixpath.join to glue the parts back together.
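Putting these pieces together for the two strings the question asks for, a minimal sketch (Python 3) could rebuild them with urlunparse. The helper names base_url and dir_url are illustrative only, not part of the answer above, and this sketch deliberately drops any query string or fragment:

```python
from posixpath import dirname, join
from urllib.parse import urlparse, urlunparse


def base_url(url):
    """Scheme + host with a trailing slash, e.g. 'http://example.com/'."""
    p = urlparse(url)
    return urlunparse((p.scheme, p.netloc, "/", "", "", ""))


def dir_url(url):
    """URL of the containing folder, with a trailing slash."""
    p = urlparse(url)
    # join(..., "") appends exactly one trailing slash to the folder path
    folder = join(dirname(p.path), "")
    return urlunparse((p.scheme, p.netloc, folder, "", "", ""))


url = "http://example.com/random/folder/path.html"
print(base_url(url))  # http://example.com/
print(dir_url(url))   # http://example.com/random/folder/
```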

Note: Windows users will choke on the path separator in os.path. The posixpath module documentation has a special reference to URL manipulation, so all's good.
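The separator problem can be seen without a Windows machine: the standard library's ntpath module (what os.path refers to on Windows) can be imported on any platform. A small comparison, assuming Python 3:

```python
import ntpath
import posixpath

path = "/random/folder"

# posixpath always joins with '/', which is what URL paths need
print(posixpath.join(path, "path.html"))  # /random/folder/path.html

# ntpath (os.path on Windows) inserts a backslash instead
print(ntpath.join(path, "path.html"))     # /random/folder\path.html
```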

Answer 2

If that's the extent of your URL parsing, Python's built-in str.rpartition will do the job:

>>> URL = "http://example.com/random/folder/path.html"
>>> Segments = URL.rpartition('/')
>>> Segments[0]
'http://example.com/random/folder'
>>> Segments[2]
'path.html'

From Pydoc, str.rpartition:

Splits the string at the last occurrence of sep, and returns a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing two empty strings, followed by the string itself.

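This means rpartition does the searching for you and splits the string at the last (rightmost) occurrence of the character you specify (in this case /). It returns a tuple containing:

(everything to the left of char, the character itself, everything to the right of char)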
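Because the separator comes back as the middle element, the trailing slash the question wanted can be kept by concatenating the first two elements. A short Python 3 sketch (folder_url is just an illustrative name; note this only yields the directory part, not the server base):

```python
url = "http://example.com/random/folder/path.html"

head, sep, tail = url.rpartition("/")
folder_url = head + sep   # re-attach the separator to keep the trailing slash
print(folder_url)         # http://example.com/random/folder/
print(tail)               # path.html

# If the separator is absent, the whole string ends up in the last slot
print("path.html".rpartition("/"))  # ('', '', 'path.html')
```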

Answer 3

I have no experience with Python, but I found the urlparse module, which should do the job.
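The same module (urllib.parse in Python 3) also provides urljoin, which resolves a relative reference against a base URL the way a browser would. A minimal sketch of using it for both strings in the question, assuming Python 3. It treats the last path segment as a file name unless the URL already ends with "/":

```python
from urllib.parse import urljoin

url = "http://example.com/random/folder/path.html"

# "/" resolves against the host: scheme + netloc + "/"
print(urljoin(url, "/"))  # http://example.com/

# "." refers to the current "directory" of the base path
print(urljoin(url, "."))  # http://example.com/random/folder/
```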

Answer 4

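In Python, a lot of operations are done using lists. The urlparse module mentioned by Sebastian Dietz may well solve your specific problem, but if you're generally interested in a Pythonic way of finding slashes in a string, try something like this: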

url = 'http://example.com/random/folder/path.html'

# Create a list of each bit between slashes
slashparts = url.split('/')

# Now join back the first three sections 'http:', '' and 'example.com'
basename = '/'.join(slashparts[:3]) + '/'

# All except the last one
dirname = '/'.join(slashparts[:-1]) + '/'

print('slashparts = %s' % slashparts)
print('basename = %s' % basename)
print('dirname = %s' % dirname)

The output of this program is:

slashparts = ['http:', '', 'example.com', 'random', 'folder', 'path.html']
basename = http://example.com/
dirname = http://example.com/random/folder/

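The interesting bits are split, join, the slice notation array[A:B] (including negative numbers for offsets from the end), and, as a bonus, the % operator on strings for printf-style formatting.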
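To make the slice notation concrete, here is how positive and negative offsets behave on the same slashparts list (Python 3):

```python
url = 'http://example.com/random/folder/path.html'
slashparts = url.split('/')

print(slashparts[:3])    # ['http:', '', 'example.com']  (first three items)
print(slashparts[3:-1])  # ['random', 'folder']          (skip the host, drop the last item)
print(slashparts[-1])    # path.html                     (the last item)
```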

Answer 5

The posixpath module mentioned in sykora's answer doesn't seem to be available in my Python setup (Python 2.7.3).

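According to this article, the "correct" way to do this seems to be to use...

urlparse.urlparse and urlparse.urlunparse to detach and reattach the base of the URL
the functions of os.path to manipulate the path
urllib.url2pathname and urllib.pathname2url (to make the path name manipulation portable, so it works on Windows and the like)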

So, for example (not including reattaching the base URL)...

>>> import urlparse, urllib, os.path
>>> os.path.dirname(urllib.url2pathname(urlparse.urlparse("http://example.com/random/folder/path.html").path))
'/random/folder'
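In Python 3 these functions moved: urlparse/urlunparse live in urllib.parse, and url2pathname/pathname2url in urllib.request. A sketch of the same approach that also reattaches the base, assuming Python 3. The expected output shown is for a POSIX system; the Windows round trip has not been checked here:

```python
import os.path
from urllib.parse import urlparse, urlunparse
from urllib.request import pathname2url, url2pathname

url = "http://example.com/random/folder/path.html"
parts = urlparse(url)

# URL path -> local path syntax, so os.path can operate on it
local_dir = os.path.dirname(url2pathname(parts.path))

# local path -> URL path (plus a trailing slash), then reattach scheme and host
dir_path = pathname2url(local_dir) + "/"
dir_url = urlunparse(parts._replace(path=dir_path))
print(dir_url)  # http://example.com/random/folder/  (on POSIX)
```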

Answer 6

You can use the Python library furl:

import furl

f = furl.furl("http://example.com/random/folder/path.html")
print(str(f.path))  # '/random/folder/path.html'
print(str(f.path).split("/")) # ['', 'random', 'folder', 'path.html']

To access the word after the first "/", use:

str(f.path).split("/")[1]  # 'random'
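If installing furl isn't an option, a similar breakdown of the path can be sketched with the standard library alone, assuming Python 3.4+ for pathlib. PurePosixPath is used so that "/" is the separator on every platform; note that pathlib normalizes the path (for example, it drops a trailing slash):

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

url = "http://example.com/random/folder/path.html"
path = PurePosixPath(urlparse(url).path)

print(path.parts)        # ('/', 'random', 'folder', 'path.html')
print(path.parts[1])     # random  (first word after the leading '/')
print(path.name)         # path.html
print(str(path.parent))  # /random/folder
```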


6 solutions

Solution 1: score 56, 2009-01-16 08:14:36
Solution 2: score 12, 2009-01-16 08:11:11
Solution 3: score 10, 2009-01-16 07:49:55
Solution 4: score 8, 2009-01-16 08:08:32
Solution 5: score 2, 2013-02-06 05:35:32
Solution 6: score 1, 2016-12-02 15:58:06
