Stripping a specific part from a URL string in Python

Question

I am going through some URLs and I want to strip out a part of them that changes dynamically, so I don't know its value in advance. An example URL is:

https://...?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2

I want to strip out the gid=lostchapter part, without any of the rest.

How do I do that?

Answer 1

You can use urllib to convert the query string into a Python dict and access the item you need:

In [1]: from urllib import parse

In [2]: s = "https://...?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2"

In [3]: q = parse.parse_qs(parse.urlsplit(s).query)

In [4]: q
Out[4]:
{'pid': ['2'],
 'gid': ['lostchapter'],
 'lang': ['en_GB'],
 'practice': ['1'],
 'channel': ['desktop'],
 'demo': ['2']}

In [5]: q["gid"]
Out[5]: ['lostchapter']
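If the goal is to remove the gid parameter from the URL rather than read its value, the same urllib.parse module can rebuild the URL without it. The following is only a minimal sketch: the helper name drop_param is hypothetical (it is not part of urllib), and it assumes the parameter values contain no characters that urlencode would escape differently from the original.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_param(url, name):
    """Return url with every query parameter called `name` removed.

    Hypothetical helper for illustration; not part of urllib.
    """
    parts = urlsplit(url)
    # parse_qsl keeps the original order of the key/value pairs
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k != name]
    # Re-encode the remaining pairs and put the URL back together
    return urlunsplit(parts._replace(query=urlencode(kept)))


url = "https://...?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2"
print(drop_param(url, "gid"))
# https://...?pid=2&lang=en_GB&practice=1&channel=desktop&demo=2
```

Because the query string is parsed rather than pattern-matched, this does not depend on where gid appears or on what its value is.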

Answer 2

We can try doing a regex replacement:

import re

url = "https://...?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2"
output = re.sub(r'(?<=[?&])gid=lostchapter&?', '', url)
print(output)  # https://...?pid=2&lang=en_GB&practice=1&channel=desktop&demo=2

For a more general replacement, match on the following regex pattern:

(?<=[?&])gid=\w+&?
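As a rough sketch of how the general pattern might be wrapped up, the function below takes the parameter name as an argument. The name strip_param is hypothetical, and the value is matched with [^&#]* instead of \w+ on the assumption that values may contain characters such as "-" or "%". It also trims a dangling "?" or "&" that is left behind when the removed parameter was the last one.

```python
import re


def strip_param(url, name):
    """Remove `name=value` from the query string of url (illustrative sketch)."""
    # The lookbehind ensures we match a whole key, not the tail of e.g. "bgid"
    pattern = rf'(?<=[?&]){re.escape(name)}=[^&#]*&?'
    result = re.sub(pattern, '', url)
    # If the parameter was last, a trailing "&" or "?" remains; drop it
    return result.rstrip('?&')


url = "https://...?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2"
print(strip_param(url, "gid"))
# https://...?pid=2&lang=en_GB&practice=1&channel=desktop&demo=2
```

This sketch does not try to handle URLs that carry a fragment after the query string; for those, a parser-based approach is likely the safer choice.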

Answer 3

Here is a simple way to pull that part out:

# Import the `urlparse` function
from urllib.parse import urlparse

urls = "https://...?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2"

# Parse the URL and take its query string (everything after the "?")
query = urlparse(urls).query

# Split the query into "key=value" pairs and take the second one.
# Note: this relies on gid being the second parameter.
url = query.split("&")[1]

# Print the extracted part
print(url)  # Prints "gid=lostchapter"

Answer 4

Using string slicing (I'm assuming there will be an '&' after gid=lostchapter):

url = r'https://...?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2'
start = url.find('gid')       # index where 'gid' begins
end = url.find('&', start)    # index of the first '&' after 'gid'
url = url[start:end]          # keep only the text between the two
print(url)

Output:

gid=lostchapter

What I'm doing here is:

Find the index where 'gid' occurs
Find the first '&' after 'gid'
Take the part of the url from 'gid' up to, but not including, that '&'
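The steps above can be sketched as a small helper that also copes with the case where no '&' follows the parameter (that is, when it is the last one in the URL). The name extract_pair is hypothetical. Note that a plain substring search like this would also match a key that merely ends in the same letters, such as bgid=, so treat it as an illustration of the slicing idea rather than a robust parser.

```python
def extract_pair(url, key):
    """Return the 'key=value' substring of url, or None if key is absent."""
    start = url.find(key + '=')
    if start == -1:
        return None          # the key does not occur at all
    end = url.find('&', start)
    if end == -1:
        end = len(url)       # no '&' afterwards: the pair runs to the end
    return url[start:end]


url = 'https://...?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2'
print(extract_pair(url, 'gid'))  # gid=lostchapter
```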

Answer 5

Method 1: Using a URL parser

from urllib.parse import urlparse
p = urlparse('https://.../?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2')
# Keep only the "key=value" pairs whose key is gid
param: list[str] = [i for i in p.query.split('&') if i.startswith('gid=')]
print(param)

Output: ['gid=lostchapter']

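Method 2: Using a regex

import re
# [^&]* stops at the next '&', so only the gid pair is matched
param: str = re.search(r'gid=[^&]*', 'https://.../?pid=2&gid=lostchapter&lang=en_GB&practice=1&channel=desktop&demo=2').group()

You can change the regex pattern to whatever suits the output you expect. As written, it extracts whatever value gid has.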
