Python urllib.request - How to set headers

Question

I need to set headers for urllib.request in order to get the real page and avoid a redirect. If I use only this code:

import urllib.request
urllib.request.urlretrieve("https://open.spotify.com/artist/4npEfmQ6YuiwW1GpUmaq3F", "test.html")

Spotify detects that I am using an unsupported browser and redirects me to a different page. I need to get the original HTML, and I think setting headers could help.

Answer 1

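It looks like Spotify performs additional checks when a request includes a user-agent. This can be worked around by adding all the headers a browser would send. Alternatively, you can set the user agent to a browser Spotify doesn't know, such as TEST.

The following scraping code also works without any headers, so unless there is a specific reason I wouldn't set headers at all. (I only included them because of the question in the title.)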

import requests
from bs4 import BeautifulSoup

urls = [
    'https://open.spotify.com/artist/711MCceyCBcFnzjGY4Q7Un',
    'https://open.spotify.com/artist/4npEfmQ6YuiwW1GpUmaq3F'
]
headers = {
    'user-agent': 'TEST'
}

for url in urls:
    response = requests.get(url, headers=headers)
    html = response.text
    soup = BeautifulSoup(html, 'html.parser')
    #print(soup.prettify())
    print(soup.find('h1').text.strip())

Output:

AC/DC
Ava Max
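
Since the title asks about urllib.request specifically, here is a minimal sketch of the same idea without requests. urlretrieve takes no headers parameter, so the sketch builds a urllib.request.Request with a headers dict and passes it to urlopen. The helper names build_request and save_page are hypothetical, and whether Spotify still serves the original page for a user agent of TEST is an assumption carried over from the answer above, not something verified here.

```python
import urllib.request


def build_request(url, user_agent="TEST"):
    """Return a Request object that carries a custom User-Agent header.

    build_request is a hypothetical helper name used only for this sketch.
    """
    # Headers are supplied as a plain dict when the Request is created.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def save_page(url, filename, user_agent="TEST"):
    """Fetch url with the custom header and write the raw body to filename.

    save_page is a hypothetical helper name used only for this sketch.
    """
    request = build_request(url, user_agent)
    # urlopen accepts a Request object in place of a URL string.
    with urllib.request.urlopen(request) as response:
        body = response.read()
    with open(filename, "wb") as out_file:
        out_file.write(body)


# Example call (performs a network request, so it is left commented out):
# save_page("https://open.spotify.com/artist/4npEfmQ6YuiwW1GpUmaq3F", "test.html")
```

Further headers can be added to the same dict, or afterwards with request.add_header(name, value), if the server turns out to check more than the user agent.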
