
Python: urllib urlopen stuck, timeout error

As the title says, urlopen gets stuck while opening the URL.

Code:

from bs4 import BeautifulSoup as soup  # HTML data structure
from urllib.request import urlopen as uReq  # Web client

page_url = "https://store.hp.com/us/en/pdp/hp-laserjet-pro-m404n?jumpid=ma_weekly-deals_product-tile_printers_3_w1a52a_hp-laserjet-pro-m404"

uClient = uReq(page_url)

# parses html into a soup data structure to traverse html
# as if it were a json data type.
page_soup = soup(uClient.read(), "html.parser")

uClient.close()

print(page_soup)

The problem: it gets stuck at uReq. However, if you replace page_url with the following link, everything works fine.

page_url = "http://www.newegg.com/Product/ProductList.aspx?Submit=ENE&N=-1&IsNodeId=1&Description=GTX&bop=And&Page=1&PageSize=36&order=BESTMATCH"

The error: timeout error.

How can I open the given URL for web scraping purposes?

EDIT

New error after adding urllib.request.Request

Some websites require a User-Agent header to serve a successful response. Import Request from urllib.request and modify your code as follows:

from bs4 import BeautifulSoup as soup  # HTML data structure
from urllib.request import urlopen as uReq, Request  # Web client

page_url = "https://store.hp.com/us/en/pdp/hp-laserjet-pro-m404n?jumpid=ma_weekly-deals_product-tile_printers_3_w1a52a_hp-laserjet-pro-m404"

uClient = uReq(Request(page_url, headers={
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:73.0) Gecko/20100101 Firefox/73.0'
}))

# parses html into a soup data structure to traverse html
# as if it were a json data type.
page_soup = soup(uClient.read(), "html.parser")

uClient.close()

print(page_soup)

and you should be fine.
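Separately, since the original symptom was urlopen hanging, it can help to pass urlopen's timeout argument so a stalled connection raises an error quickly instead of blocking indefinitely. A minimal sketch (the example.com URL and the 10-second value are placeholders, not from the original post):

```python
from urllib.request import urlopen, Request

# Placeholder URL for illustration; substitute the page you want to scrape.
page_url = "https://example.com/"

# Same User-Agent trick as above, so the server treats us like a browser.
req = Request(page_url, headers={
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:73.0) Gecko/20100101 Firefox/73.0"
})

try:
    # timeout is in seconds; the call raises instead of hanging forever.
    with urlopen(req, timeout=10) as resp:
        html = resp.read()
    print(len(html), "bytes fetched")
except OSError as e:  # URLError and socket timeouts are OSError subclasses
    print("Request failed:", e)
```

Using the with statement also closes the connection automatically, replacing the explicit uClient.close() call.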
