
How to get fewer 403 and more 200 responses when web scraping with Python requests through proxies?

Original post 2019-10-24 11:31:00 8 1 python/ python-3.x/ web-scraping/ python-multithreading/ http-proxy

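I am working on a research project that requires scraping some URLs. I have more than 5k Foursquare URLs (like this one: https://foursquare.com/v/t-spesjalleke/4c94ec0d533aa09384d5c345 ), and I only need to know whether each restaurant is cheap / moderate / expensive / very expensive. So I wrote a script in which, for every Foursquare request, I parse 50 proxies from https://free-proxy-list.net. I make the request with random.choice() over the proxy list until I get response code 200. Once I get a 200, I use Beautiful Soup to extract the category and write it to a file. The problem is that I receive a lot of 403 codes, which is why each request takes so long. I tried concurrent.futures.ThreadPoolExecutor(max_workers=8) to speed things up, but it did not get faster because I still receive many 403 responses. I am also sending a user-agent header with the requests.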

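Here is the script I am running: https://paste.ubuntu.com/p/j3FYGngMGS/

I need to optimize this process and have not found any other solution. Please share any insights that might help. Thank you very much.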

1 Answer

Could this be an IP location problem? You may be getting the error because the proxy's IP does not match the countries the site owner allows.

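Another possibility is IP blacklisting: although you have a pool of proxies, they are not private, so other people can use them too and get them blacklisted.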
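If blacklisted public proxies are the cause, one way to waste fewer requests is to stop picking a proxy once it has returned 403 (or failed to connect) a couple of times, instead of drawing from the full list with random.choice() every time. Below is a minimal sketch of that idea, not the asker's script: `ProxyPool`, `http_get`, `fetch`, and the `max_failures` threshold are hypothetical names and values chosen for illustration. The pool uses a lock because the question runs requests from a ThreadPoolExecutor.

```python
import random
import threading


class ProxyPool:
    """Hypothetical helper: rotate proxies and evict ones that keep failing."""

    def __init__(self, proxies, max_failures=2):
        self._failures = {p: 0 for p in proxies}
        self._max_failures = max_failures  # assumed threshold, tune as needed
        self._lock = threading.Lock()

    def alive(self):
        """Return the proxies that have not reached the failure threshold."""
        with self._lock:
            return [p for p, n in self._failures.items()
                    if n < self._max_failures]

    def pick(self):
        """Return a random usable proxy, or None if all have been evicted."""
        candidates = self.alive()
        return random.choice(candidates) if candidates else None

    def report(self, proxy, ok):
        """Reset the failure count on success, increment it otherwise."""
        with self._lock:
            self._failures[proxy] = 0 if ok else self._failures[proxy] + 1


def http_get(url, proxy, timeout=10):
    """Fetch url through proxy; return (status, body) or (None, '') on error."""
    import requests  # imported here so the pool logic runs without it

    try:
        resp = requests.get(
            url,
            proxies={"http": proxy, "https": proxy},
            headers={"User-Agent": "Mozilla/5.0"},
            timeout=timeout,
        )
        return resp.status_code, resp.text
    except requests.RequestException:
        return None, ""


def fetch(url, pool, get=http_get, max_attempts=10):
    """Try up to max_attempts proxies; return the body of the first 200."""
    for _ in range(max_attempts):
        proxy = pool.pick()
        if proxy is None:  # every proxy has been evicted
            return None
        status, body = get(url, proxy)
        pool.report(proxy, ok=(status == 200))
        if status == 200:
            return body
    return None
```

A single `ProxyPool` instance can be shared by all worker threads, so a proxy that one thread finds blocked is skipped by the others as well. This only reduces retries against proxies that are already blocked; it does not help if the site rejects the whole range of free proxies, in which case private proxies or a slower request rate may be the more realistic option.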


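Notice: Technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original post's address. For any questions, contact yoyou2525@163.com.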
