[英]access denied with selenium and requests to site
我正在嘗試從網站上抓取產品,我首先嘗試使用請求(帶有標題)但我的列表是空的,如果我打印 si 沒有得到與我的瀏覽器相同的 html 所以我嘗試使用此代碼 selenium :
og_name_list = []
item = 'https://www.bstn.com/eu_nl/catalogsearch/result/?q=jordan&categories=Men~Footwear~Sneakers&raffle=No'
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--no-sandbox')
driver = webdriver.Chrome(executable_path = '/Users/maurijnvd/Downloads/chromedriver 2', options=options)
driver.get(item)
html = driver.page_source
s = BeautifulSoup(html, 'lxml')
names = s.find_all('a', class_='catalog-grid-item__name-link', href=True)
for name in names:
namemp = name['href']
og_name_list.append(namemp)
print(len(og_name_list))
print(s)
但是 output html 包含拒絕訪問消息:
0
<html class="no-js" lang="en-US"><!--<![endif]--><head>
<title>Access denied | www.bstn.com used Cloudflare to restrict access</title>
<meta charset="utf-8"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="IE=Edge,chrome=1" http-equiv="X-UA-Compatible"/>
<meta content="noindex, nofollow" name="robots"/>
<meta content="width=device-width,initial-scale=1" name="viewport"/>
<link href="/cdn-cgi/styles/main.css" id="cf_styles-css" media="screen,projection" rel="stylesheet" type="text/css"/>
<script type="text/javascript">
(function(){if(document.addEventListener&&window.XMLHttpRequest&&JSON&&JSON.stringify){var e=function(a){var c=document.getElementById("error-feedback-survey"),d=document.getElementById("error-feedback-success"),b=new XMLHttpRequest;a={event:"feedback clicked",properties:{errorCode:1020,helpful:a,version:1}};b.open("POST","https://sparrow.cloudflare.com/api/v1/event");b.setRequestHeader("Content-Type","application/json");b.setRequestHeader("Sparrow-Source-Key","c771f0e4b54944bebf4261d44bd79a1e");
b.send(JSON.stringify(a));c.classList.add("feedback-hidden");d.classList.remove("feedback-hidden")};document.addEventListener("DOMContentLoaded",function(){var a=document.getElementById("error-feedback"),c=document.getElementById("feedback-button-yes"),d=document.getElementById("feedback-button-no");"classList"in a&&(a.classList.remove("feedback-hidden"),c.addEventListener("click",function(){e(!0)}),d.addEventListener("click",function(){e(!1)}))})}})();
</script>
<script defer="" src="https://api.radar.cloudflare.com/beacon.js"></script>
</head>
<body>
<div id="cf-wrapper">
<div class="cf-alert cf-alert-error cf-cookie-error hidden" data-translate="enable_cookies" id="cookie-alert">Please enable cookies.</div>
<div class="p-0" id="cf-error-details">
<header class="mx-auto pt-10 lg:pt-6 lg:px-8 w-240 lg:w-full mb-15 antialiased">
<h1 class="inline-block md:block mr-2 md:mb-2 font-light text-60 md:text-3xl text-black-dark leading-tight">
<span data-translate="error">Error</span>
<span>1020</span>
</h1>
<span class="inline-block md:block heading-ray-id font-mono text-15 lg:text-sm lg:leading-relaxed">Ray ID: 6e759ee1dd9c76a1 •</span>
<span class="inline-block md:block heading-ray-id font-mono text-15 lg:text-sm lg:leading-relaxed">2022-03-05 20:32:23 UTC</span>
<h2 class="text-gray-600 leading-1.3 text-3xl lg:text-2xl font-light">Access denied</h2>
</header>
<section class="w-240 lg:w-full mx-auto mb-8 lg:px-8">
<div class="w-1/2 md:w-full" id="what-happened-section">
<h2 class="text-3xl leading-tight font-normal mb-4 text-black-dark antialiased" data-translate="what_happened">What happened?</h2>
<p>This website is using a security service to protect itself from online attacks.</p>
</div>
</section>
<div class="py-8 text-center" id="error-feedback">
<div id="error-feedback-survey">
Was this page helpful?
<button class="border border-solid bg-white cf-button cursor-pointer ml-4 px-4 py-2 rounded" id="feedback-button-yes" type="button">Yes</button>
<button class="border border-solid bg-white cf-button cursor-pointer ml-4 px-4 py-2 rounded" id="feedback-button-no" type="button">No</button>
</div>
<div class="feedback-success feedback-hidden" id="error-feedback-success">
Thank you for your feedback!
</div>
</div>
<div class="cf-error-footer cf-wrapper w-240 lg:w-full py-10 sm:py-4 sm:px-8 mx-auto text-center sm:text-left border-solid border-0 border-t border-gray-300">
<p class="text-13">
<span class="cf-footer-item sm:block sm:mb-1">Cloudflare Ray ID: <strong class="font-semibold">6e759ee1dd9c76a1</strong></span>
<span class="cf-footer-separator sm:hidden">•</span>
<span class="cf-footer-item sm:block sm:mb-1"><span>Your IP</span>: 2a02:a455:c03b:1:dc83:20cd:2a6b:cbaa</span>
<span class="cf-footer-separator sm:hidden">•</span>
<span class="cf-footer-item sm:block sm:mb-1"><span>Performance & security by</span> <a href="https://www.cloudflare.com/5xx-error-landing" id="brand_link" rel="noopener noreferrer" target="_blank">Cloudflare</a></span>
</p>
</div><!-- /.error-footer -->
</div><!-- /#cf-error-details -->
</div><!-- /#cf-wrapper -->
<script type="text/javascript">
window._cf_translation = {};
</script>
</body></html>
我想進入該站點,但我似乎無法進入,任何想法或幫助表示贊賞,我添加了錯誤的屏幕截圖(由 selenium 制作)。 我可以在我的普通瀏覽器中訪問該站點,我想最好通過請求來抓取它,但 selenium 也可以。 謝謝
我不是這方面的專家,但根據您提供的信息(尤其是屏幕截圖),您正在聯系的網站似乎受到保護,因此我猜您無法通過這種方式訪問它命令。 這意味着無法使用 Selenium 請求該站點的產品。
但是,正如我所說,我不是專家,所以如果我錯了請糾正我。
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.