
Access denied with Selenium and requests to site

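I'm trying to scrape products from a website. I first tried requests (with headers), but my list came back empty, and when I print `s` I don't get the same HTML as in my browser. So I tried Selenium with this code: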

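        from bs4 import BeautifulSoup
        from selenium import webdriver
        from selenium.webdriver.chrome.service import Service

        og_name_list = []
        item = 'https://www.bstn.com/eu_nl/catalogsearch/result/?q=jordan&categories=Men~Footwear~Sneakers&raffle=No'
        options = webdriver.ChromeOptions()
        options.add_argument('--headless')
        options.add_argument('--disable-dev-shm-usage')
        options.add_argument('--no-sandbox')
        # Selenium 4 takes the driver path via Service; executable_path is
        # deprecated in Selenium 4 and removed in later 4.x releases.
        driver = webdriver.Chrome(service=Service('/Users/maurijnvd/Downloads/chromedriver 2'), options=options)
        driver.get(item)
        html = driver.page_source
        driver.quit()  # close the browser once the HTML has been captured
        s = BeautifulSoup(html, 'lxml')
        # Collect the href of every product-name link on the results page
        names = s.find_all('a', class_='catalog-grid-item__name-link', href=True)
        for name in names:
            og_name_list.append(name['href'])
        print(len(og_name_list))
        print(s)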

But the output HTML contains an access denied message:

            0
            <html class="no-js" lang="en-US"><!--<![endif]--><head>
            <title>Access denied | www.bstn.com used Cloudflare to restrict access</title>
            <meta charset="utf-8"/>
            <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
            <meta content="IE=Edge,chrome=1" http-equiv="X-UA-Compatible"/>
            <meta content="noindex, nofollow" name="robots"/>
            <meta content="width=device-width,initial-scale=1" name="viewport"/>
            <link href="/cdn-cgi/styles/main.css" id="cf_styles-css" media="screen,projection" rel="stylesheet" type="text/css"/>
            <script type="text/javascript">
            (function(){if(document.addEventListener&&window.XMLHttpRequest&&JSON&&JSON.stringify){var e=function(a){var c=document.getElementById("error-feedback-survey"),d=document.getElementById("error-feedback-success"),b=new XMLHttpRequest;a={event:"feedback clicked",properties:{errorCode:1020,helpful:a,version:1}};b.open("POST","https://sparrow.cloudflare.com/api/v1/event");b.setRequestHeader("Content-Type","application/json");b.setRequestHeader("Sparrow-Source-Key","c771f0e4b54944bebf4261d44bd79a1e");
            b.send(JSON.stringify(a));c.classList.add("feedback-hidden");d.classList.remove("feedback-hidden")};document.addEventListener("DOMContentLoaded",function(){var a=document.getElementById("error-feedback"),c=document.getElementById("feedback-button-yes"),d=document.getElementById("feedback-button-no");"classList"in a&&(a.classList.remove("feedback-hidden"),c.addEventListener("click",function(){e(!0)}),d.addEventListener("click",function(){e(!1)}))})}})();
            </script>
            <script defer="" src="https://api.radar.cloudflare.com/beacon.js"></script>
            </head>
            <body>
            <div id="cf-wrapper">
            <div class="cf-alert cf-alert-error cf-cookie-error hidden" data-translate="enable_cookies" id="cookie-alert">Please enable cookies.</div>
            <div class="p-0" id="cf-error-details">
            <header class="mx-auto pt-10 lg:pt-6 lg:px-8 w-240 lg:w-full mb-15 antialiased">
            <h1 class="inline-block md:block mr-2 md:mb-2 font-light text-60 md:text-3xl text-black-dark leading-tight">
            <span data-translate="error">Error</span>
            <span>1020</span>
            </h1>
            <span class="inline-block md:block heading-ray-id font-mono text-15 lg:text-sm lg:leading-relaxed">Ray ID: 6e759ee1dd9c76a1 •</span>
            <span class="inline-block md:block heading-ray-id font-mono text-15 lg:text-sm lg:leading-relaxed">2022-03-05 20:32:23 UTC</span>
            <h2 class="text-gray-600 leading-1.3 text-3xl lg:text-2xl font-light">Access denied</h2>
            </header>
            <section class="w-240 lg:w-full mx-auto mb-8 lg:px-8">
            <div class="w-1/2 md:w-full" id="what-happened-section">
            <h2 class="text-3xl leading-tight font-normal mb-4 text-black-dark antialiased" data-translate="what_happened">What happened?</h2>
            <p>This website is using a security service to protect itself from online attacks.</p>
            </div>
            </section>
            <div class="py-8 text-center" id="error-feedback">
            <div id="error-feedback-survey">
                        Was this page helpful?
                        <button class="border border-solid bg-white cf-button cursor-pointer ml-4 px-4 py-2 rounded" id="feedback-button-yes" type="button">Yes</button>
            <button class="border border-solid bg-white cf-button cursor-pointer ml-4 px-4 py-2 rounded" id="feedback-button-no" type="button">No</button>
            </div>
            <div class="feedback-success feedback-hidden" id="error-feedback-success">
                        Thank you for your feedback!
                     </div>
            </div>
            <div class="cf-error-footer cf-wrapper w-240 lg:w-full py-10 sm:py-4 sm:px-8 mx-auto text-center sm:text-left border-solid border-0 border-t border-gray-300">
            <p class="text-13">
            <span class="cf-footer-item sm:block sm:mb-1">Cloudflare Ray ID: <strong class="font-semibold">6e759ee1dd9c76a1</strong></span>
            <span class="cf-footer-separator sm:hidden">•</span>
            <span class="cf-footer-item sm:block sm:mb-1"><span>Your IP</span>: 2a02:a455:c03b:1:dc83:20cd:2a6b:cbaa</span>
            <span class="cf-footer-separator sm:hidden">•</span>
            <span class="cf-footer-item sm:block sm:mb-1"><span>Performance &amp; security by</span> <a href="https://www.cloudflare.com/5xx-error-landing" id="brand_link" rel="noopener noreferrer" target="_blank">Cloudflare</a></span>
            </p>
            </div><!-- /.error-footer -->
            </div><!-- /#cf-error-details -->
            </div><!-- /#cf-wrapper -->
            <script type="text/javascript">
              window._cf_translation = {};
            
            
            </script>
            </body></html>
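
For reference, the page above is a Cloudflare block page, not the product listing. A minimal sketch of how a script could recognise it and pull out the error code and Ray ID follows. The function name `parse_cloudflare_block` is hypothetical, and the markers it looks for (`id="cf-error-details"`, `errorCode:`, `Ray ID:`) are taken only from the output shown here, so other block pages may need different patterns.

```python
import re


def parse_cloudflare_block(html):
    """Return details of a Cloudflare block page, or None if html is not one.

    Assumption: the page has the same markers as the output shown above.
    """
    # The block page wraps its content in an element with this id.
    if 'id="cf-error-details"' not in html:
        return None
    # The error code appears in the inline feedback script: errorCode:1020
    code = re.search(r"errorCode:(\d+)", html)
    # The Ray ID appears as plain text, sometimes followed by a <strong> tag.
    ray = re.search(r"Ray ID:\s*(?:<[^>]+>)?\s*([0-9a-f]+)", html)
    return {
        "error_code": int(code.group(1)) if code else None,
        "ray_id": ray.group(1) if ray else None,
    }
```

Calling it on `driver.page_source` before parsing the products would show right away that the request was denied, so an empty result list is not mistaken for a selector problem.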

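I want to get into the site but can't seem to. Any ideas or help are appreciated. I added a screenshot of the error (taken by Selenium). I can access the site in my normal browser. I'd prefer to scrape it with requests, but Selenium is fine too. Thanks.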

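I'm not an expert on this, but based on the information you provided (especially the screenshot), the site you are contacting appears to be protected, so I guess you can't access it this way. That would mean you can't request the site's products with Selenium.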

However, as I said, I'm not an expert, so please correct me if I'm wrong.
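
If the site really is protected like this, the script can at least say so instead of printing `0`. Below is a rough sketch using only the standard library. `ProductLinkParser` and `extract_product_links` are made-up names, the class name `catalog-grid-item__name-link` comes from the question's code, and the `cf-error-details` id is assumed from the posted output. It does not get around the protection; it only reports it.

```python
from html.parser import HTMLParser


class ProductLinkParser(HTMLParser):
    """Collect product hrefs and notice Cloudflare's block-page wrapper."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.blocked = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Assumption: the block page contains an element with this id.
        if attrs.get("id") == "cf-error-details":
            self.blocked = True
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "catalog-grid-item__name-link" in classes and attrs.get("href"):
            self.links.append(attrs["href"])


def extract_product_links(html):
    """Return product hrefs, or raise if the HTML is the access denied page."""
    parser = ProductLinkParser()
    parser.feed(html)
    if parser.blocked:
        raise RuntimeError("access denied: Cloudflare block page returned")
    return parser.links
```

With this, `extract_product_links(driver.page_source)` either returns the hrefs or raises a clear error, which separates "blocked" from "no matching products".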
