
Selenium: access denied

I am trying to scrape some data from the Louis Vuitton (LV) website with Selenium, and I keep getting an 'Access Denied' screen once the 'Sign In' button is clicked. I suspect there is some protection against automation, because everything seems to work fine when I do the same steps manually. Oddly, I need to click the 'Sign In' button twice to be able to sign in manually.

My code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
# Selenium 4 passes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service(r'chromedriver.exe'), options=options)
driver.get('https://secure.louisvuitton.com/eng-gb/mylv')
# Wait for the cookie-consent banner, then accept it
WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//span[@class='ucm-wrapper']")))
driver.find_element(By.XPATH, "//button[@class='ucm-button ucm-button--default ucm-choice__yes']").click()
# Fill in the login form and submit it
driver.find_element(By.ID, 'loginloginForm').send_keys('xxx@xxx.com')
driver.find_element(By.ID, 'passwordloginForm').send_keys('xxxxxx')
driver.find_element(By.ID, 'loginSubmit_').click()

Error:

You don't have permission to access "http://secure.louisvuitton.com/eng-gb/mylv;jsessionid=xxxxxxx.front61-prd?" on this server.

Is there a way to log in with Selenium and bypass this?

I took your code, added a few tweaks, and ran the test as follows:

  • Code block:

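     from selenium import webdriver
     from selenium.webdriver.common.by import By
     from selenium.webdriver.support import expected_conditions as EC
     from selenium.webdriver.support.ui import WebDriverWait

     options = webdriver.ChromeOptions()
     options.add_argument("start-maximized")
     options.add_experimental_option("excludeSwitches", ["enable-automation"])
     options.add_experimental_option('useAutomationExtension', False)
     driver = webdriver.Chrome(options=options)
     driver.get('https://secure.louisvuitton.com/eng-gb/mylv')
     # Accept the cookie banner, then wait for the login field before typing
     WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//span[text()='Accept and Continue']"))).click()
     WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@id='loginloginForm']"))).send_keys("Mudyla@stackoverflow.com")
     driver.find_element(By.XPATH, "//input[@id='passwordloginForm']").send_keys('Mudyla')
     driver.find_element(By.XPATH, "//input[@id='loginSubmit_']").click()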

Observation

Similar to your observation, I hit the same roadblock, with the following result:

Access Denied


Deep Dive

The click() on Sign In does seem to happen. However, if you inspect the DOM tree of the webpage, you will find that some of the <script> tags reference JavaScript files containing the keyword akam. For example:

  • akam-sw.js install script version 1.3.3 "serviceWorker"in navigator&&"find"in[]&&function()
  • <script type="text/javascript" src="https://secure.louisvuitton.com/akam/11/7f0e2ae6" defer=""></script>
  • <noscript><img src="https://secure.louisvuitton.com/akam/11/pixel_7f0e2ae6?a=dD0xOWNjNTRjMmMxYzdmNmMwZjI0NTUwOGZmZDM5ZTQzMWQ5NjI5ZmIwJmpzPW9mZg==" style="visibility: hidden; position: absolute; left: -999px; top: -999px;" /></noscript>
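The same check can be scripted instead of done by hand in DevTools. A minimal sketch, assuming the markers keep the /akam/ path segment seen in the examples above and appear in double-quoted src attributes; find_akam_references is a hypothetical helper, and in Selenium you would pass it driver.page_source:

```python
import re

# Hypothetical pattern based on the examples above: a double-quoted src URL
# whose path contains an /akam/ segment (script tags and the noscript pixel).
AKAM_SRC = re.compile(r'src="([^"]*/akam/[^"]*)"')


def find_akam_references(page_source: str) -> list[str]:
    """Return every src URL in page_source that points at an /akam/ resource."""
    return AKAM_SRC.findall(page_source)
```

For example, find_akam_references(driver.page_source) on the sign-in page should list URLs like the script and pixel shown above, if the page still serves them. A non-empty result only hints at Akamai; on its own it does not prove that Bot Manager is what blocked the request.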

This is a strong indication that the website is protected by Bot Manager, an advanced bot-detection service provided by Akamai, and that the response is being blocked.


Bot Manager

According to the article Bot Manager - Foundations:

[Image: akamai_detection]


Conclusion

So it can be concluded that the request for the data is detected as coming from a Selenium-driven WebDriver instance, and the response is blocked.
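A script can at least notice this block instead of carrying on against an error page. The sketch below is a hypothetical helper that assumes the block page contains the 'Access Denied' text and the error message quoted in the question; it only detects the block, it does not avoid it:

```python
# Hypothetical markers, taken from the 'Access Denied' screen and the error
# text quoted in the question; matching is plain, case-sensitive substring search.
DENIAL_MARKERS = (
    "Access Denied",
    "You don't have permission to access",
)


def is_access_denied(page_source: str) -> bool:
    """Return True if the HTML contains any of the block-page markers."""
    return any(marker in page_source for marker in DENIAL_MARKERS)
```

Typical use is right after the click on Sign In, e.g. `if is_access_denied(driver.page_source): raise RuntimeError("blocked by bot detection")`. Checking plain substrings keeps the helper independent of the block page's exact markup.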



It's been a while since I posted this question, but in case anyone is interested, below are the steps I took to solve the problem.

  1. Open chromedriver.exe in a hex editor, find the string $cdc and replace it with something else of the same length. Then save and run the modified binary. Read more in this answer and the replies to it.

  2. Selenium Python code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument('--disable-blink-features=AutomationControlled')
# Removes the "Chrome is being controlled by automated test software" infobar
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
# Use the patched chromedriver.exe from step 1
driver = webdriver.Chrome(service=Service('chromedriver.exe'), options=options)
driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                                                                     'AppleWebKit/537.36 (KHTML, like Gecko) '
                                                                     'Chrome/85.0.4183.102 Safari/537.36'})

For me, it worked when I added the following line after starting the driver:

 driver.manage().deleteAllCookies();
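The line above is Java. In the Python bindings the equivalent WebDriver method is delete_all_cookies(). A minimal sketch wrapping it; clear_all_cookies is a hypothetical helper name, and it additionally uses get_cookies() to report how many cookies were removed:

```python
def clear_all_cookies(driver) -> int:
    """Delete every cookie visible to the current session and return how many there were.

    Python counterpart of Java's driver.manage().deleteAllCookies(); `driver` is
    any Selenium WebDriver (get_cookies and delete_all_cookies are standard methods).
    """
    count = len(driver.get_cookies())
    driver.delete_all_cookies()
    return count
```

Call it right after creating the driver, as in the answer above: `clear_all_cookies(driver)`. Note that WebDriver cookie commands act on the cookies of the currently loaded page's domain.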

Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original URL.

 
© 2020-2024 STACKOOM.COM