How can I get an intermediate URL from a redirect chain in Selenium using Python?

I'm using Selenium with the Python API and Firefox to automate some tasks, and here's my problem:

  1. Click a link on the original page, say a.com.
  2. I'm redirected to b.com/some/path?arg=value.
  3. I'm then immediately redirected again to the final address, c.com.

So is there a way to get the intermediate redirect URL b.com/some/path?arg=value with the Selenium Python API? I tried driver.current_url, but while the browser is on b.com the page still seems to be loading, and the result is returned only once the final address c.com has loaded.

Another question: is there a way to add event handlers to Selenium for something like a URL change? PhantomJS can do this, but I'm not sure about Selenium.

You can get redirects from performance logs. Based on the docs and a GitHub answer, here is what I've done in C#; it should be possible to port to Python:

var options = new ChromeOptions();
var cap = DesiredCapabilities.Chrome();
var perfLogPrefs = new ChromePerformanceLoggingPreferences();
perfLogPrefs.AddTracingCategories(new string[] { "devtools.network" });
options.PerformanceLoggingPreferences = perfLogPrefs;
options.AddAdditionalCapability(CapabilityType.EnableProfiling, true, true);
options.SetLoggingPreference("performance", LogLevel.All);
var driver = new ChromeDriver(options);
var url = "https://some-website-that-will-redirect.com/";
driver.Navigate().GoToUrl(url);
var logs = driver.Manage().Logs.GetLog("performance"); //all your logs with redirects will be here

Loop through logs: if message.params.redirectResponse.url is equal to the original URL, then message.params.request.url will contain the redirect URL.
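A rough Python port of the same idea might look like the sketch below. It assumes Chrome with ChromeDriver and Selenium 4, where performance logging is enabled through the goog:loggingPrefs capability and each log entry carries a JSON string under the "message" key. The function names redirect_hops and collect_redirects are made up for illustration. Only HTTP 3xx redirects show up as redirectResponse; JavaScript or meta-refresh redirects would not be caught this way.

```python
import json


def redirect_hops(log_entries):
    """Extract (from_url, to_url, status) tuples from Chrome performance log entries.

    Each entry is expected to be a dict whose "message" value is a JSON string
    wrapping a DevTools event, as returned by driver.get_log("performance").
    """
    hops = []
    for entry in log_entries:
        message = json.loads(entry["message"])["message"]
        if message.get("method") != "Network.requestWillBeSent":
            continue
        params = message.get("params", {})
        redirect = params.get("redirectResponse")
        if redirect:
            # redirectResponse describes the response that caused this new request
            hops.append((redirect["url"], params["request"]["url"], redirect.get("status")))
    return hops


def collect_redirects(url):
    """Hypothetical helper: load url in Chrome and return its redirect hops."""
    from selenium import webdriver  # imported here so redirect_hops works without Selenium

    options = webdriver.ChromeOptions()
    options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return redirect_hops(driver.get_log("performance"))
    finally:
        driver.quit()
```

Note that the log also contains redirects for sub-resources (images, scripts), so in practice you would probably follow the chain starting from the URL you navigated to rather than take every hop.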

Answering my own question.

If the redirect chain is very long, consider trying the methods @alecxe and @Krishnan provided. But in this specific case, I've found a much easier workaround:

Once the page has finally landed on c.com, use driver.execute_script('return window.document.referrer') to get the intermediate URL.
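Wrapped in a small helper, the workaround might look like this. The name intermediate_url_from_referrer is hypothetical, and the helper only assumes a driver object with an execute_script method. As far as I know, the referrer points at b.com only when b.com redirects on the client side (JavaScript or meta refresh); for a pure HTTP 3xx chain the browser typically keeps the original page as referrer, and a Referrer-Policy header may trim the value to the origin.

```python
def intermediate_url_from_referrer(driver):
    """Return document.referrer of the current page, or None if it is empty.

    Call this only after the final page (c.com in the question) has loaded.
    """
    referrer = driver.execute_script("return document.referrer")
    return referrer or None
```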

Is there a way to get the intermediate redirect URL b.com/some/path?arg=value with the Selenium Python API?

I would use an Explicit Wait with a small poll interval. The idea is to wait for the body element of the initial page to become stale:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# find_element_by_tag_name was removed in Selenium 4; use the By locator instead
body = driver.find_element(By.TAG_NAME, "body")

wait = WebDriverWait(driver, 5, poll_frequency=0.05)
wait.until(EC.staleness_of(body))
print(driver.current_url)

You might also need to decrease the page load timeout:

driver.set_page_load_timeout(0.5)

Another question: is there a way to add event handlers to Selenium for something like a URL change?

This is exactly what Explicit Waits are for. There are relevant title_is and title_contains expected conditions, and it's easy to write a custom one (for example, to wait for some substring in the current URL).
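A custom condition is just a callable that takes the driver and returns something truthy once satisfied. The sketch below, with the made-up name url_has_substring, returns the matching URL itself so the caller gets the intermediate address back from wait.until. Selenium also ships a built-in url_contains condition, which returns a boolean instead.

```python
class url_has_substring:
    """Custom expected condition: truthy once the current URL contains `substring`.

    Returns the full URL on a match so WebDriverWait.until() hands it back,
    and False otherwise so the wait keeps polling.
    """

    def __init__(self, substring):
        self.substring = substring

    def __call__(self, driver):
        url = driver.current_url
        return url if self.substring in url else False


# Intended usage, assuming an existing driver:
#   wait = WebDriverWait(driver, 5, poll_frequency=0.05)
#   intermediate = wait.until(url_has_substring("b.com/some/path"))
```

Whether the poll ever observes b.com depends on how quickly the second redirect fires and on whether current_url returns while the page is still loading, so treat this as best effort.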

Proxy servers such as BrowserMob Proxy can be set up in your Selenium test so that your web traffic is routed through the proxy. The traffic information is all captured as HAR files, so you can get the redirect information from there.
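Once you have a HAR document as a Python dict (however the proxy hands it over), picking out the redirects could be sketched as follows. This relies only on the HAR 1.2 layout, where each entry has request.url, response.status and response.redirectURL; the function name redirects_from_har is an illustration, not part of any proxy library.

```python
def redirects_from_har(har):
    """Return (request_url, redirect_target) pairs for every 3xx entry in a HAR dict."""
    hops = []
    for entry in har.get("log", {}).get("entries", []):
        response = entry.get("response", {})
        status = response.get("status", 0)
        target = response.get("redirectURL", "")
        if 300 <= status < 400 and target:
            hops.append((entry["request"]["url"], target))
    return hops
```

The redirectURL value mirrors the Location header, so it may be relative; resolving it with urllib.parse.urljoin against the request URL is one option.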

AFAIK the only listener hook mechanism that Selenium provides is the EventFiringWebDriver, where you can plug in your own event listener by extending AbstractWebDriverEventListener and passing it to the register method of EventFiringWebDriver. But the EventFiringWebDriver has limitations: it cannot eavesdrop on events that arise from the Actions class. There's an alternative to that as well. Some time back I wrote a blog post that talks about it. Maybe you can refer to that as well. Here's the link

I don't know if there is anything similar in Python (since I have never worked with the Selenium Python bindings).


