
Capture AJAX response with Selenium and Python

When I click a link in Firefox, the page sends a request using JavaScript, and the server replies with a response that includes a website address. That new website then opens in a new window. The HTML behind the link is (I've omitted the opening and closing <span> tags):

> class="taLnk hvrIE6"
> onclick="ta.trackEventOnPage('AttractionContactInfo', 'Website',
> 2316062, 1); ta.util.cookie.setPIDCookie(15190);
> ta.call('ta.util.link.targetBlank', event, this,
> {'aHref':'LqMWJQiMnYQQoqnQQxGEcQQoqnQQWJQzZYUWJQpEcYGII26XombQQoqnQQQQoqnqgoqnQQQQoqnQQQQoqnQQQQoqnqgoqnQQQQoqnQQuuuQQoqnQQQQoqnxioqnQQQQoqnQQJMsVCIpEVMSsVEtHJcSQQoqnQQQQoqnxioqnQQQQoqnQQniaQQoqnQQQQoqnqgoqnQQQQoqnQQWJQzhYmkXHJUokUHnmKTnJXB',
> 'isAsdf':true})">Website

I want to capture the server response and extract the 'new website' using Python and Selenium. I've been using BeautifulSoup for scraping and am pretty new to Selenium.

So far, I am able to find this element and click on it using Selenium, which opens the 'new website' in a new window. I don't know how to capture the response from the server.
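For reference, the click-and-new-window step described above might look roughly like the sketch below. The helper name `url_opened_by_click` is hypothetical; it only relies on the standard WebDriver attributes `window_handles`, `current_window_handle`, `switch_to.window()`, `current_url` and `close()`, and it assumes the element has already been located (for example with something like `driver.find_element(By.CSS_SELECTOR, "span.taLnk")`, where the selector is a guess based on the HTML above).

```python
import time


def url_opened_by_click(driver, element, timeout=10.0, poll=0.25):
    """Click `element` and return the URL of the window it opens (None on timeout)."""
    original = driver.current_window_handle
    before = set(driver.window_handles)
    element.click()

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # A handle that was not there before the click is the new window.
        new_handles = [h for h in driver.window_handles if h not in before]
        if new_handles:
            driver.switch_to.window(new_handles[0])
            try:
                # Note: this can still be "about:blank" if the new window
                # has not finished navigating yet.
                return driver.current_url
            finally:
                driver.close()                     # close the new window
                driver.switch_to.window(original)  # go back to the first one
        time.sleep(poll)
    return None
```

This reads the address of the opened window rather than the AJAX response itself, so it only helps when the final URL is all that is needed.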

I once intercepted some AJAX calls by injecting JavaScript into the page using Selenium. The downside is that Selenium can sometimes be, let's say, "fragile": I occasionally got Selenium exceptions for no apparent reason while doing this injection.

Anyway, my idea was to intercept the XHR calls and write each response into a new DOM element created by me, which I could then read from Selenium. In the interception condition you can even check the URL that made the request (self._url), so that you only intercept the one you actually want.

By the way, I got the idea from the question "Intercept all ajax calls?".

Maybe this helps.

browser.execute_script("""
(function(XHR) {
  "use strict";

  var element = document.createElement('div');
  element.id = "interceptedResponse";
  element.appendChild(document.createTextNode(""));
  document.body.appendChild(element);

  var open = XHR.prototype.open;
  var send = XHR.prototype.send;

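  XHR.prototype.open = function(method, url) {
    this._url = url; // remember the requested URL so send() can filter on it
    // Forward the caller's arguments untouched: explicitly passing an
    // undefined `async` would turn the request into a synchronous one.
    return open.apply(this, arguments);
  };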

  XHR.prototype.send = function(data) {
    var self = this;
    var oldOnReadyStateChange;
    var url = this._url;

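    function onReadyStateChange() {
      if(self.status === 200 && self.readyState === 4 /* complete */) {
        // textContent keeps the response as plain text instead of parsing it as HTML
        document.getElementById("interceptedResponse").textContent +=
          '{"data":' + self.responseText + '}*****';
      }
      if(oldOnReadyStateChange) {
        // call the page's own handler with the XHR object as `this`
        oldOnReadyStateChange.apply(self, arguments);
      }
    }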

    if(this.addEventListener) {
      this.addEventListener("readystatechange", onReadyStateChange,
        false);
    } else {
      oldOnReadyStateChange = this.onreadystatechange;
      this.onreadystatechange = onReadyStateChange;
    }
    send.call(this, data);
  }
})(XMLHttpRequest);
""")

I came across this page while trying to capture XHR content from AJAX requests, and I eventually found the selenium-wire package:

from seleniumwire import webdriver  # Import from seleniumwire
# Create a new instance of the Firefox driver
driver = webdriver.Firefox()

# Go to the Google home page
driver.get('https://www.google.com')

# Access requests via the `requests` attribute
for request in driver.requests:
    if request.response:
        print(
            request.url,
            request.response.status_code,
            request.response.headers['Content-Type']
        )

This package gives you access to the response of any request, JSON included. The code above prints something like:

https://www.google.com/ 200 text/html; charset=UTF-8
https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_120x44dp.png 200 image/png
https://consent.google.com/status?continue=https://www.google.com&pc=s&timestamp=1531511954&gl=GB 204 text/html; charset=utf-8
https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png 200 image/png
https://ssl.gstatic.com/gb/images/i2_2ec824b0.png 200 image/png
https://www.google.com/gen_204?s=webaft&t=aft&atyp=csi&ei=kgRJW7DBONKTlwTK77wQ&rt=wsrt.366,aft.58,prt.58 204 text/html; charset=UTF-8
..
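To pick out only the AJAX responses from that list, the captured requests can be filtered by URL and content type. The sketch below assumes the request objects expose `url`, `response.headers` and `response.body` (bytes) as in the example above; `json_bodies` is a hypothetical helper, the `"/api/"` fragment in the usage note is a placeholder, and only gzip-compressed or uncompressed bodies are handled.

```python
import gzip
import json


def json_bodies(requests, url_part=""):
    """Yield (url, parsed JSON) for captured requests whose response is JSON."""
    for request in requests:
        response = request.response
        if not response or url_part not in request.url:
            continue  # no response captured, or not the request we want
        content_type = response.headers.get("Content-Type", "") or ""
        if "json" not in content_type:
            continue
        body = response.body
        # Assumption: bodies are either uncompressed or gzip-compressed.
        if response.headers.get("Content-Encoding", "") == "gzip":
            body = gzip.decompress(body)
        yield request.url, json.loads(body.decode("utf-8"))
```

With a selenium-wire driver this could be used as `for url, data in json_bodies(driver.requests, "/api/"): print(url, data)`.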

I was unable to capture the AJAX response with Selenium, but here is what works, although without Selenium:

1- Find the XHR request by watching the network tab of your browser's developer tools.

2- Once you've identified the request, reproduce it using Python's requests or urllib2 modules (urllib.request in Python 3). I personally recommend requests because of its additional features; the most important to me was requests.Session.

You can find plenty of help and relevant posts regarding these two steps.
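As a rough illustration of step 2, the sketch below rebuilds a GET request with the standard library's urllib.request so that it has no third-party dependency; a requests.Session would follow the same pattern. The function names, the example URL and the header values are assumptions: in practice, copy the URL, parameters and headers shown in the browser's network tab.

```python
import json
import urllib.parse
import urllib.request


def build_xhr_request(url, params=None, referer=None):
    """Build a GET request that mimics the XHR call seen in the network tab."""
    if params:
        url = url + "?" + urllib.parse.urlencode(params)
    headers = {
        # Many JavaScript libraries send this header; some servers check for it.
        "X-Requested-With": "XMLHttpRequest",
        "Accept": "application/json, text/javascript, */*; q=0.01",
    }
    if referer:
        headers["Referer"] = referer
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_json(request, timeout=10):
    """Send the request and decode the body as JSON (assumes a UTF-8 JSON reply)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `fetch_json(build_xhr_request("https://example.com/api", {"id": 1}))` would send the request, where the URL and parameter are placeholders.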

Hope it will help someone someday.
