
How to load a URL and track all requested resources?

I'm trying to identify pages that contain a specific tag. However, this tag is loaded through Google Tag Manager, and as a result it isn't available in the page source or (from what I can see) in the DOM.

I can, however, see a request that relates to the tag in question in Chrome Dev Tools on the Network tab.

I'm wondering if there is a way to load a page in Python and keep track of all of the requests made while the page loads, so that I can then search this list for the identifier I have.

I'm not sure if there is an obvious way of doing this, but I can't seem to find anything related in either the requests module or urllib3.

Edit - more info:

I am specifically trying to identify an AdWords Conversion tag. I know that it takes the form of a request to https://www.google.com/ads/conversion/xxxxxxxxxx/ . For most sites, the code is visible in the page source, or sometimes only in the DOM. I've used the requests module for the former and PhantomJS for the latter. However, where a site uses Google Tag Manager, the tag doesn't appear in either.

I can, however, see this request being made in Chrome Dev Tools on the Network tab, so hopefully there is a way to replicate this directly within Python?

After some pain trying to use onResourceRequested in PhantomJS, I instead used the following:

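    import os
    import time

    from selenium import webdriver

    # Runs inside a method of a scraper class; `link` is the URL to check.
    chromedriver = "/path/to/chromedriver"
    os.environ["webdriver.chrome.driver"] = chromedriver
    # Selenium 3 style; Selenium 4 takes the path via a Service object instead.
    self.driver = webdriver.Chrome(chromedriver)

    self.driver.get(link)

    # Give Google Tag Manager time to fire its requests.
    time.sleep(5)
    # For resource entries, 'name' is the URL that the page requested.
    timings = self.driver.execute_script("return window.performance.getEntries();")
    for item in timings:
        print(item['name'])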
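
Rather than printing every entry, the list can be filtered for the conversion request itself. The following is a minimal sketch, assuming the entries are the dicts returned by `getEntries()` above and that the `xxxxxxxxxx` part of the URL is a numeric ID; the names `CONVERSION_RE` and `find_conversion_ids` are made up for illustration.

```python
import re

# Pattern based on the URL form quoted in the question. Treating the
# "xxxxxxxxxx" placeholder as a run of digits is an assumption.
CONVERSION_RE = re.compile(r"^https?://www\.google\.com/ads/conversion/(\d+)/")


def find_conversion_ids(entries):
    """Return the distinct conversion IDs found in a list of performance entries.

    `entries` is a list of dicts with a 'name' key, such as the value
    returned by window.performance.getEntries() through Selenium.
    """
    ids = []
    for entry in entries:
        match = CONVERSION_RE.match(entry.get("name") or "")
        if match and match.group(1) not in ids:
            ids.append(match.group(1))
    return ids


if __name__ == "__main__":
    # Hand-written sample entries, not output captured from a real page.
    sample = [
        {"name": "https://example.com/static/app.js"},
        {"name": "https://www.google.com/ads/conversion/123456789/?random=1"},
        {"name": "first-paint"},
    ]
    print(find_conversion_ids(sample))
```

With the Selenium snippet above, this would be called as `find_conversion_ids(timings)`; an empty list means no matching request was recorded within the wait time.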
