
How to load a URL and track all requested resources?

I'm trying to identify pages which contain a specific tag. However, this tag is injected by Google Tag Manager, and as a result isn't available in the page source or (from what I can see) the DOM.

I can, however, see a request relating to the tag in question on the Network tab in Chrome Dev Tools.

I'm wondering if there is a way to load a page in Python and keep track of all of the requests made when loading the page, so that I can then parse this list for the identifier I have.

Not sure if there is an obvious way of doing this, but I can't seem to find anything related in either the requests module or urllib3.

Edit - more info:

I am specifically trying to identify an AdWords Conversion tag. I know that this takes the form of a request to https://www.google.com/ads/conversion/xxxxxxxxxx/. For most sites, the code is visible in the page source, or sometimes only in the DOM. I've used the requests module for the former and PhantomJS for the latter. However, where the site is using Google Tag Manager, the tag doesn't appear in either.
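
For reference, the page-source check amounts to a substring scan over the fetched HTML. A minimal sketch of that first approach (the URL is a placeholder):

    import requests

    link = "https://example.com/"  # placeholder: the page to check

    # The conversion ID is embedded in the tag URL, so a plain substring
    # scan over the raw HTML is enough when the tag sits in the page source.
    html = requests.get(link, timeout=10).text
    if "google.com/ads/conversion/" in html:
        print("AdWords conversion tag found in page source")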

I can, however, see this request being made on the Network tab in Chrome Dev Tools, so hopefully there is a way to replicate this directly within Python?
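
One way to see the same traffic the Network tab shows is Chrome's performance log, which Selenium can surface. A sketch assuming Selenium 4 with ChromeDriver (goog:loggingPrefs and Network.requestWillBeSent are standard ChromeDriver/DevTools Protocol names, not something from the original post; the URL is a placeholder):

    import json
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    # Ask ChromeDriver to record DevTools performance events, which include
    # one Network.requestWillBeSent message per outgoing request.
    options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
    driver = webdriver.Chrome(options=options)

    driver.get("https://example.com/")  # placeholder URL

    for entry in driver.get_log("performance"):
        message = json.loads(entry["message"])["message"]
        if message["method"] == "Network.requestWillBeSent":
            print(message["params"]["request"]["url"])

    driver.quit()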

After some pain trying to use onResourceRequested in PhantomJS, I instead used the following:

    import os
    import time
    from selenium import webdriver

    chromedriver = "/path/to/chromedriver"
    os.environ["webdriver.chrome.driver"] = chromedriver
    driver = webdriver.Chrome(chromedriver)

    driver.get(link)  # link is the URL of the page to inspect

    # Give Google Tag Manager time to fire its tags before reading the timings.
    time.sleep(5)

    # Every resource the page requested appears as a performance entry;
    # each entry's 'name' is the resource URL.
    timings = driver.execute_script("return window.performance.getEntries();")
    for item in timings:
        print(item["name"])
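
With that list in hand, detecting the AdWords tag is just a matter of checking each entry's name against the conversion URL above. A short sketch (the has_conversion_tag helper is made up for illustration):

    def has_conversion_tag(timings):
        # Each performance entry's 'name' is the URL of the requested resource.
        return any("google.com/ads/conversion/" in item["name"] for item in timings)

    if has_conversion_tag(timings):
        print("AdWords conversion tag fired on", link)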
