
Capturing click har before new page with Selenium 2 and Browsermob

I have an automation tool built with Selenium 2 and Browsermob proxy that works quite well for most of what I need. However, I've run into a snag capturing network traffic.

I basically want to capture the har that a click produces before the page redirects. For example, I have an analytics call that fires on the click, which I want to capture, and another analytics call that fires on the page load, which I don't want to capture.

All of my attempts currently capture the har too late, so I see both the click analytics call and the page load one. Is there any way to get this working? I've included the relevant sections of my current code below.

METHODS INSIDE HELPER CLASS

class _check_for_page_load(object):
    def __init__(self, browser, parent):
        self.browser = browser
        self.maxWait = 5
        self.parent = parent

    def __enter__(self):
        self.old_page = self.browser.find_element_by_tag_name('html')

    def wait_for(self, condition_function):
        start_time = time.time()
        while time.time() < start_time + self.maxWait:
            if condition_function():
                return True
            else:
                time.sleep(0.01)
        raise Exception(
            'Timeout waiting for {}'.format(condition_function.__name__)
        )

    def page_has_loaded(self):
        new_page = self.browser.find_element_by_tag_name('html')
        ###self.parent.log("testing ---- " + str(new_page.id) + " " + str(self.old_page.id))
        return new_page.id != self.old_page.id

    def __exit__(self, *_):
        try:
            self.wait_for(self.page_has_loaded)
        except:
            pass

def startNetworkCalls(self):
    if self._p != None:
        self._p.new_har("Step"+str(self._currStep))

def getNetworkCalls(self, waitForTrafficToStop = True):
    if self._p != None:
        if waitForTrafficToStop:
            self._p.wait_for_traffic_to_stop(5000, 30*1000)
        return self._p.har
    else:
        return "{}"

def click(self, selector):
    ''' clicks on an element '''
    self.log("Clicking element '" + selector + "'")
    el = self.findEl(selector)
    traffic = ""
    with self._check_for_page_load(self._d, self):
        try:
            self._curr_window = self._d.window_handles[0]
            el.click()
        except:
            actions = ActionChains(self._d)
            actions.move_to_element(el).click().perform()
        traffic = self.getNetworkCalls(False)
    try:
        popup = self._d.switch_to.alert
        if popup != None:
            popup.dismiss()
    except:
        pass
    try:
        window_after = self._d.window_handles[1]
        if window_after != self._curr_window:
            self._d.close()
            self._d.switch_to_window(self._curr_window)
    except:
        pass
    return traffic

INSIDE FILE THAT RUNS MULTIPLE SELENIUM ACTIONS

## inside a for loop, we get an action that looks like "click('#selector')"
util.startNetworkCalls()
if action.startswith("click"):
    temp_traffic = eval(action)
    if temp_traffic == "":
        temp_traffic = util.getNetworkCalls()
    traffic = json.dumps(temp_traffic, sort_keys=True)  ## gives json har info that is saved later

You can see from these snippets that I call the "click" function, which returns network traffic. Inside the click function, you can see it references the class "_check_for_page_load". However, the first time it reaches this line:

 ###self.parent.log("testing ---- " + str(new_page.id) + " " + str(self.old_page.id)) 

The log (when enabled) shows that the element ids already don't match the first time it logs, indicating the page load has already started. I'm pretty stuck right now, as I've tried everything I can think of to accomplish this.

I found a solution to my own question, though it isn't perfect. I told my network calls to capture headers:

def startNetworkCalls(self):
    if self._p != None:
        self._p.new_har("Step"+str(self._currStep),{"captureHeaders": "true"})

Then, when I retrieve the har data, I can look for the "Referer" header and compare it with the page that was initially loaded (before the redirect from the click). From there, I can split the har into two separate lists of network calls to process later.
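
A minimal sketch of that Referer-based split, for illustration only. The names `get_header`, `split_har_by_referer` and `initial_url` are hypothetical, and the sketch assumes the har is a plain dict in the usual HAR layout (`log` -> `entries`, each with `request.headers` as a list of name/value pairs). Sending entries with no Referer at all to the second list is also an assumption, not something the approach above specifies.

```python
def get_header(entry, name):
    """Return the value of a request header (case-insensitive), or None."""
    for header in entry.get("request", {}).get("headers", []):
        if header.get("name", "").lower() == name.lower():
            return header.get("value")
    return None


def split_har_by_referer(har, initial_url):
    """Split har entries into (click_calls, page_load_calls).

    Entries whose Referer matches initial_url are attributed to the page
    that was open when the click happened. Everything else, including
    entries with no Referer (an assumption), goes to the second list.
    """
    click_calls, page_load_calls = [], []
    wanted = initial_url.rstrip("/")
    for entry in har.get("log", {}).get("entries", []):
        referer = get_header(entry, "Referer")
        if referer is not None and referer.rstrip("/") == wanted:
            click_calls.append(entry)
        else:
            page_load_calls.append(entry)
    return click_calls, page_load_calls
```

The value passed as `initial_url` would be the url of the page recorded before the click, for example the driver's `current_url` read just before calling `el.click()`.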

This works for my needs, but it isn't perfect. Some requests, such as images, sometimes carry a referrer that matches the previous page's url, so the split puts them into the first bucket rather than the appropriate second bucket. However, since I'm more interested in requests that aren't on the same domain, this isn't really an issue.
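
Since only off-domain requests matter here, one way to sidestep the misplaced same-domain requests is to drop them from a bucket before processing it. The following is a rough sketch under assumptions: `off_domain_entries` and `page_url` are hypothetical names, each entry is assumed to be a HAR entry dict with a `request.url` field, and the import assumes Python 3 (`urllib.parse`).

```python
from urllib.parse import urlparse


def off_domain_entries(entries, page_url):
    """Keep only entries whose request host differs from the page's host."""
    page_host = urlparse(page_url).hostname
    return [
        entry for entry in entries
        if urlparse(entry["request"]["url"]).hostname != page_host
    ]
```

This compares exact hostnames, so a subdomain such as a separate static or CDN host would still count as off-domain. Whether that is desirable depends on the site being tested.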
