How to read data from a dynamic website faster in Selenium

I have a few dynamic websites (live football betting). There's no API, so I'm reading all of them with Selenium. I run an infinite loop and look the elements up again on every iteration.

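from selenium.webdriver.common.by import By

while True:
    # Each find_elements call is a separate round trip to the browser.
    elements = self.driver.find_elements(By.XPATH, games_path)
    for e in elements:
        match = Match()
        # Betting is open when the game has no 'no_betting_odds' descendant.
        match.betting_opened = len(e.find_elements(By.CLASS_NAME, 'no_betting_odds')) == 0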

The problem is that it's one hundred times slower than I need it to be.

What's the alternative to this? Is there another library, or a way to speed this up with Selenium?

One of the websites I'm scraping: https://www.betcris.pl/zaklady-live#/Soccer

Your piece of code has a while True loop without a break. That is an implementation of an infinite loop. From such a short snippet I can't tell whether this is the root cause of your "infinite loop" issue, but it may be, so check whether there is any break statement inside your while loop.
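If the loop is meant to keep polling, one way to make its exit explicit is a small wrapper like the one below. This is only a sketch: poll, read_once, should_stop and max_rounds are hypothetical names that do not appear in your code, and read_once stands for whatever reads the page once.

```python
import time


def poll(read_once, interval_seconds=1.0, max_rounds=None, should_stop=lambda: False):
    """Call read_once repeatedly; return how many rounds were completed.

    The loop ends when should_stop() returns True or after max_rounds
    rounds (max_rounds=None means no upper limit).
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        if should_stop():
            break  # explicit exit condition instead of a bare `while True`
        read_once()
        rounds += 1
        time.sleep(interval_seconds)  # pause so the browser is not hammered
    return rounds
```

For example, poll(read_once, interval_seconds=0.5, max_rounds=100) would stop after 100 reads at the latest.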

As for the other part of your question: I'm not sure how you measure the performance of an infinite loop, but there is a way to speed up parsing pages with Selenium: don't use Selenium for the parsing. Grab a snapshot of the page and use that to evaluate states, values and so on.

import lxml.html

page_snapshot = lxml.html.document_fromstring(self.driver.page_source)
games = page_snapshot.xpath(games_path)

This approach is about two orders of magnitude faster than querying through the Selenium API, because every Selenium lookup is a separate round trip to the browser while the snapshot is parsed in-process. Grab the page once, parse everything you need out of it quickly, and grab the page again later if you want fresh data. If you only want to read data, you don't need WebElements at all, just the tree. To interact with elements you still need Selenium's WebElement, of course, but for reading values and states a snapshot may be sufficient.
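To make the "parse once, read many values" idea concrete, here is a minimal sketch. To stay dependency-free it uses the standard library's xml.etree.ElementTree on a small, well-formed fragment instead of lxml.html; with a real page_source you would keep lxml.html as shown above, since real HTML is rarely valid XML. The markup and the class names game, team and odds are made up for illustration; only no_betting_odds comes from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for self.driver.page_source.
PAGE_SOURCE = """
<div id="games">
  <div class="game"><span class="team">A - B</span><span class="odds">1.85</span></div>
  <div class="game"><span class="team">C - D</span><span class="odds no_betting_odds"/></div>
</div>
"""


def has_class(element, name):
    """True if `name` is one of the element's space-separated classes."""
    return name in element.get("class", "").split()


def read_games(page_source):
    """Parse the snapshot once and return (teams, betting_opened) per game."""
    root = ET.fromstring(page_source.strip())
    games = []
    for game in root.iter("div"):
        if not has_class(game, "game"):
            continue  # skip wrapper divs such as the outer container
        teams = next(el.text for el in game.iter("span") if has_class(el, "team"))
        # Betting is closed if any element inside the game carries the marker class.
        closed = any(has_class(el, "no_betting_odds") for el in game.iter())
        games.append((teams, not closed))
    return games


print(read_games(PAGE_SOURCE))
```

All the per-game checks run on the in-memory tree, so the browser is contacted once per snapshot rather than once per element.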

Or, what you could do with Selenium only: fold the 'no_betting_odds' check into the games_path XPath. It seems to me that you want to grab those games that do not contain an element with the 'no_betting_odds' class. If so, append the predicate '[not(.//*[contains(@class, "no_betting_odds")])]' to games_path (which you did not share, so I can't update it for you). Note that not() is a function in XPath, so it needs parentheses around its argument.
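Since games_path was not shared, the predicate can be attached with a small helper. This is a sketch under assumptions: exclude_games_with_class is a hypothetical name, and the example path //div[@class='game'] is invented. The concat/normalize-space form is a common XPath 1.0 idiom for matching a class as a whole token rather than as a substring.

```python
def exclude_games_with_class(games_path: str, css_class: str = "no_betting_odds") -> str:
    """Return games_path restricted to games with no descendant of css_class."""
    # Pad the class attribute with spaces so css_class matches only as a
    # whole token; "no_betting_odds_small" would not count as a match.
    token_test = f"contains(concat(' ', normalize-space(@class), ' '), ' {css_class} ')"
    # Parenthesise the original path so the predicate applies to its full result.
    return f"({games_path})[not(.//*[{token_test}])]"


# Hypothetical games_path, for illustration only.
print(exclude_games_with_class("//div[@class='game']"))
```

The resulting string can be passed to a single XPath lookup, which replaces the per-game find_elements call inside the loop.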
